How to ignore whitespace/ case sensitivity with dedupe
Hi all,

I've followed the instructions at http://wiki.apache.org/solr/Deduplication and got the basic dedupe field working. However, it doesn't seem to ignore differences in case or whitespace, even though I've defined the type of the fields used for dedupe, as well as the signature field, as follows in schema.xml:

    <fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="text_ws_lower" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="name" type="text_ws_lower"/>
    <field name="signatureField" type="text_ws_lower"/>

and in solrconfig.xml:

    <updateRequestProcessorChain name="dedupe">
      <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <bool name="overwriteDupes">false</bool>
        <str name="signatureField">signatureField</str>
        <str name="fields">name</str>
        <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

I know a possible solution is to lowercase the name field and remove its whitespace before submitting documents to Solr, but are there any other alternatives, so that when the following data is given

    Name: JOHN SMITH and jOhn SMITh

the documents end up with the same value in signatureField?

Thanks heaps

Cheers
tinman

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-ignore-whitespace-case-sensitivity-with-dedupe-tp2997624p2997624.html
Sent from the Solr - User mailing list archive at Nabble.com.
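For reference, the client-side workaround mentioned in the post (lowercase and strip whitespace before submitting) might look roughly like the sketch below. It assumes the signature is computed from the raw submitted field value rather than from the analyzed tokens, which would explain why the text_ws_lower field type has no effect on it. The function names and the name_norm field are hypothetical, not from the original post; using a separate field would keep the original name intact, but it would also need a matching field in schema.xml and the "fields" setting of the dedupe chain pointed at it.

```python
def normalize_for_signature(value: str) -> str:
    """Lowercase the value and remove all whitespace."""
    # str.split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, so joining the parts removes it all.
    return "".join(value.split()).lower()


def prepare_doc(doc: dict) -> dict:
    """Return a copy of doc with a normalized copy of 'name' added.

    'name_norm' is a hypothetical field name used only for illustration.
    """
    prepared = dict(doc)
    prepared["name_norm"] = normalize_for_signature(doc["name"])
    return prepared


if __name__ == "__main__":
    a = prepare_doc({"id": "1234", "name": "JOHN SMITH "})
    b = prepare_doc({"id": "1233", "name": " john SMITh"})
    # Both documents now carry the same normalized value to hash.
    print(a["name_norm"] == b["name_norm"])
```

The original name value is left untouched in each document, so what is stored and displayed does not change; only the extra field would feed the signature.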
Re: How to ignore whitespace/ case sensitivity with dedupe
By default, stored = true, indexed = true. In any case, here is example output from the Solr search console:

    <result name="response" numFound="2" start="0">
      <doc>
        <str name="id">1234</str>
        <str name="name">JOHN SMITH </str>
        <str name="signatureField">5430fbe9e6374611</str>
      </doc>
      <doc>
        <str name="id">1233</str>
        <str name="name"> john SMITh</str>
        <str name="signatureField">49867a7835ff6741</str>
      </doc>
    </result>

As you can see, the two signature fields are different. I want overwriteDupes = false because I want to use field collapsing to remove duplicates at query time.

Thanks
tinman