RE: Ignoring Duplicates in Multivalue Field
Hi Ahmet, When I add the RunUpdateProcessorFactory Solr didn't remove any duplications. Any other idea?
Re: Ignoring Duplicates in Multivalue Field
The update processors only process the values in the source data, not the data that has already been indexed and stored. We probably need to file a Jira to add an "insert" field value option that merges in the new field value, skipping it if it already exists or appending it to the end of the existing list of field values for a multivalued field.

You could try... a combination of both remove and add, assuming that Solr applies them in the order specified, to remove any existing value and then add it to the end.

See: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

-- Jack Krupansky
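The remove-then-add idea above could be sketched as follows. This is only an illustration in Python of how such a request body might be built, not a confirmed fix: the helper name `remove_then_add` is hypothetical, and it relies on the same assumption stated above, namely that Solr applies the two modifiers in the order they appear in the document.

```python
import json


def remove_then_add(doc_id, field, value):
    """Build one atomic-update document that removes `value`, then re-adds it."""
    # Python dicts keep insertion order, so "remove" is serialized before "add".
    # Whether Solr honors that order is an assumption, not something verified here.
    return {"id": doc_id, field: {"remove": value, "add": value}}


# The JSON update endpoint accepts a list of documents as the request body.
body = json.dumps([remove_then_add("100", "myMultValueField", "2")])
print(body)
```

If the modifiers are applied in that order, "2" would end up at the end of the list of values instead of appearing twice, so the original ordering of the field is not preserved.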
Re: Ignoring Duplicates in Multivalue Field
From memory, if you use UniqFieldsUpdateProcessor after DistributedUpdateProcessor, then you will be filtering on the set [1, 2, 3, 2].

<updateRequestProcessorChain name="uniq-fields">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldRegex">myMultValueField</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
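Why the position of the uniqueness filter matters can be shown with a toy model. The Python below is not Solr code; `uniq`, `stored`, and `incoming` are hypothetical names, and it assumes, as recalled above, that the stored values and the added value have already been merged by the time a processor placed after the distributed step runs.

```python
def uniq(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))


stored = ["1", "2", "3"]  # values already in the index
incoming = ["2"]          # value carried by the {"add": "2"} update

# Filter before the merge: only the incoming value is visible, so nothing is dropped.
before_merge = stored + uniq(incoming)

# Filter after the merge: the filter sees ["1", "2", "3", "2"] and drops the repeat.
after_merge = uniq(stored + incoming)

print(before_merge)
print(after_merge)
```

In this model the first ordering still yields a duplicate "2", while the second leaves the field as it was before the update.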
Ignoring Duplicates in Multivalue Field
Hi,

I'm trying to make my "update" request handler ignore duplicate values in updates to a multivalued field. To make my use case clear, let's assume my index already contains a document like:

{ "id": "100", "myMultValueField": ["1", "2", "3"] }

Later I would like to send an update like:

{ "id": "100", "myMultValueField": {"add": "2"} }

How can I make the update request handler understand that "2" already exists and ignore it? I tried adding the update chain below, but it didn't work for me:

<updateRequestProcessorChain name="uniq-fields">
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldRegex">myMultValueField</str>
  </processor>
</updateRequestProcessorChain>

And I added it to my requestHandler:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uniq-fields</str>
  </lst>
</requestHandler>

Tomer Levi
Software Engineer, Big Data Group
Re: Ignoring Duplicates in Multivalue Field
Hi Tomer, What happens when you add <processor class="solr.RunUpdateProcessorFactory" /> to your chain? Ahmet