RE: Ignoring Duplicates in Multivalue Field

2014-11-03 Thread Tomer Levi
Hi Ahmet,
When I add the RunUpdateProcessorFactory, Solr still doesn't remove the duplicates.
Any other ideas?




Re: Ignoring Duplicates in Multivalue Field

2014-11-03 Thread Jack Krupansky
The update processors are only processing the values in the source data, 
not the data that has already been indexed and stored.
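A minimal Python model of the point above, assuming (as described) that the UniqFields step only sees the incoming update and that the merge with the stored document happens afterwards. `uniq_values` and `apply_atomic_add` are hypothetical names for illustration, not Solr APIs. Deduplicating the incoming `{"add": "2"}` sees a single value, so nothing is dropped, and the later merge appends a second "2".

```python
from typing import Dict, List


def uniq_values(values: List[str]) -> List[str]:
    """Drop repeated values, keeping the first occurrence (order-preserving)."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result


def apply_atomic_add(stored: List[str], added: List[str]) -> List[str]:
    """Hypothetical model of an atomic "add": append new values to the stored ones."""
    return stored + added


stored_doc: Dict[str, List[str]] = {"myMultValueField": ["1", "2", "3"]}
incoming_add = ["2"]  # the {"add": "2"} payload from the question

# Dedup runs on the incoming values only; a single value is already unique.
filtered = uniq_values(incoming_add)
# The merge with the stored values happens afterwards, so the duplicate survives.
merged = apply_atomic_add(stored_doc["myMultValueField"], filtered)
print(merged)  # ['1', '2', '3', '2']
```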


We probably need to file a Jira to add an "insert" field value option that
merges in the new field value, skipping it if it already exists or otherwise
appending it to the end of the existing list of field values for a multivalued field.


You could try... a combination of remove and add, assuming that
Solr applies them in the order specified: remove any existing value, then
add it to the end.
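A sketch of the remove-then-add workaround in Python. It only builds the JSON atomic-update body (a list of documents, as accepted by the JSON /update handler) and models the expected outcome locally; it assumes your Solr version supports the "remove" operation and, as noted above, that Solr applies the operations in the order they appear in the JSON object. `atomic_update_body` and `simulate_remove_then_add` are hypothetical helper names.

```python
import json
from typing import List


def atomic_update_body(doc_id: str, field: str, value: str) -> str:
    """Build a JSON atomic-update body that removes `value`, then adds it again.

    Assumes "remove" is supported and that Solr honours the key order below.
    """
    ops = {"remove": value, "add": value}  # Python dicts keep insertion order
    return json.dumps([{"id": doc_id, field: ops}])


def simulate_remove_then_add(stored: List[str], value: str) -> List[str]:
    """Model the assumed result: drop every copy of `value`, then append one."""
    return [v for v in stored if v != value] + [value]


body = atomic_update_body("100", "myMultValueField", "2")
print(body)  # [{"id": "100", "myMultValueField": {"remove": "2", "add": "2"}}]
print(simulate_remove_then_add(["1", "2", "3"], "2"))  # ['1', '3', '2']
```

Note the trade-off: the value ends up moved to the end of the list rather than left in place, which is fine only if value order does not matter for your use case.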


See:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

-- Jack Krupansky




Re: Ignoring Duplicates in Multivalue Field

2014-11-03 Thread Matthew Nigl
From memory, if you use UniqFieldsUpdateProcessor after
DistributedUpdateProcessor, then you will be filtering on the merged set
[1, 2, 3, 2].

<updateRequestProcessorChain name="uniq-fields">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldRegex">myMultValueField</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
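A Python model of the ordering suggested above, assuming the atomic-update merge with the stored document happens inside the distributed processor step, and that the uniq step keeps the first occurrence of each value. The function names are hypothetical stand-ins, not Solr classes. Because the dedup now runs on the merged list, the duplicate "2" is dropped.

```python
from typing import List


def merge_atomic_add(stored: List[str], added: List[str]) -> List[str]:
    """Hypothetical stand-in for the atomic-update merge done by the distributed step."""
    return stored + added


def uniq_values(values: List[str]) -> List[str]:
    """Order-preserving dedup, standing in for UniqFieldsUpdateProcessorFactory."""
    return list(dict.fromkeys(values))


stored = ["1", "2", "3"]
merged = merge_atomic_add(stored, ["2"])  # ['1', '2', '3', '2']
final = uniq_values(merged)  # dedup now sees the stored values too
print(final)  # ['1', '2', '3']
```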




Ignoring Duplicates in Multivalue Field

2014-11-02 Thread Tomer Levi
Hi,
I'm trying to make my "update" request handler ignore duplicate values in a
multivalued field when updating.
To make my use case clear, let's assume my index already contains a document 
like:
{
   "id": "100",
   "myMultValueField": ["1", "2", "3"]
}

Later I would like to send an update like:
{
   "id": "100",
   "myMultValueField": {"add": "2"}
}

How can I make the update request handler understand that "2" already exists and
ignore it?
I tried adding the update chain below, but it didn't work for me.

<updateRequestProcessorChain name="uniq-fields">
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldRegex">myMultValueField</str>
  </processor>
</updateRequestProcessorChain>

And added it to my requestHandler:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uniq-fields</str>
  </lst>
</requestHandler>

Tomer Levi

Software Engineer
Big Data Group

Product  Technology Unit

(T) +972 (9) 775-2693



tomer.l...@nice.com

www.nice.com





Re: Ignoring Duplicates in Multivalue Field

2014-11-02 Thread Ahmet Arslan
Hi Tomer,

What happens when you add <processor class="solr.RunUpdateProcessorFactory"/>
to your chain?
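The reason for the question: in a custom chain, RunUpdateProcessorFactory is the step that actually writes the document to the index, so a chain without it processes the document but indexes nothing. A toy Python model of that idea follows; the chain, `dedup_step` and `run_step` are hypothetical stand-ins, not Solr's implementation.

```python
from typing import Callable, Dict, List

Doc = Dict[str, List[str]]
Index = Dict[str, Doc]
Step = Callable[[Doc, Index], Doc]


def dedup_step(doc: Doc, index: Index) -> Doc:
    # Hypothetical field-mutating step: dedup every field, keeping first occurrences.
    return {field: list(dict.fromkeys(values)) for field, values in doc.items()}


def run_step(doc: Doc, index: Index) -> Doc:
    # Hypothetical stand-in for RunUpdateProcessorFactory: the only step that writes.
    index[doc["id"][0]] = doc
    return doc


def process(chain: List[Step], doc: Doc, index: Index) -> None:
    # Pass the document through each step in order, like a processor chain.
    for step in chain:
        doc = step(doc, index)


doc = {"id": ["100"], "myMultValueField": ["1", "2", "2"]}

index_without_run: Index = {}
process([dedup_step], doc, index_without_run)
print(index_without_run)  # {}  (nothing was written)

index_with_run: Index = {}
process([dedup_step, run_step], doc, index_with_run)
print(index_with_run)  # {'100': {'id': ['100'], 'myMultValueField': ['1', '2']}}
```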

Ahmet


