[ 
https://issues.apache.org/jira/browse/SOLR-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954701#comment-13954701
 ] 

Alaknantha commented on SOLR-3862:
----------------------------------

Yonik, are you looking at the older patch? 
https://issues.apache.org/jira/secure/attachment/12637687/SOLR-3862.patch is my 
latest patch, in which I got rid of the regular expression usage. 

**1) Do we want a way to specify the removal of multiple values?
Perhaps "remove" : [ "A","B","C" ]
***Yes, this patch supports removal of multiple values.

**2) What are the downsides to using regex? Someone may not realize that the 
values being used are regular expressions until they are in production and 
values that accidentally have wildcards in them are used? Or they may simply 
forget to do wildcard escaping code since everything would "just work" until 
they did encounter them?
***Yes, I ran into this regular expression issue when I tried to use the field 
modifier "remove" provided by the older patch 
https://issues.apache.org/jira/secure/attachment/12589605/SOLR-3862-3.patch in 
my project. That's why I got rid of the regular expression and now use plain 
"value" comparisons. Here is the issue that I ran into:

When Solr is invoked in ZooKeeper mode, the input field value arrives as a 
list, whereas when bypassing ZK and hitting Solr directly, it arrives as a 
String in the patch code that handles atomic updates with the field modifier 
"remove". 

The original patch SOLR-3862-3.patch creates a regular expression pattern from 
the incoming field value to be removed. The pattern is used to create a matcher 
that is applied to each value in the original list. If the incoming field value 
is a list, the matcher does not match correctly because of the additional 
square brackets, as shown below:

In the example below, "CA" is sent in as an input field value to be removed. 
Since the patch code was using toString(), the list values are enclosed in 
square brackets, like [CA]. As a regular expression, [CA] is a character class, 
so this pattern can match only "C" or "A" and not "CA". I had to get into the 
Solr code to troubleshoot this issue.

Hitting Solr through ZooKeeper:
Pattern p = Pattern.compile("[CA]");
Matcher m = p.matcher("CA");
boolean b = m.matches();  // returns false, so the remove does not work if the 
                          // incoming field value comes in as a list.

Hitting Solr directly:
Pattern p = Pattern.compile("CA");
Matcher m = p.matcher("CA");
boolean b = m.matches();  // returns true, so the remove works if the incoming 
                          // field value comes in as a String.
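
The two cases above can be reproduced outside Solr with a small standalone 
sketch. The class and method names here (RegexRemovePitfall, matchesAsRegex) 
are hypothetical and are not taken from either patch; the sketch only assumes 
that the incoming value is turned into a pattern via toString(), as described 
above. The last check also illustrates the wildcard concern from question 2.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RegexRemovePitfall {

    // Hypothetical helper: compiles the incoming value's toString() as a regex
    // and tests it against one stored field value.
    public static boolean matchesAsRegex(Object incoming, String stored) {
        return Pattern.compile(incoming.toString()).matcher(stored).matches();
    }

    public static void main(String[] args) {
        // A list stringifies to "[CA]", which is a character class.
        System.out.println(matchesAsRegex(Arrays.asList("CA"), "CA")); // false
        System.out.println(matchesAsRegex(Arrays.asList("CA"), "C"));  // true
        // A plain String is compiled as-is and matches the stored value.
        System.out.println(matchesAsRegex("CA", "CA"));                // true
        // An unescaped "." acts as a wildcard and matches an unrelated value.
        System.out.println(matchesAsRegex("a.b", "axb"));              // true
    }
}
```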

**3) Perhaps we want a separate way to specify "value" vs "regex". I assume 
"value" will be a much more common usecase than regex (although I do like the 
power that regex brings).
***I agree with you that "value" is the most common use case, and that's the 
reason I got rid of the "regex".
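
For illustration only, value-based removal could look roughly like the sketch 
below. This is not the code from the patch: the class and method names 
(ValueRemoveSketch, removeByValue) are hypothetical, and comparing values by 
their string form and removing every equal value are assumptions made for 
this example. It accepts either a single value or a list of values, so the 
list-versus-String difference no longer matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ValueRemoveSketch {

    // Hypothetical helper: returns a copy of 'existing' without the value(s)
    // in 'toRemove', using equals() on the string form instead of a regex.
    public static List<String> removeByValue(List<String> existing, Object toRemove) {
        List<String> targets = new ArrayList<>();
        if (toRemove instanceof Iterable) {
            // Incoming value is a list, e.g. "remove" : [ "A", "B", "C" ]
            for (Object o : (Iterable<?>) toRemove) {
                targets.add(String.valueOf(o));
            }
        } else {
            // Incoming value is a single value, e.g. "remove" : "A"
            targets.add(String.valueOf(toRemove));
        }
        List<String> result = new ArrayList<>(existing);
        result.removeAll(targets); // drops every element equal to a target
        return result;
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("CA", "NY", "TX");
        System.out.println(removeByValue(states, Arrays.asList("CA", "TX"))); // [NY]
        System.out.println(removeByValue(states, "CA"));                      // [NY, TX]
    }
}
```

With plain equality, a value such as "a.b" removes only "a.b" and leaves 
"axb" untouched, so no escaping is needed on the client side.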

Please review this patch 
https://issues.apache.org/jira/secure/attachment/12637687/SOLR-3862.patch that 
has my fix.


> add "remove" as update option for atomically removing a value from a 
> multivalued field
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-3862
>                 URL: https://issues.apache.org/jira/browse/SOLR-3862
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.0-BETA
>            Reporter: Jim Musil
>            Assignee: Erick Erickson
>         Attachments: SOLR-3862-2.patch, SOLR-3862-3.patch, SOLR-3862-4.patch, 
> SOLR-3862.patch, SOLR-3862.patch
>
>
> Currently you can atomically "add" a value to a multivalued field. It would 
> be useful to be able to "remove" a value from a multivalued field. 
> When you "set" a multivalued field to null, it destroys all values.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
