[ 
https://issues.apache.org/jira/browse/SOLR-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959054#comment-13959054
 ] 

Steve Rowe commented on SOLR-4864:
----------------------------------

Sunil, one other thing I forgot to mention: you should add an example of using 
the param to the class javadocs on {{RegexReplaceProcessorFactory}}, and also 
explain how the new param works.

> RegexReplaceProcessorFactory should support pattern capture group 
> substitution in replacement string
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4864
>                 URL: https://issues.apache.org/jira/browse/SOLR-4864
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.3
>            Reporter: Jack Krupansky
>         Attachments: SOLR-4864.patch
>
>
> It is unfortunate the the replacement string for RegexReplaceProcessorFactory 
> is a pure, "quoted" (escaped) literal and does not support pattern capture 
> group substitution. This processor should be enhanced to support full, 
> standard pattern capture group substitution.
> The test case I used:
> {code}
>   <updateRequestProcessorChain name="regex-mark-special-words">
>     <processor class="solr.RegexReplaceProcessorFactory">
>       <str name="fieldRegex">.*</str>
>       <str name="pattern">([^a-zA-Z]|^)(cat|dog|fox)([^a-zA-Z]|$)</str>
>       <str name="replacement">$1&lt;&lt;$2&gt;&gt;$3</str>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {code}
> Indexing with this command against the standard Solr example with the above 
> addition to solrconfig:
> {code}
>   curl 
> "http://localhost:8983/solr/update?commit=true&update.chain=regex-mark-special-words";
>  \
>   -H 'Content-type:application/json' -d '
>   [{"id": "doc-1",
>     "title": "Hello World",
>     "content": "The cat and the dog jumped over the fox.",
>     "other_ss": ["cat","cat bird", "lazy dog", "red fox den"]}]'
> {code}
> Alas, the resulting document consists of:
> {code}
>   "id":"doc-1",
>   "title":["Hello World"],
>   "content":["The$1<<$2>>$3and the$1<<$2>>$3jumped over the$1<<$2>>$3"],
>   "other_ss":["$1<<$2>>$3",
>     "$1<<$2>>$3bird",
>     "lazy$1<<$2>>$3",
>     "red$1<<$2>>$3den"],
> {code}
> The Javadoc for RegexReplaceProcessorFactory uses the exact same terminology 
> of  "replacement string", as does Java's Matcher.replaceAll, but clearly the 
> semantics are distinct, with replaceAll supporting pattern capture group 
> substitution for its "replacement string", while RegexReplaceProcessorFactory 
> interprets "replacement string" as being a literal. At a minimum, the 
> RegexReplaceProcessorFactory Javadoc should explicitly state that the string 
> is a literal that does not support pattern capture group substitution.
> The relevant code in RegexReplaceProcessorFactory#init:
> {code}
> replacement = Matcher.quoteReplacement(replacementParam.toString());
> {code}
> Possible options for the enhancement:
> 1. Simply skip the quoteReplacement and fully support pattern capture group 
> substitution with no additional changes. Does have a minor backcompat issue.
> 2. Add an alternative to "replacement", say "nonQuotedReplacement" that is 
> not quoted as "replacement" is.
> 3. Add an option, say "quotedReplacement" that defaults to "true" for 
> backcompat, but can be set to "false" to support full replaceAll pattern 
> capture group substitution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to