[jira] [Commented] (LUCENE-5620) LowerCaseFilter.preserveOriginal

Manuel Lenormand (JIRA) Sat, 19 Apr 2014 14:29:27 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974977#comment-13974977 ]


Manuel Lenormand commented on LUCENE-5620:
------------------------------------------

My answer concerns a Solr use case, but since Solr uses the Lucene filters I think 
it can contribute to the discussion.

On one of our morphology projects we discussed the field-splitting issue. We 
wanted to enable both stemmed and non-stemmed search for these different 
languages, mainly for advanced users who wish to control their search terms.

The drawbacks of field splitting were:
1) QParser flexibility (not being forced to use the dismax defType in order to 
query multiple fields in a single query).
2) "Readability": the developer or user could see, in a single place in the 
admin UI, all the terms a query could match in an indexed document, without 
having to understand a parsedQuery string or the qf param.
3) Term positions: a single field enables a phrase query that matches 
"originalTerm stemmedTerm". Enabling this with split fields would mean storing 
the original term (dictionary and postings) twice.
4) Performance (more of an anecdote): as the terms were generally suffix-stemmed, 
we had a good chance of loading the same term block and posting list into 
memory, since the original and stemmed terms share a prefix and so sort close 
together in the term dictionary.
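Point 3 can be illustrated with a toy positional index in plain Java. Everything here is hypothetical and simplified: `stem` is a crude suffix stripper rather than a real morphological analyzer, and the index is a `Map` from term to positions rather than Lucene postings. Because the original and stemmed forms of each word share one position (as if the stemmed token were emitted with a position increment of 0), a single phrase can mix original and stemmed forms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StackedFieldPhraseSketch {

    // Toy suffix stemmer, purely illustrative (NOT a real morphological analyzer).
    static String stem(String w) {
        if (w.endsWith("ing") && w.length() > 5) return w.substring(0, w.length() - 3);
        if (w.endsWith("s") && w.length() > 3) return w.substring(0, w.length() - 1);
        return w;
    }

    // Single-field positional index: term -> positions where it occurs.
    // The original and stemmed forms are recorded at the SAME position,
    // mimicking a stemmed token stacked on the original (position increment 0).
    static Map<String, List<Integer>> indexStacked(List<String> words) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < words.size(); pos++) {
            String original = words.get(pos);
            String stemmed = stem(original);
            postings.computeIfAbsent(original, k -> new ArrayList<>()).add(pos);
            if (!stemmed.equals(original)) {
                postings.computeIfAbsent(stemmed, k -> new ArrayList<>()).add(pos);
            }
        }
        return postings;
    }

    // Exact phrase match: phrase[i] must occur at position start + i.
    static boolean phraseMatches(Map<String, List<Integer>> postings, List<String> phrase) {
        List<Integer> starts = postings.getOrDefault(phrase.get(0), List.of());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < phrase.size() && ok; i++) {
                ok = postings.getOrDefault(phrase.get(i), List.of()).contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> idx = indexStacked(List.of("walking", "dogs"));
        // Original form followed by stemmed form, in one phrase.
        System.out.println(phraseMatches(idx, List.of("walking", "dog")));  // true
        // Stemmed form followed by original form.
        System.out.println(phraseMatches(idx, List.of("walk", "dogs")));    // true
        // Wrong order: no match.
        System.out.println(phraseMatches(idx, List.of("dogs", "walking"))); // false
    }
}
```

With split fields the two forms live in different fields, and a Lucene phrase query runs against a single field, so "walking dog" could only match if the stemmed field also kept the originals, which is the duplication that point 3 refers to.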

I do agree that a PreserveOriginalSnapshot could be a good resolution.

> LowerCaseFilter.preserveOriginal
> --------------------------------
>
>                 Key: LUCENE-5620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5620
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mike Sokolov
>         Attachments: LUCENE-5620.patch
>
>
> Following closely the model of LUCENE-5437 (which worked on 
> ASCIIFoldingFilter), this patch adds the ability to preserve the original 
> token to LowerCaseFilter.  This is useful if you want an all-lowercase search 
> term to match without regard to case, while search terms with uppercase 
> letters match in a case-sensitive manner. 
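The behaviour described in the issue can be modelled without any Lucene classes. The sketch below is plain Java and is not Lucene's API: `Token`, `lowerCasePreserveOriginal` and `matches` are hypothetical names. It assumes the filter emits the lowercased token and, when that differs from the input, stacks the original at the same position (position increment 0). It also assumes query terms are not lowercased, so "apple" matches any casing while "Apple" matches only that exact casing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PreserveOriginalSketch {

    // A token as the index sees it: its text plus its position increment
    // (0 means "stacked on the same position as the previous token").
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "/" + posInc;
        }
    }

    // Hypothetical model of lowercasing with preserveOriginal enabled:
    // always emit the lowercased form; if it differs from the input,
    // also emit the original at the same position.
    static List<Token> lowerCasePreserveOriginal(List<String> words) {
        List<Token> out = new ArrayList<>();
        for (String w : words) {
            String lower = w.toLowerCase(Locale.ROOT);
            out.add(new Token(lower, 1));
            if (!lower.equals(w)) {
                out.add(new Token(w, 0));
            }
        }
        return out;
    }

    // Query side (assumed NOT lowercased): exact term lookup.
    static boolean matches(List<Token> indexed, String queryTerm) {
        for (Token t : indexed) {
            if (t.term.equals(queryTerm)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> doc = lowerCasePreserveOriginal(List.of("Apple", "pie"));
        System.out.println(doc);                   // [apple/1, Apple/0, pie/1]
        System.out.println(matches(doc, "apple")); // true: case-insensitive via lowercased token
        System.out.println(matches(doc, "Apple")); // true: exact original casing
        System.out.println(matches(doc, "APPLE")); // false: uppercase query is case-sensitive
    }
}
```

Stacking both forms at one position keeps phrase and proximity queries working across the two forms, at the cost of extra terms in the index.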



--
This message was sent by Atlassian JIRA
(v6.2#6252)
