[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818593#comment-13818593
]
Furkan KAMACI commented on SOLR-5332:
-------------------------------------
I've added a preserveOriginal capability to EdgeNGramFilterFactory and attached
a patch to SOLR-5152. I want to clarify something about the problem raised in
this issue. The schema described here:
http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
uses LowerCaseTokenizerFactory before EdgeNGramFilterFactory. The wiki
documents it here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
and describes it as: "Creates tokens by lowercasing all letters and dropping
non-letters." So non-letters are dropped before any token reaches
EdgeNGramFilterFactory.
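To illustrate the quoted wiki description, here is a minimal sketch in plain
Java. It does not use the Lucene classes; the class and method names are
hypothetical, and it assumes "letter" means whatever Character.isLetter
accepts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lowercasing letter tokenizer; not the Lucene implementation.
class LowerCaseTokenizerSketch {
    // Starts a new token at every non-letter and lowercases the letters it keeps.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token and is itself discarded.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [facebook, com, someuser]
        System.out.println(tokenize("facebook.com/someuser.1"));
    }
}
```

Under these assumptions the trailing "1" never becomes a token, so a filter
placed after such a tokenizer never sees it.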
My patch preserves the original token if preserveOriginal is set to true and the
token length is less than minGramSize or greater than maxGramSize.
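The rule above can be sketched in plain Java as follows. This is not the
patch itself: the class and method names are hypothetical, and the position
of the preserved token in the output (here, last) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of front-edge n-grams with a preserveOriginal switch.
class EdgeNGramPreserveSketch {
    static List<String> edgeNGrams(String token, int minGram, int maxGram,
                                   boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        // Emit prefixes of length minGram..maxGram, capped at the token length.
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            out.add(token.substring(0, len));
        }
        // Keep the original only when its length falls outside [minGram, maxGram].
        boolean outsideRange = token.length() < minGram || token.length() > maxGram;
        if (preserveOriginal && outsideRange) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("1", 2, 25, false)); // prints []
        System.out.println(edgeNGrams("1", 2, 25, true));  // prints [1]
        // 27 characters: grams of length 2..25 plus the preserved original.
        System.out.println(edgeNGrams("someveryandverylongusername", 2, 25, true).size()); // prints 25
    }
}
```

A token whose length is already within the range is not added a second
time, since its full form is emitted as one of the grams.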
> Add "preserve original" setting to the EdgeNGramFilterFactory
> -------------------------------------------------------------
>
> Key: SOLR-5332
> URL: https://issues.apache.org/jira/browse/SOLR-5332
> Project: Solr
> Issue Type: Wish
> Reporter: Alexander S.
>
> Hi, as described here:
> http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
> the problem is that if you have these 2 strings to index:
> 1. facebook.com/someuser.1
> 2. facebook.com/someveryandverylongusername
> and the edge ngram filter factory with min and max gram sizes set to 2 and
> 25, search requests for these urls will fail.
> But search requests for:
> 1. facebook.com/someuser
> 2. facebook.com/someveryandverylonguserna
> will work properly.
> It's because the first url has "1" at the end, which is shorter than the
> allowed min gram size. In the second url the user name is longer than the
> max gram size (27 characters).
> It would be good to have a "preserve original" option that adds the
> original string to the index if it does not fit the allowed gram size, so
> that the "1" and "someveryandverylongusername" tokens are also added to the
> index.
> Best,
> Alex
--
This message was sent by Atlassian JIRA
(v6.1#6144)