[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510060#comment-16510060 ]

Ingomar Wesp commented on SOLR-5332:

Given that LUCENE-7960 has been closed, I think this issue can be marked as fixed, too.

> Add "preserve original" setting to the EdgeNGramFilterFactory
> -------------------------------------------------------------
>
>                 Key: SOLR-5332
>                 URL: https://issues.apache.org/jira/browse/SOLR-5332
>             Project: Solr
>          Issue Type: Wish
>    Affects Versions: 4.4, 4.5, 4.5.1, 4.6
>            Reporter: Alexander S.
>            Priority: Major
>             Fix For: 5.2, 6.0
>
>
> Hi, as described here:
> http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
> the problem is that if you have these 2 strings to index:
> 1. facebook.com/someuser.1
> 2. facebook.com/someveryandverylongusername
> and the edge ngram filter factory with min and max gram size settings 2 and
> 25, search requests for these urls will fail.
> But search requests for:
> 1. facebook.com/someuser
> 2. facebook.com/someveryandverylonguserna
> will work properly.
> It's because the first url has "1" at the end, which is lower than the allowed
> min gram size. In the second url the user name is longer than the max gram
> size (27 characters).
> It would be good to have a "preserve original" option that adds the
> original string to the index if it does not fit the allowed gram size, so
> that the "1" and "someveryandverylongusername" tokens will also be added to the
> index.
> Best,
> Alex

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
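The effect described in the quoted report can be illustrated with a minimal sketch. This is not Lucene's implementation; `edge_ngrams` is a hypothetical helper, and the sketch assumes the tokens "1" and "someveryandverylongusername" reach the filter unchanged, with the minimum and maximum gram sizes of 2 and 25 taken from the report.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the leading-edge n-grams of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# Token shorter than min_gram: no grams are produced at all.
short = edge_ngrams("1", 2, 25)

# Token longer than max_gram: grams stop at 25 characters,
# so the full 27-character token is never emitted.
long_ = edge_ngrams("someveryandverylongusername", 2, 25)

print(short)
print(long_[-1])
print("someveryandverylongusername" in long_)
```

Under these assumptions, neither original token ends up among the emitted terms, which is the gap the requested option is meant to close.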
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423817#comment-16423817 ]

Thomas Wöckinger commented on SOLR-5332:

So what can be done to get this into the main line?
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345260#comment-14345260 ]

Furkan KAMACI commented on SOLR-5332:

[~simon.endele] You can check my patch at SOLR-5152. I've applied a patch there, and this issue became a duplicate.
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343414#comment-14343414 ]

Simon Endele commented on SOLR-5332:

+1 for this feature.

We use the EdgeNGramFilterFactory on a tokenized field (in order to implement a prefix search at index time) with minGramSize=3.
Unfortunately we observed that tokens with length 1 or 2 are actually deleted, which was unexpected from our point of view.

Using a second field (though complicated IMHO) would address the query issues, but it gets awkward when it comes to highlighting or phrase searches.

For instance, when searching for "us rep":
- the field with EdgeNGramFilterFactory highlights "rep" in "representative", but not "US", as this token has been removed,
- the field without EdgeNGramFilterFactory highlights "US", but not "representative", as it has no prefixes indexed.

Bringing these highlightings together in one string is quite a complex task.
Not to mention a phrase search, which does not work at all for the example above.

We use minGramSize=3 to reduce collisions of prefixes and abbreviations (like "US" and "usage") and to reduce the index size. I admit this does not prevent all collisions (e.g. "USA" still collides with "usage"), but it's a compromise.

Nevertheless, minGramSize is a nice feature of EdgeNGramFilterFactory, but it lacks a preserveOriginal flag IMO.
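The "us rep" case above can be reproduced with a small sketch. It is only an illustration: `edge_ngrams` and `index_terms` are hypothetical helpers, tokenization is reduced to lowercasing and whitespace splitting, and `max_gram=15` is an assumed value since the comment only states minGramSize=3.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Leading-edge n-grams of `token`, lengths min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_terms(text, min_gram=3, max_gram=15):
    """Collect the terms an n-grammed field would hold for `text`."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


terms = index_terms("US representative")

# Check each query word of "us rep" against the indexed terms.
hits = {word: word in terms for word in "us rep".split()}
print(hits)
```

In this sketch "rep" is found as a prefix of "representative", while "us" has no indexed term because it is shorter than the minimum gram size.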
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833582#comment-13833582 ]

Furkan KAMACI commented on SOLR-5332:

[~aheaven] If you change the Fix Version/s to the next release, this issue can be considered.
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833811#comment-13833811 ]

Robert Muir commented on SOLR-5332:

Why not just use another field? It's the same cost either way as this setting, except it works today and we don't have to maintain it.

Additionally, you maintain more control: you can control boosting etc. across the different fields.
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834074#comment-13834074 ]

James Dyer commented on SOLR-5332:

We have a use case where we use a modified version of EdgeNGramFilter to preserve the original. The field we use this on is multi-valued. We change all user queries against the field to phrases with slop to prevent partial matches across values. But our users also want to be able to enter sub-strings on this field. (Because all queries are phrase queries, wildcards are not an option.)

So had this functionality existed, we would have been spared having to implement it ourselves. (I didn't contribute the code because I couldn't imagine it had broad applicability. But it seems that with this issue, at least a few others out there have cases for it as well.)
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834135#comment-13834135 ]

Robert Muir commented on SOLR-5332:

James, but the issue is still the same. There are no savings from doing this in the same field!

So to me it's clearer to query on foo_exact:whatever if you want an exact match, versus doing it in a roundabout way with a sloppy phrase query.
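The separate-field alternative argued for here can be sketched roughly as follows. This is an illustration only, not Solr configuration: the field name `foo` and the helpers `edge_ngrams`, `index_document` and `matches` are hypothetical, `foo_exact` is the name used in the comment above, and the gram sizes 2 and 25 come from the issue description.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Leading-edge n-grams of `token`, lengths min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_document(tokens, min_gram=2, max_gram=25):
    """Index the same tokens into an n-grammed field and an exact field."""
    return {
        "foo": {g for t in tokens for g in edge_ngrams(t, min_gram, max_gram)},
        "foo_exact": set(tokens),
    }


def matches(doc, term):
    # Roughly what a query on foo:term OR foo_exact:term would check.
    return term in doc["foo"] or term in doc["foo_exact"]


doc = index_document(["someveryandverylongusername", "1"])
print(matches(doc, "1"))
print(matches(doc, "someveryandverylongusername"))
print(matches(doc, "somev"))
```

Keeping the two fields apart also leaves room to weight exact matches differently from prefix matches, which is the extra control mentioned earlier in the thread.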
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834141#comment-13834141 ]

James Dyer commented on SOLR-5332:

There is, if a user enters 2 keywords and one matches an edge n-gram while the other matches an original keyword.

Our case involves book contributors. If a book has 2 contributors, "John Smith" and "Edward Jones", we want the user to get a result if they query "edward jones" or "e jones" or "ed jones", but not "edward smith" nor "e smith", etc. The only solution I could come up with involved a combination of edge n-grams and the original keywords in the same field.

I think there are valid use cases for this, perhaps not very many.
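The contributor example can be sketched as a toy positional index. Everything here is an assumption made for illustration: the helper names, the gram sizes (1 to 3), the `POSITION_GAP` of 100 between values, and the simplification of a sloppy phrase query to exact adjacency. It is not the modified filter James describes.

```python
POSITION_GAP = 100  # hypothetical gap between values of the multi-valued field


def edge_ngrams_with_original(token, min_gram, max_gram):
    """Edge n-grams plus the original token, all sharing one position."""
    grams = {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}
    grams.add(token)
    return grams


def index_values(values, min_gram=1, max_gram=3):
    """Map each token position to the set of terms stored at it."""
    positions = {}
    pos = 0
    for value in values:
        for token in value.lower().split():
            positions[pos] = edge_ngrams_with_original(token, min_gram, max_gram)
            pos += 1
        pos += POSITION_GAP  # keeps phrases from matching across values
    return positions


def phrase_matches(positions, query):
    """True if the query words occur at consecutive positions."""
    words = query.lower().split()
    return any(
        all(word in positions.get(start + i, ()) for i, word in enumerate(words))
        for start in positions
    )


index = index_values(["John Smith", "Edward Jones"])
for q in ["edward jones", "e jones", "ed jones", "edward smith", "e smith"]:
    print(q, phrase_matches(index, q))
```

With a maximum gram size of 3, the full words "edward" and "jones" only match because the original tokens sit at the same positions as their grams, and the gap keeps "edward smith" from matching across the two contributors.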
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834223#comment-13834223 ]

Furkan KAMACI commented on SOLR-5332:

Actually, the same situation exists with WordDelimiterFilterFactory. It splits words into new ones but still has a preserveOriginal capability too.
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834285#comment-13834285 ]

Robert Muir commented on SOLR-5332:

Just because WordDelimiterFilter has an option doesn't mean other filters should have it; it's hardly a model citizen. Probably even more reason to really think about what is happening and question if it's the right thing to do.

For the use case described in the issue, a separate field suffices and is likely more flexible and just as efficient. I admit I don't fully understand what James is doing.

I'm just saying I don't think our filters need options like preserve or inject, because I generally see no value versus just using another field: it's typically just users who don't understand that the underlying cost in an inverted index is the same.
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834318#comment-13834318 ]

Furkan KAMACI commented on SOLR-5332:

I just gave an example use case of that option. I mean that EdgeNGram may get that option, or the option may be removed from WordDelimiter too; it depends on which is the better choice. Of course, the fact that WordDelimiter has that option does not mean others should have it too. However, they have similar use cases, and WordDelimiter has that option.

On the other hand, this issue is a duplicate of another one, as I mentioned in my comment. This issue also has some problems in its description section, as I mentioned, so we should not take it directly as a use case.

I implemented a wish for the community because some people need and want it (I do not use it in my current applications). It is up to us to decide whether to use it or not.
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818596#comment-13818596 ]

Furkan KAMACI commented on SOLR-5332:

This issue can be marked as a duplicate of this issue: https://issues.apache.org/jira/browse/SOLR-5152
[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818593#comment-13818593 ]

Furkan KAMACI commented on SOLR-5332:

I've added preserveOriginal capability to EdgeNGramFilterFactory and attached a patch to SOLR-5152.

I want to make something clear about the problem that is pointed out in this issue. The schema that is described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html uses LowerCaseFilterFactory before EdgeNGramFilterFactory. There is an explanation about it: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory which says: "Creates tokens by lowercasing all letters and dropping non-letters". So non-letters will be dropped before tokens are retrieved by EdgeNGramFilterFactory.

My patch preserves the original token if preserveOriginal is set to true and the token length is less than minGramSize or greater than maxGramSize.
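The rule stated for the patch (keep the original token when preserveOriginal is true and the token is shorter than minGramSize or longer than maxGramSize) can be sketched as below. This is a paraphrase of the comment, not the patch itself; `edge_ngram_filter` and its parameter names are hypothetical.

```python
def edge_ngram_filter(token, min_gram, max_gram, preserve_original=False):
    """Edge n-grams of `token`, optionally keeping an out-of-range original."""
    grams = [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]
    out_of_range = len(token) < min_gram or len(token) > max_gram
    if preserve_original and out_of_range:
        grams.append(token)  # the original would otherwise be lost
    return grams


too_short = edge_ngram_filter("1", 2, 25, preserve_original=True)
too_long = edge_ngram_filter("someveryandverylongusername", 2, 25, preserve_original=True)
in_range = edge_ngram_filter("someuser", 2, 25, preserve_original=True)

print(too_short)
print(too_long[-1])
print(in_range.count("someuser"))
```

A token whose length already falls within the gram range appears once, as its own longest gram, so the flag adds nothing for it in this sketch.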