[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2018-06-12 Thread Ingomar Wesp (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510060#comment-16510060
 ] 

Ingomar Wesp commented on SOLR-5332:


Given that LUCENE-7960 has been closed, I think this issue can be marked as 
fixed, too.

> Add "preserve original" setting to the EdgeNGramFilterFactory
> -
>
> Key: SOLR-5332
> URL: https://issues.apache.org/jira/browse/SOLR-5332
> Project: Solr
> Issue Type: Wish
> Affects Versions: 4.4, 4.5, 4.5.1, 4.6
> Reporter: Alexander S.
> Priority: Major
> Fix For: 5.2, 6.0
>
>
> Hi, as described here: 
> http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
> the problem is that if you index these 2 strings:
> 1. facebook.com/someuser.1
> 2. facebook.com/someveryandverylongusername
> with the edge ngram filter factory configured with min and max gram sizes of 2 and 
> 25, search requests for these URLs will fail.
> But search requests for:
> 1. facebook.com/someuser
> 2. facebook.com/someveryandverylonguserna
> will work properly.
> This is because the first URL has "1" at the end, which is shorter than the allowed 
> min gram size. In the second URL the user name (27 characters) is longer than the 
> max gram size.
> It would be good to have a "preserve original" option that adds the 
> original token to the index if it does not fit the allowed gram size, so 
> that the "1" and "someveryandverylongusername" tokens are also added to the 
> index.
> Best,
> Alex
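
The behaviour reported in the description can be reproduced with a small stand-alone sketch. This is plain Python, not Lucene code: `edge_ngrams` is a hypothetical helper that only mimics what the issue describes (prefixes between the min and max gram size, nothing else), applied to the two tokens named in the description.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` whose length lies in [min_gram, max_gram].

    Hypothetical helper that mimics the behaviour described in the issue;
    it is not the Lucene implementation.
    """
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


# Gram sizes 2 and 25, as in the issue description.
short_grams = edge_ngrams("1", 2, 25)
long_grams = edge_ngrams("someveryandverylongusername", 2, 25)

print(short_grams)     # empty: the token is shorter than the min gram size
print(long_grams[-1])  # the longest prefix that gets indexed (25 characters)
print("someveryandverylongusername" in long_grams)  # the full token is absent
```

Under these assumptions the token "1" contributes nothing to the index, and the 27-character user name is only reachable through its first 25 characters, which matches the two working queries listed in the description.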



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2018-04-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423817#comment-16423817
 ] 

Thomas Wöckinger commented on SOLR-5332:


So what can be done to get this into the main line?




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2015-03-03 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345260#comment-14345260
 ] 

Furkan KAMACI commented on SOLR-5332:
-

[~simon.endele] You can check my patch at SOLR-5152. I've applied a patch there, 
and this issue has become a duplicate.




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2015-03-02 Thread Simon Endele (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343414#comment-14343414
 ] 

Simon Endele commented on SOLR-5332:


+1 for this feature.
We use the EdgeNGramFilterFactory on a tokenized field (in order to implement a 
prefix search at index time) with minGramSize=3.
Unfortunately, we observed that tokens of length 1 or 2 are actually deleted, 
which was unexpected from our point of view.

Using a second field (though complicated, IMHO) would address the query issues, but 
it gets awkward when it comes to highlighting or phrase searches.
For instance, when searching for "us rep":
- the field with EdgeNGramFilterFactory highlights "rep" in "representative", 
but not "US", as this token has been removed,
- the field without EdgeNGramFilterFactory highlights "US", but not 
"representative", as it has no prefixes indexed.

Bringing these highlights together in one string is quite a complex task, 
not to mention a phrase search, which does not work at all for the example 
above.

We use minGramSize=3 to reduce collisions between prefixes and abbreviations (like 
"US" and "usage") and to reduce the index size.
I admit this does not prevent all collisions (e.g. "USA" still collides with 
"usage"), but it's a compromise.

All in all, minGramSize is a nice feature of EdgeNGramFilterFactory, but it 
lacks a preserveOriginal flag, IMO.
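
The trade-off described above can be illustrated with a small stand-alone sketch. It is plain Python rather than Solr code: `edge_ngrams` and `matches` are hypothetical helpers, tokens are lowercased before the grams are built, and the maximum gram size of 20 is an assumption (the comment only states minGramSize=3).

```python
MIN_GRAM, MAX_GRAM = 3, 20  # MAX_GRAM is an assumed value


def edge_ngrams(token, min_gram, max_gram):
    """Lowercased prefixes of `token` with length in [min_gram, max_gram]."""
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


# Terms that would be indexed for each word under the assumptions above.
indexed = {
    word: edge_ngrams(word, MIN_GRAM, MAX_GRAM)
    for word in ["US", "USA", "usage", "representative"]
}


def matches(query_term, word):
    """True if the lowercased query term equals one of the word's indexed grams."""
    return query_term.lower() in indexed[word]


print(indexed["US"])                     # nothing is indexed for the 2-letter token
print(matches("rep", "representative"))  # prefix search works
print(matches("us", "usage"))            # collision avoided by minGramSize=3
print(matches("usa", "usage"))           # collision that remains
```

In this sketch "US" disappears from the index entirely, which is the loss a preserveOriginal flag would avoid, while the "USA"/"usage" collision remains as the comment concedes.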




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833582#comment-13833582
 ] 

Furkan KAMACI commented on SOLR-5332:
-

[~aheaven] If you change the Fix Version/s to the next release, this issue can 
be considered.




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833811#comment-13833811
 ] 

Robert Muir commented on SOLR-5332:
---

Why not just use another field? It's the same cost either way as this setting, 
except it works today and we don't have to maintain it.

Additionally, you retain more control: you can control boosting etc. across the 
different fields.
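
The two-field alternative suggested here can be sketched roughly as follows. This is a toy model in plain Python, not Solr: the field names `name_prefix` and `name_exact`, the boost values, and the whitespace tokenization are all assumptions made for illustration.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Prefixes of `token` with length in [min_gram, max_gram] (hypothetical helper)."""
    longest = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, longest + 1)}


def index_document(text, min_gram=2, max_gram=25):
    """Index the same text into a prefix field and an exact field."""
    tokens = text.lower().split()
    return {
        "name_prefix": set().union(*(edge_ngrams(t, min_gram, max_gram) for t in tokens)),
        "name_exact": set(tokens),
    }


# Assumed per-field boosts: an exact hit counts more than a prefix hit.
BOOSTS = {"name_exact": 2.0, "name_prefix": 1.0}


def score(doc, term):
    """Sum the boosts of every field that contains the lowercased query term."""
    term = term.lower()
    return sum(boost for field, boost in BOOSTS.items() if term in doc[field])


doc = index_document("someveryandverylongusername 1")
print(score(doc, "someveryandverylongusername"))  # found only in the exact field
print(score(doc, "1"))                            # found only in the exact field
print(score(doc, "some"))                         # found only in the prefix field
```

Keeping the originals in their own field is what allows the separate boost; with a single combined field the two kinds of match would be indistinguishable at query time.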




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834074#comment-13834074
 ] 

James Dyer commented on SOLR-5332:
--

We have a use case where we use a modified version of EdgeNGramFilter to 
preserve the original.  The field we used this on is multi-valued.  We change 
all user queries against the field to phrases with slop to prevent partial 
matches across values.  But our users also want to be able to enter sub-strings 
on this field.  (Because all queries are phrase queries, wildcards are not an 
option.)  So had this functionality existed, we would have been spared having 
to implement it ourselves.  (I didn't contribute the code because I couldn't 
imagine it had broad applicability.  But it seems that with this issue, at 
least a few others out there have use cases for it as well.)




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834135#comment-13834135
 ] 

Robert Muir commented on SOLR-5332:
---

James, but the issue is still the same. There are no savings from doing this in the 
same field!

So to me it's clearer to query on foo_exact:whatever if you want an exact 
match, versus doing it in a roundabout way with a sloppy phrase query.




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834141#comment-13834141
 ] 

James Dyer commented on SOLR-5332:
--

There is, if a user enters 2 keywords and one matches an edge n-gram while the 
other matches an original keyword.  Our case involves book contributors.  If a 
book has 2 contributors, John Smith and Edward Jones, we want the user to get a 
result if they query "edward jones" or "e jones" or "ed jones", but not "edward 
smith" nor "e smith", etc.  The only solution I could come up with involved 
a combination of edge n-grams and the original keywords in the same field. 
 I think there are valid use cases for this, though perhaps not very many.
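
The contributor example can be modelled with a toy positional index. This is a sketch under stated assumptions, not the modified filter described above: the gram sizes (1 to 3), the position gap of 100 between values, and exact adjacency instead of a sloppy phrase are all choices made for illustration, and every function name is hypothetical.

```python
from collections import defaultdict

POSITION_GAP = 100  # assumed gap between values of the multi-valued field


def analyze(token, min_gram=1, max_gram=3):
    """Edge n-grams of the lowercased token plus the original token itself."""
    token = token.lower()
    terms = {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}
    terms.add(token)  # the "preserve original" part
    return terms


def index_values(values):
    """Map each term to the set of positions at which it occurs."""
    postings = defaultdict(set)
    for value_no, value in enumerate(values):
        for offset, token in enumerate(value.split()):
            position = value_no * POSITION_GAP + offset
            for term in analyze(token):
                postings[term].add(position)
    return postings


def phrase_match(postings, query):
    """True if the query terms occur at consecutive positions."""
    terms = query.lower().split()
    if not terms:
        return False
    return any(
        all(start + i in postings.get(term, set()) for i, term in enumerate(terms))
        for start in postings.get(terms[0], set())
    )


postings = index_values(["John Smith", "Edward Jones"])
for query in ["edward jones", "e jones", "ed jones", "edward smith", "e smith"]:
    print(query, phrase_match(postings, query))
```

Because grams and originals share one field and one position per token, a single phrase query can mix a prefix ("e", "ed") with a full keyword ("jones"), while the position gap keeps "edward smith" from matching across the two contributors.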




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834223#comment-13834223
 ] 

Furkan KAMACI commented on SOLR-5332:
-

Actually, the same situation exists with WordDelimiterFilterFactory. It splits words 
into new ones but still has a preserveOriginal capability too.




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834285#comment-13834285
 ] 

Robert Muir commented on SOLR-5332:
---

Just because WordDelimiterFilter has an option doesn't mean other filters should 
have it; it's hardly a model citizen. Probably even more reason to really think 
about what is happening and question whether it's the right thing to do.

For the use case described in the issue, a separate field suffices and is 
likely more flexible and just as efficient. 

I admit I don't fully understand what James is doing. 

I'm just saying I don't think our filters need options like "preserve" or 
"inject", because I generally see no value versus just using another field: it's 
typically just users who don't understand that the underlying cost in an 
inverted index is the same.




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-27 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834318#comment-13834318
 ] 

Furkan KAMACI commented on SOLR-5332:
-

I just gave an example use case of that option. What I mean is: EdgeNGram may get 
that option, or the option may be removed from WordDelimiter too; it depends on 
whichever is the better choice. Of course, WordDelimiter having that option does 
not mean that others should have it too. However, they have similar use cases, and 
WordDelimiter has that option. 

On the other hand, this issue is a duplicate of another one, as I mentioned in my 
comment. This issue also has some problems in its description section, as I 
mentioned, so we should not directly treat it as a use case. I implemented this 
wish for the community because some people need and want it (I do not use it in my 
current application/s). It is up to us to decide whether to use it or not.




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-10 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818596#comment-13818596
 ] 

Furkan KAMACI commented on SOLR-5332:
-

This issue can be marked as a duplicate of this issue: 
https://issues.apache.org/jira/browse/SOLR-5152




[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2013-11-10 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818593#comment-13818593
 ] 

Furkan KAMACI commented on SOLR-5332:
-

I've added a preserveOriginal capability to EdgeNGramFilterFactory and attached a 
patch to SOLR-5152. I want to clarify something about the problem that is 
pointed out in this issue. The schema that is described here: 
http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
 uses LowerCaseFilterFactory before EdgeNGramFilterFactory. There is an 
explanation of it here: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
 which says: "Creates tokens by lowercasing all letters and dropping 
non-letters". So non-letters will be dropped before the tokens reach 
EdgeNGramFilterFactory. 
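
A rough approximation of the quoted wiki sentence ("lowercasing all letters and dropping non-letters") shows what would reach the n-gram filter for the two URLs from the description. The regular expression below is an assumption that stands in for the tokenizer; it is not the Lucene implementation, and `lowercase_letter_tokens` is a hypothetical name.

```python
import re


def lowercase_letter_tokens(text):
    """Lowercase the text and keep only maximal runs of letters as tokens."""
    # [^\W\d_] matches word characters that are neither digits nor underscores,
    # i.e. letters.
    return re.findall(r"[^\W\d_]+", text.lower())


first = lowercase_letter_tokens("facebook.com/someuser.1")
second = lowercase_letter_tokens("facebook.com/someveryandverylongusername")

print(first)   # the trailing "1" never becomes a token
print(second)  # the long user name survives as a single token
```

If the schema does tokenize this way, the "1" from the first URL is discarded before any gram size check applies, so only the over-long user name in the second URL is affected by the min and max gram sizes.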

My patch preserves the original token if preserveOriginal is set to true and the 
token length is less than minGramSize or greater than maxGramSize.
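
The rule stated above can be written down as a short sketch. This is one reading of the comment in plain Python, not the code of the patch attached to SOLR-5152; `edge_ngrams` and its `preserve_original` parameter are hypothetical names.

```python
def edge_ngrams(token, min_gram, max_gram, preserve_original=False):
    """Prefixes of `token` with length in [min_gram, max_gram].

    If `preserve_original` is set and the token is shorter than `min_gram`
    or longer than `max_gram`, the unmodified token is emitted as well.
    """
    longest = min(max_gram, len(token))
    grams = [token[:n] for n in range(min_gram, longest + 1)]
    out_of_range = len(token) < min_gram or len(token) > max_gram
    if preserve_original and out_of_range:
        grams.append(token)
    return grams


print(edge_ngrams("1", 2, 25))                           # nothing is emitted
print(edge_ngrams("1", 2, 25, preserve_original=True))   # the token itself is kept
print(edge_ngrams("someveryandverylongusername", 2, 25,
                  preserve_original=True)[-1])           # the full token is kept
```

For tokens whose length already lies within the gram size range, the flag changes nothing in this sketch, because the full token is then already among the emitted prefixes.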
