
[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2018-06-12 Thread Ingomar Wesp (JIRA)



[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510060#comment-16510060
 ] 

Ingomar Wesp commented on SOLR-5332:


Given that LUCENE-7960 has been closed, I think this issue can be marked as 
fixed, too.

> Add "preserve original" setting to the EdgeNGramFilterFactory
> -
>
> Key: SOLR-5332
> URL: https://issues.apache.org/jira/browse/SOLR-5332
> Project: Solr
>  Issue Type: Wish
>Affects Versions: 4.4, 4.5, 4.5.1, 4.6
>Reporter: Alexander S.
>Priority: Major
> Fix For: 5.2, 6.0
>
>
> Hi, as described here: 
> http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
>  the problem is that if you index these two strings:
> 1. facebook.com/someuser.1
> 2. facebook.com/someveryandverylongusername
> and use the edge n-gram filter factory with min and max gram sizes of 2 and 
> 25, search requests for these URLs will fail.
> But search requests for:
> 1. facebook.com/someuser
> 2. facebook.com/someveryandverylonguserna
> will work properly.
> This is because the first URL has "1" at the end, which is shorter than the allowed 
> min gram size. In the second URL, the user name (27 characters) is longer than the 
> max gram size.
> It would be good to have a "preserve original" option that adds the 
> original string to the index if it does not fit the allowed gram size, so 
> that the "1" and "someveryandverylongusername" tokens are also added to the 
> index.
> Best,
> Alex
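
A minimal sketch of the behavior described above, in plain Java rather than Lucene (the class and method names are hypothetical, and this is not Lucene's implementation): with min/max gram sizes of 2 and 25, a one-character token yields no grams at all, and a 27-character token yields prefixes only up to 25 characters, so neither original token ends up in the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of plain edge n-gram generation: the prefixes of a
// token whose lengths fall within [minGram, maxGram]. Not Lucene's code.
class EdgeNGramGaps {

    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        // A prefix cannot be longer than the token itself.
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Shorter than minGram: no grams, so the token is lost entirely.
        System.out.println(edgeNGrams("1", 2, 25)); // []
        // 27 characters: the longest gram is the 25-character prefix.
        List<String> grams = edgeNGrams("someveryandverylongusername", 2, 25);
        System.out.println(grams.get(grams.size() - 1)); // someveryandverylonguserna
        System.out.println(grams.contains("someveryandverylongusername")); // false
    }
}
```

The last prefix printed is exactly the "someveryandverylonguserna" query that still works in the report.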



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5332) Add "preserve original" setting to the EdgeNGramFilterFactory

2018-04-03 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423817#comment-16423817
 ] 

Thomas Wöckinger commented on SOLR-5332:


So what can be done to get this into the main line?





[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2015-03-03 Thread Furkan KAMACI (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345260#comment-14345260
]

Furkan KAMACI commented on SOLR-5332:
-

[~simon.endele] You can check my patch at SOLR-5152. I've attached a patch there,
so this issue has become a duplicate.

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.
Fix For: 5.1



[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2015-03-02 Thread Simon Endele (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343414#comment-14343414
]

Simon Endele commented on SOLR-5332:

+1 for this feature.
We use the EdgeNGramFilterFactory on a tokenized field (in order to implement a
prefix search at index time) with minGramSize=3.
Unfortunately, we observed that tokens of length 1 or 2 are dropped entirely,
which was unexpected from our point of view.

Using a second field (though complicated IMHO) would address the query issues, but
it gets awkward when it comes to highlighting or phrase searches.
For instance, when searching for "us rep":
- the field with EdgeNGramFilterFactory highlights "rep" in "representative",
but not "US", as this token has been removed;
- the field without EdgeNGramFilterFactory highlights "US", but not
"representative", as it has no prefixes indexed.

Merging these highlights into one string is quite a complex task,
not to mention phrase search, which does not work at all for the example
above.

We use minGramSize=3 to reduce collisions between prefixes and abbreviations (like
"US" and "usage") and to reduce the index size.
I admit this does not prevent all collisions (e.g. "USA" still collides with
"usage"), but it's a compromise.

Nevertheless, minGramSize is a nice feature of EdgeNGramFilterFactory, but it
lacks a preserveOriginal flag IMO.
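
To make the minGramSize=3 trade-off concrete, here is a small in-memory sketch (hypothetical class and method names, not Lucene code) that maps each edge n-gram term to the words that produced it. "US" produces no term at all, while "USA" and "usage" share the term "usa".

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: builds a gram -> source-words map to show which words
// collide on the same edge n-gram term for a given minGram/maxGram.
class PrefixCollisions {

    static Map<String, Set<String>> gramIndex(List<String> words, int minGram, int maxGram) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (String word : words) {
            String w = word.toLowerCase(Locale.ROOT);
            int longest = Math.min(maxGram, w.length());
            for (int len = minGram; len <= longest; len++) {
                index.computeIfAbsent(w.substring(0, len), k -> new TreeSet<>()).add(word);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index =
                gramIndex(List.of("US", "USA", "usage", "representative"), 3, 15);
        System.out.println(index.get("usa"));        // [USA, usage]
        System.out.println(index.containsKey("us")); // false: "US" produced no grams
        System.out.println(index.get("rep"));        // [representative]
    }
}
```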

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Furkan KAMACI (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833582#comment-13833582
]

Furkan KAMACI commented on SOLR-5332:
-

[~aheaven] If you change the Fix Version/s to the next release, this issue can
be considered.

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833811#comment-13833811
]

Robert Muir commented on SOLR-5332:
---

Why not just use another field? It's the same cost either way as this setting,
except that it works today and we don't have to maintain it.

Additionally, you keep more control: you can control boosting etc. across the
different fields.
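
A toy model of the separate-field approach, assuming hypothetical field roles (one field holding edge n-grams, one holding whole tokens) and arbitrary boost values. It only illustrates that each field can be weighted independently; it is not how Solr actually scores documents.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: the same tokens indexed twice, once as edge n-grams and once as
// whole tokens, with a separate (hypothetical) boost for each field.
class TwoFieldSketch {

    static Set<String> edgeGrams(List<String> tokens, int minGram, int maxGram) {
        Set<String> terms = new HashSet<>();
        for (String t : tokens) {
            for (int len = minGram; len <= Math.min(maxGram, t.length()); len++) {
                terms.add(t.substring(0, len));
            }
        }
        return terms;
    }

    // Adds the field's boost for every query term found in that field.
    static double score(List<String> queryTerms, Set<String> gramField, Set<String> exactField,
                        double gramBoost, double exactBoost) {
        double score = 0;
        for (String q : queryTerms) {
            if (gramField.contains(q)) score += gramBoost;
            if (exactField.contains(q)) score += exactBoost;
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("someuser", "1");
        Set<String> grams = edgeGrams(tokens, 2, 25);  // n-gram field
        Set<String> exact = new HashSet<>(tokens);     // exact-token field
        System.out.println(score(List.of("1"), grams, exact, 1.0, 2.0));        // 2.0 (exact field only)
        System.out.println(score(List.of("some"), grams, exact, 1.0, 2.0));     // 1.0 (gram field only)
        System.out.println(score(List.of("someuser"), grams, exact, 1.0, 2.0)); // 3.0 (both fields)
    }
}
```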

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread James Dyer (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13834074#comment-13834074
]

James Dyer commented on SOLR-5332:
--

We have a use case where we use a modified version of EdgeNGramFilter to
preserve the original. The field we used this on is multi-valued. We change
all user queries against the field to phrases with slop to prevent partial
matches across values. But our users also want to be able to enter substrings
on this field. (Because all queries are phrase queries, wildcards are not an
option.) So had this functionality existed, we would have been spared from having
to implement it ourselves. (I didn't contribute the code because I couldn't
imagine it had broad applicability. But judging by this issue, at
least a few others out there have use cases for it as well.)

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13834135#comment-13834135
]

Robert Muir commented on SOLR-5332:
---

James, but the issue is still the same: there are no savings from doing this in the
same field!

So to me it's clearer to query on foo_exact:whatever if you want an exact
match, versus doing it in a roundabout way with a sloppy phrase query.

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread James Dyer (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13834141#comment-13834141
]

James Dyer commented on SOLR-5332:
--

There is, if a user enters two keywords where one matches an edge n-gram and the
other matches an original keyword. Our case involves book contributors. If a
book has two contributors, "John Smith" and "Edward Jones", we want the user to get a
result if they query "edward jones", "e jones" or "ed jones", but not "edward
smith" or "e smith", etc. The only solution I could come up with involved
a combination of edge n-grams and the original keywords in the same field.
I think there are valid use cases for this, though perhaps not very many.
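
A toy model of the contributor example, assuming hypothetical gram sizes of 1 to 3 plus preserving any token longer than the max. Each field value keeps its own list of positions (standing in for the position gap between values), and a two-term phrase matches only when both terms sit at adjacent positions within one value. This is not James's actual filter; it only shows why grams and originals need to share positions in the same field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model: a multi-valued field where each position holds the edge n-grams of
// a word plus (via a hypothetical preserve-original rule) the word itself.
class SameFieldPhraseSketch {
    static final int MIN_GRAM = 1, MAX_GRAM = 3; // hypothetical settings

    static Set<String> termsAt(String word) {
        Set<String> terms = new HashSet<>();
        for (int len = MIN_GRAM; len <= Math.min(MAX_GRAM, word.length()); len++) {
            terms.add(word.substring(0, len));
        }
        if (word.length() > MAX_GRAM) terms.add(word); // preserve the original
        return terms;
    }

    // One list of positions per field value; each position is a set of terms.
    static List<List<Set<String>>> index(List<String> values) {
        List<List<Set<String>>> field = new ArrayList<>();
        for (String value : values) {
            List<Set<String>> positions = new ArrayList<>();
            for (String word : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                positions.add(termsAt(word));
            }
            field.add(positions);
        }
        return field;
    }

    // Exact two-term phrase: adjacent positions inside the same value only.
    static boolean phraseMatches(List<List<Set<String>>> field, String first, String second) {
        for (List<Set<String>> positions : field) {
            for (int p = 0; p + 1 < positions.size(); p++) {
                if (positions.get(p).contains(first) && positions.get(p + 1).contains(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<Set<String>>> contributors = index(List.of("John Smith", "Edward Jones"));
        System.out.println(phraseMatches(contributors, "edward", "jones")); // true
        System.out.println(phraseMatches(contributors, "ed", "jones"));     // true
        System.out.println(phraseMatches(contributors, "e", "jones"));      // true
        System.out.println(phraseMatches(contributors, "edward", "smith")); // false
    }
}
```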

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Furkan KAMACI (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13834223#comment-13834223
]

Furkan KAMACI commented on SOLR-5332:
-

Actually, the same situation exists with WordDelimiterFilterFactory. It splits words
into new tokens but still has a preserveOriginal option too.

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13834285#comment-13834285
]

Robert Muir commented on SOLR-5332:
---

Just because WordDelimiterFilter has an option doesn't mean other filters should
have it; it's hardly a model citizen. If anything, that is more reason to really think
about what is happening and question whether it's the right thing to do.

For the use case described in the issue, a separate field suffices and is
likely more flexible and just as efficient.

I admit I don't fully understand what James is doing.

I'm just saying I don't think our filters need options like "preserve" or
"inject", because I see generally no value versus just using another field: it's
typically just users who don't understand that the underlying cost in an
inverted index is the same.

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Furkan KAMACI (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13834318#comment-13834318
]

Furkan KAMACI commented on SOLR-5332:
-

I just gave an example use case for that option. What I mean is: either EdgeNGram
gets that option, or the option is removed from WordDelimiter too, depending on
whichever is the better choice. Of course, it does not mean that if WordDelimiter
has that option, others should have it too. However, they have similar use cases and
the WordDelimiter one has that option.

On the other hand, this issue is a duplicate of another one, as I mentioned in my
comment. This issue's description also has some problems, as I mentioned, so we
should not take it directly as a use case. I implemented a wish from the
community because some people need and want it (I do not use it in my current
application/s). It is up to us to decide whether to use it or not.

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-10 Thread Furkan KAMACI (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13818596#comment-13818596
]

Furkan KAMACI commented on SOLR-5332:
-

This issue can be marked as a duplicate of this issue:
https://issues.apache.org/jira/browse/SOLR-5152

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Reporter: Alexander S.


[jira] [Commented] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-10 Thread Furkan KAMACI (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13818593#comment-13818593
]

Furkan KAMACI commented on SOLR-5332:
-

I've added preserveOriginal capability to EdgeNGramFilterFactory and attached a
patch to SOLR-5152. I want to clarify something about the problem described
in this issue. The schema described here:
http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
uses LowerCaseTokenizerFactory before EdgeNGramFilterFactory. There is an
explanation of it here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
which says: "Creates tokens by lowercasing all letters and dropping
non-letters." So non-letters are dropped before the tokens ever reach
EdgeNGramFilterFactory.
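
A simplified sketch of the tokenizer behavior quoted above (lowercase each run of letters, drop everything else), in plain Java with a hypothetical class name. It works on UTF-16 chars and ignores supplementary characters, so it only approximates Lucene's tokenizer, but it shows that the "1" from the first URL never reaches the n-gram filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough imitation of "lowercasing all letters and dropping non-letters":
// every non-letter ends the current token and is itself discarded.
class LetterTokenizerSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("facebook.com/someuser.1")); // [facebook, com, someuser]
    }
}
```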

My patch preserves the original token if preserveOriginal is set to true and the token
length is less than minGramSize or greater than maxGramSize.
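
The preserveOriginal rule described above can be sketched as follows. This is a plain-Java illustration with hypothetical names, not the SOLR-5152 patch or Lucene's EdgeNGramTokenFilter, and emitting the original after the grams is an arbitrary choice here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rule only: emit edge n-grams and, when preserveOriginal is set
// and the token length lies outside [minGram, maxGram], also emit the token.
class PreserveOriginalSketch {

    static List<String> analyze(String token, int minGram, int maxGram, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            out.add(token.substring(0, len));
        }
        boolean outOfRange = token.length() < minGram || token.length() > maxGram;
        if (preserveOriginal && outOfRange) {
            out.add(token); // in range, the token is already its own longest gram
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("1", 2, 25, true));   // [1]
        System.out.println(analyze("1", 2, 25, false));  // []
        System.out.println(analyze("abc", 2, 25, true)); // [ab, abc]
        List<String> longToken = analyze("someveryandverylongusername", 2, 25, true);
        System.out.println(longToken.contains("someveryandverylongusername")); // true
    }
}
```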

Add preserve original setting to the EdgeNGramFilterFactory
-

Key: SOLR-5332
URL: https://issues.apache.org/jira/browse/SOLR-5332
Project: Solr
Issue Type: Wish
Reporter: Alexander S.

