[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-23 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803086#comment-13803086
 ] 

Otis Gospodnetic edited comment on SOLR-5379 at 10/23/13 6:09 PM:
--

[~tiennm] How does this differ from SOLR-4381?  Which cases does SOLR-4381 not 
handle that this patch handles?


was (Author: otis):
[~tiennm] How does this differ from SOLR-4381?  Which cases does SOLR-4381 not 
handle that this patch handles?

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nguyen Manh Tien
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.5.1, 4.6
>
> Attachments: quoted.patch, synonym-expander.patch
>
>
> While dealing with synonyms at query time, Solr fails to handle multi-word 
> synonyms for two reasons:
> - First, the Lucene query parser tokenizes the user query on whitespace, so it 
> splits a multi-word term into separate terms before they reach the synonym 
> filter, and the synonym filter cannot recognize the multi-word term to expand it.
> - Second, if the synonym filter expands into multiple terms that include a 
> multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to 
> handle the synonyms, but MultiPhraseQuery does not work with terms that have a 
> different number of words.
> For the first problem, we can extend the parser to quote all multi-word 
> synonyms in the user query so that the Lucene query parser does not split them. 
> There is a related JIRA task: https://issues.apache.org/jira/browse/LUCENE-2605.
> For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery 
> of SHOULD clauses containing multiple PhraseQuerys when the token stream has a 
> multi-word synonym.
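
The two proposed fixes can be modelled in a few lines. This is a minimal sketch using plain strings instead of Lucene `Query` objects, so it runs without Lucene on the classpath; the class and method names (`MultiWordSynonymSketch`, `quoteMultiWordSynonyms`, `shouldOfPhrases`) are hypothetical, and this is not the code in quoted.patch or synonym-expander.patch. The first method quotes known multi-word synonyms (greedy longest match) before parsing; the second renders one SHOULD clause per synonym alternative, using a phrase clause for multi-word alternatives, in the style of Lucene's `toString()` output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical string-level model of the two fixes; not the patch code. */
public class MultiWordSynonymSketch {

    /** Fix 1: wrap known multi-word synonyms in quotes so a whitespace-splitting parser keeps them whole. */
    static String quoteMultiWordSynonyms(String query, Set<String> multiWordSynonyms, int maxWords) {
        String[] words = query.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            int matched = 0;
            // Greedy longest match: try the widest window of words first.
            for (int n = Math.min(maxWords, words.length - i); n >= 2; n--) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                if (multiWordSynonyms.contains(candidate)) {
                    out.add("\"" + candidate + "\"");
                    matched = n;
                    break;
                }
            }
            if (matched == 0) {
                out.add(words[i]); // not part of a multi-word synonym, keep as-is
                matched = 1;
            }
            i += matched;
        }
        return String.join(" ", out);
    }

    /** Fix 2: one SHOULD clause per alternative; multi-word alternatives become phrase clauses. */
    static String shouldOfPhrases(String field, List<String> alternatives) {
        List<String> clauses = new ArrayList<>();
        for (String alt : alternatives) {
            clauses.add(alt.contains(" ") ? field + ":\"" + alt + "\"" : field + ":" + alt);
        }
        // Lucene's toString() prints SHOULD clauses without a +/- prefix.
        return "(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        Set<String> multi = Set.of("sea biscit");
        // prints: "sea biscit" race
        System.out.println(quoteMultiWordSynonyms("sea biscit race", multi, 3));
        // prints: (name:seabiscuit name:"sea biscit" name:biscit)
        System.out.println(shouldOfPhrases("name", List.of("seabiscuit", "sea biscit", "biscit")));
    }
}
```

Because each alternative is its own clause, alternatives with different word counts (`seabiscuit` vs. `sea biscit`) no longer have to line up position by position, which is the constraint MultiPhraseQuery imposes.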



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:56 AM:


Excuse me: regarding synonym-expander.patch, when I have a ShingleFilter in the 
query-time analyzer that emits bigram TermQuerys like Term(a b), will the updated 
SolrQueryParserBase emit PhraseQuery(Term(a), Term(b)) instead, making my 
existing tokenization logic fail?
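
The question depends on how the patched parser tells a multi-word synonym apart from any other token whose text contains a space. The sketch below is a hypothetical model, not the logic of synonym-expander.patch: it assumes a multi-word synonym reaches the parser as a single token (stock Lucene SynonymFilter instead emits one token per word), and the `Token` class and both rules are invented for illustration. The type strings `shingle` and `SYNONYM` are the defaults that Lucene's ShingleFilter and SynonymFilter assign.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the shingle concern; not the actual patch logic. */
public class ShingleVsSynonymSketch {

    /** Stand-in for a token: its text plus its type attribute (e.g. "shingle", "SYNONYM"). */
    static final class Token {
        final String text;
        final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Renders a token as Term(...) or, when split on spaces, as PhraseQuery(Term(...), ...). */
    private static String render(Token t, boolean splitIntoPhrase) {
        if (!splitIntoPhrase) {
            return "Term(" + t.text + ")";
        }
        List<String> terms = new ArrayList<>();
        for (String word : t.text.split(" ")) {
            terms.add("Term(" + word + ")");
        }
        return "PhraseQuery(" + String.join(", ", terms) + ")";
    }

    /** Naive rule: any token text containing a space becomes a phrase, shingles included. */
    static String naiveQuery(Token t) {
        return render(t, t.text.contains(" "));
    }

    /** Type-aware rule: only tokens typed "SYNONYM" are split; other tokens keep a single term. */
    static String typeAwareQuery(Token t) {
        return render(t, "SYNONYM".equals(t.type) && t.text.contains(" "));
    }

    public static void main(String[] args) {
        Token shingle = new Token("a b", "shingle");
        Token synonym = new Token("sea biscit", "SYNONYM");
        System.out.println(naiveQuery(shingle));      // PhraseQuery(Term(a), Term(b))
        System.out.println(typeAwareQuery(shingle));  // Term(a b)
        System.out.println(typeAwareQuery(synonym));  // PhraseQuery(Term(sea), Term(biscit))
    }
}
```

Under the naive rule the shingle bigram would indeed become PhraseQuery(Term(a), Term(b)); keying the decision on the token type (or on some other marker set only by the synonym filter) would leave shingles untouched.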


was (Author: marcowong):
Excuse me: regarding synonym-expander.patch, will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in the query-time 
analyzer that emits bigrams like Term(a b), and so make my existing 
tokenization logic fail?




[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2013-10-25 Thread Marco Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805179#comment-13805179
 ] 

Marco Wong edited comment on SOLR-5379 at 10/25/13 8:55 AM:


Excuse me: regarding synonym-expander.patch, will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in the query-time 
analyzer that emits bigrams like Term(a b), and so make my existing 
tokenization logic fail?


was (Author: marcowong):
Excuse me: regarding synonym-expander.patch, will the updated SolrQueryParserBase 
emit PhraseQuery(Term(a), Term(b)) when I have a ShingleFilter in the query-time 
analyzer that emits bigrams like Term(a b), which would make my existing 
tokenization logic fail?




[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2014-02-06 Thread Tien Nguyen Manh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894248#comment-13894248
 ] 

Tien Nguyen Manh edited comment on SOLR-5379 at 2/7/14 7:03 AM:


[~markus17] That is not the desired behaviour!

Your result above, in the first example with the synonyms [seabiscuit,sea biscit,biscit]:
q=sea biscit => (+(DisjunctionMaxQuery((name:sea)) 
DisjunctionMaxQuery(((name:seabiscuit name:"sea biscit" 
name:biscit)/no_coord

seems to be the default behaviour (without the SynonymQuotedDismaxQParser).
With SynonymQuotedDismaxQParser, all three queries (q=biscit, q=seabiscuit, 
q=sea biscit) should give the same result.
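
The expected behaviour can be sketched as follows: whichever member of the synonym group the user types, including the two-word `sea biscit`, the whole group expands into one SHOULD group. This is a minimal model with hypothetical names (`EquivalentSynonymSketch`, `addGroup`, `expand`) that matches query spans by greedy longest match against the group entries; it is not SynonymQuotedDismaxQParser itself and omits the DisjunctionMaxQuery wrapping shown in the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of query-time expansion over whole multi-word spans. */
public class EquivalentSynonymSketch {

    private final Map<String, List<String>> groupOf = new HashMap<>();
    private int maxWords = 1;

    /** Registers a comma-separated equivalence group, e.g. "seabiscuit,sea biscit,biscit". */
    void addGroup(String line) {
        List<String> group = new ArrayList<>();
        for (String entry : line.split(",")) {
            group.add(entry.trim());
        }
        for (String entry : group) {
            groupOf.put(entry, group);
            maxWords = Math.max(maxWords, entry.split(" ").length);
        }
    }

    /** Expands a query; a matched span becomes one SHOULD group holding every alternative. */
    String expand(String field, String query) {
        String[] words = query.trim().split("\\s+");
        List<String> clauses = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            int matched = 0;
            // Try the longest span first so "sea biscit" wins over "sea" or "biscit".
            for (int n = Math.min(maxWords, words.length - i); n >= 1 && matched == 0; n--) {
                String candidate = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                List<String> group = groupOf.get(candidate);
                if (group != null) {
                    List<String> alts = new ArrayList<>();
                    for (String alt : group) {
                        alts.add(alt.contains(" ") ? field + ":\"" + alt + "\"" : field + ":" + alt);
                    }
                    clauses.add("(" + String.join(" ", alts) + ")");
                    matched = n;
                }
            }
            if (matched == 0) {
                clauses.add(field + ":" + words[i]); // plain term, no synonyms
                matched = 1;
            }
            i += matched;
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        EquivalentSynonymSketch s = new EquivalentSynonymSketch();
        s.addGroup("seabiscuit,sea biscit,biscit");
        for (String q : new String[] {"biscit", "seabiscuit", "sea biscit"}) {
            System.out.println(q + " => " + s.expand("name", q));
        }
    }
}
```

All three queries expand to `(name:seabiscuit name:"sea biscit" name:biscit)`, whereas the default result above keeps a separate `name:sea` clause because only `biscit` is recognized as a synonym after the parser splits on whitespace.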



was (Author: tiennm):
[~markus17] That is not the desired behaviour!

Your result above, in the first example with the synonyms [seabiscuit,sea biscit,biscit]:
q=sea biscit => (+(DisjunctionMaxQuery((name:sea)) 
DisjunctionMaxQuery(((name:seabiscuit name:"sea biscit" 
name:biscit)/no_coord

seems to be the default behaviour (without the patch).
After applying the patch, all three queries (q=biscit, q=seabiscuit, 
q=sea biscit) should give the same result.


> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.7
>
> Attachments: quoted.patch, synonym-expander.patch



[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion

2014-06-06 Thread Jeremy Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020221#comment-14020221
 ] 

Jeremy Anderson edited comment on SOLR-5379 at 6/6/14 6:55 PM:
---

I'm in the process of porting this logic to the 4.8.1 release tag. I believe 
I've ported the code, but I'm having problems getting the unit test to run to 
confirm that the port is correct. The main reason is the differences between 
the conf/solrconfig.xml and conf/schema.xml files that exist in the root and 
(I'm guessing) the ones Tien used when the 4.5.0 patch was created.

I'm still a Solr novice, so I'm not sure how to replicate the schema and 
configuration settings properly to get the unit test to run. I'm going to 
attach patch files for the 4.8.1 code base shortly, along with the current 
stubbed-out configuration files.

Any help would be greatly appreciated. My end goal is to get the multi-term 
synonym expansion logic working in a 4.8.1 deployment where we use an 
extended version of SolrQueryParser. (With this patch, I'm not sure whether the 
multi-term synonym logic is usable only by the new SynonymQuotedDismaxQParser 
or also by the existing DisMax QParsers.)

Notes on the 4.8.1 port:
* There are now two parsers usable by the FSTSynonymFilterFactory: 
SolrSynonymParser and WordnetSynonymParser. For the latter, I'm not sure whether 
additional logic needs to be implemented for proper use of the tokenize 
parameter.
* All of the logic implemented in SolrQueryParserBase in 4.5.0 has now moved 
into the utility QueryBuilder class.
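
For reference, the Solr synonym file format read by SolrSynonymParser uses comma-separated equivalence lines, `=>` for explicit one-way mappings and `#` for comments. The simplified reader below is a hypothetical sketch, not SolrSynonymParser: it ignores backslash escaping, assumes expand=true, and flags multi-word inputs such as `sea biscit`, which are exactly the entries a whitespace-splitting query parser breaks apart before the synonym filter sees them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical reader for Solr-format synonym lines (no escaping). */
public class SolrSynonymFormatSketch {

    /** Returns input phrase -> output phrases, using expand=true semantics for equivalence lines. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments are ignored
            }
            if (line.contains("=>")) {
                // Explicit mapping: each left-hand entry is replaced by all right-hand entries.
                String[] sides = line.split("=>", 2);
                List<String> outputs = split(sides[1]);
                for (String input : split(sides[0])) {
                    rules.computeIfAbsent(input, k -> new ArrayList<>()).addAll(outputs);
                }
            } else {
                // Equivalence line: with expand=true every entry maps to the whole group.
                List<String> group = split(line);
                for (String input : group) {
                    rules.computeIfAbsent(input, k -> new ArrayList<>()).addAll(group);
                }
            }
        }
        return rules;
    }

    /** Splits one side of a rule on commas, normalizing internal whitespace. */
    private static List<String> split(String side) {
        List<String> entries = new ArrayList<>();
        for (String e : side.split(",")) {
            String trimmed = e.trim().replaceAll("\\s+", " ");
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = parse(List.of(
                "# synonyms from the examples in this issue",
                "seabiscuit, sea biscit, biscit",
                "tv => television"));
        rules.forEach((in, out) -> System.out.println(
                in + " -> " + out + (in.contains(" ") ? "  (multi-word input)" : "")));
    }
}
```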



was (Author: rpialum):
I'm in the process of porting this logic to the 4.8.1 release tag. I believe 
I've ported the code, but I'm having problems getting the unit test to run to 
confirm that the port is correct. The main reason is the differences between 
the conf/solrconfig.xml and conf/schema.xml files that exist in the root and 
(I'm guessing) the ones Tien used when the 4.5.0 patch was created.

I'm still a Solr novice, so I'm not sure how to replicate the schema and 
configuration settings properly to get the unit test to run. I'm going to 
attach patch files for the 4.8.1 code base shortly, along with the current 
stubbed-out configuration files.

Any help would be greatly appreciated. My end goal is to get the multi-term 
synonym expansion logic working in a 4.8.1 deployment where we use an 
extended version of SolrQueryParser. (With this patch, I'm not sure whether the 
multi-term synonym logic is usable only by the new SynonymQuotedDismaxQParser 
or also by the existing DisMax QParsers.)

Notes on the 4.8.1 port:
* There are now two parsers usable by the FSTSynonymFilterFactory: 
SolrSynonymParser and WordnetSynonymParser. For the latter, I'm not sure whether 
additional logic needs to be implemented for proper use of the tokenize 
parameter.
* All of the logic implemented in SolrQueryParserBase in 4.5.0 has now moved 
into the utility QueryBuilder class.


> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-5379
> URL: https://issues.apache.org/jira/browse/SOLR-5379
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Tien Nguyen Manh
>  Labels: multi-word, queryparser, synonym
> Fix For: 4.9, 5.0
>
> Attachments: conf-test-files-4_8_1.patch, quoted-4_8_1.patch, 
> quoted.patch, synonym-expander-4_8_1.patch, synonym-expander.patch