[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-07 Thread Bill Steele (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816243#comment-13816243
 ] 

Bill Steele commented on SOLR-5379:
---

Hi Markus,

Which other multi-word synonym patches are you referring to?  We tested the 
Solr functionality built into Solr 4.2.1 which didn't support multiple word 
synonyms, and as far as I know that issue still exists in the latest.  This 
patch fixes that issue for us.  Not aware of other alternatives available.

 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion

2013-11-05 Thread Bill Steele (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814281#comment-13814281
 ] 

Bill Steele commented on SOLR-5379:
---

We found this to be much more useful code for multiword synonyms.  We ran some 
tests, and when having a synonym set such as:

seabiscuit, sea biscuit, sea biscit

Search on the following:

seabiscuit article

Returned matches with the following terms

Sea biscit article
Sea biscuit article
Seabiscuit article
Biscuit Sea article
Sea article
Biscit article

With this patch, the above search query just returned the terms:

Sea biscit article
Sea biscuit article
Seabiscuit article



 Query-time multi-word synonym expansion
 ---

 Key: SOLR-5379
 URL: https://issues.apache.org/jira/browse/SOLR-5379
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Nguyen Manh Tien
  Labels: multi-word, queryparser, synonym
 Fix For: 4.5.1, 4.6

 Attachments: quoted.patch, synonym-expander.patch


 While dealing with synonym at query time, solr failed to work with multi-word 
 synonyms due to some reasons:
 - First the lucene queryparser tokenizes user query by space so it split 
 multi-word term into two terms before feeding to synonym filter, so synonym 
 filter can't recognized multi-word term to do expansion
 - Second, if synonym filter expand into multiple terms which contains 
 multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to 
 handle synonyms. But MultiPhraseQuery don't work with term have different 
 number of words.
 For the first one, we can extend quoted all multi-word synonym in user query 
 so that lucene queryparser don't split it. There are a jira task related to 
 this one https://issues.apache.org/jira/browse/LUCENE-2605.
 For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery 
 SHOULD which contains multiple PhraseQuery in case tokens stream have 
 multi-word synonym.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org