[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179 ] Marco Wong commented on SOLR-5379: -- Excuse me, for the synonym-expander.patch, does the updated SolrQueryParserBase will emitting PhraseQuery(Term(a), Term(b)), when I have a ShingleFilter in query time analyzer which emits bigram like Term(a b), which makes my existing tokenization logic fail? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179 ] Marco Wong edited comment on SOLR-5379 at 10/25/13 8:56 AM: Excuse me, for the synonym-expander.patch, when I have a ShingleFilter in query time analyzer which emits bigram TermQuery like Term(a b), does the updated SolrQueryParserBase will emitting PhraseQuery(Term(a), Term(b)), making my existing tokenization logic fail? was (Author: marcowong): Excuse me, for the synonym-expander.patch, does the updated SolrQueryParserBase will emitting PhraseQuery(Term(a), Term(b)), when I have a ShingleFilter in query time analyzer which emits bigram like Term(a b), and makes my existing tokenization logic fail? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805179#comment-13805179 ] Marco Wong edited comment on SOLR-5379 at 10/25/13 8:55 AM: Excuse me, for the synonym-expander.patch, does the updated SolrQueryParserBase will emitting PhraseQuery(Term(a), Term(b)), when I have a ShingleFilter in query time analyzer which emits bigram like Term(a b), and makes my existing tokenization logic fail? was (Author: marcowong): Excuse me, for the synonym-expander.patch, does the updated SolrQueryParserBase will emitting PhraseQuery(Term(a), Term(b)), when I have a ShingleFilter in query time analyzer which emits bigram like Term(a b), which makes my existing tokenization logic fail? Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Nguyen Manh Tien Labels: multi-word, queryparser, synonym Fix For: 4.5.1, 4.6 Attachments: quoted.patch, synonym-expander.patch While dealing with synonym at query time, solr failed to work with multi-word synonyms due to some reasons: - First the lucene queryparser tokenizes user query by space so it split multi-word term into two terms before feeding to synonym filter, so synonym filter can't recognized multi-word term to do expansion - Second, if synonym filter expand into multiple terms which contains multi-word synonym, The SolrQueryParseBase currently use MultiPhraseQuery to handle synonyms. But MultiPhraseQuery don't work with term have different number of words. For the first one, we can extend quoted all multi-word synonym in user query so that lucene queryparser don't split it. There are a jira task related to this one https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery by an appropriate BoleanQuery SHOULD which contains multiple PhraseQuery in case tokens stream have multi-word synonym. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org