Re: Implementing Autocomplete/Query Suggest using Solr
On Wed, Dec 30, 2009 at 3:07 AM, Prasanna R plistma...@gmail.com wrote:

> I looked into the Solr/Lucene classes and found the required information. Am summarizing the same for the benefit of those that might refer to this thread in the future. The change I had to make was very simple - make a call to getPrefixQuery instead of getWildcardQuery in my custom-modified Solr dismax query parser class. However, this will make a fairly significant difference in terms of efficiency. The key difference between the lucene WildcardQuery and PrefixQuery lies in their respective term enumerators, specifically in the term comparators. The termCompare method for PrefixQuery is more light-weight than that of WildcardQuery and is essentially an optimization given that a prefix query is nothing but a specialized case of Wildcard query. Also, this is why the lucene query parser automatically creates a PrefixQuery for query terms of the form 'foo*' instead of a WildcardQuery.

I don't understand this. There is nothing that one should need to do in Solr's code to make this work. Prefix queries are supported out of the box in Solr.

> And one final request for Comment to Shalin on this topic - I am guessing you ensured there were no duplicate terms in the field(s) used for autocompletion. For our first version, I am thinking of eliminating the duplicates outside of the results handler that gives suggestions since duplicate suggestions originate only from different document IDs in our system and we do want the list of document IDs matched. Is there a better/different way of doing the same?

No, I guess not.

--
Regards,
Shalin Shekhar Mangar.
Re: Implementing Autocomplete/Query Suggest using Solr
On Mon, Jan 4, 2010 at 1:20 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> On Wed, Dec 30, 2009 at 3:07 AM, Prasanna R plistma...@gmail.com wrote:
>
>> I looked into the Solr/Lucene classes and found the required information. Am summarizing the same for the benefit of those that might refer to this thread in the future. The change I had to make was very simple - make a call to getPrefixQuery instead of getWildcardQuery in my custom-modified Solr dismax query parser class. However, this will make a fairly significant difference in terms of efficiency. The key difference between the lucene WildcardQuery and PrefixQuery lies in their respective term enumerators, specifically in the term comparators. The termCompare method for PrefixQuery is more light-weight than that of WildcardQuery and is essentially an optimization given that a prefix query is nothing but a specialized case of Wildcard query. Also, this is why the lucene query parser automatically creates a PrefixQuery for query terms of the form 'foo*' instead of a WildcardQuery.
>
> I don't understand this. There is nothing that one should need to do in Solr's code to make this work. Prefix queries are supported out of the box in Solr.

I am using the dismax query parser and I match on multiple fields with different boosts. I run a prefix query on some fields in combination with a regular field query on other fields. I do not know of any way in which one could specify a prefix query on a particular field in your dismax query out of the box in Solr 1.4. I had to update Solr to support additional syntax in a dismax query that lets you choose to create a prefix query on a particular field. As part of parsing this custom syntax, I was making a call to the getWildcardQuery which I simply changed to getPrefixQuery.

Prasanna.
Re: Implementing Autocomplete/Query Suggest using Solr
>>>> We do auto-complete through prefix searches on shingles.
>>>
>>> Just to confirm, do you mean using EdgeNgram filter to produce letter ngrams of the tokens in the chosen field?
>>
>> No, I'm talking about prefix search on tokens produced by a ShingleFilter.
>
> I did not know about the Prefix query parser in Solr. Thanks a lot for pointing out the same. I find relatively little online material about the Solr/Lucene prefix query parser. Kindly point me to any useful resource that I might be missing.

I looked into the Solr/Lucene classes and found the required information. Am summarizing the same for the benefit of those that might refer to this thread in the future.

The change I had to make was very simple - make a call to getPrefixQuery instead of getWildcardQuery in my custom-modified Solr dismax query parser class. However, this will make a fairly significant difference in terms of efficiency. The key difference between the lucene WildcardQuery and PrefixQuery lies in their respective term enumerators, specifically in the term comparators. The termCompare method for PrefixQuery is more light-weight than that of WildcardQuery and is essentially an optimization given that a prefix query is nothing but a specialized case of Wildcard query. Also, this is why the lucene query parser automatically creates a PrefixQuery for query terms of the form 'foo*' instead of a WildcardQuery.

A big thank you to Shalin for providing valuable guidance and insight.

And one final request for Comment to Shalin on this topic - I am guessing you ensured there were no duplicate terms in the field(s) used for autocompletion. For our first version, I am thinking of eliminating the duplicates outside of the results handler that gives suggestions since duplicate suggestions originate only from different document IDs in our system and we do want the list of document IDs matched. Is there a better/different way of doing the same?

Regards,
Prasanna.
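The difference in per-term comparison cost described above can be illustrated with a small sketch. This is not Lucene's code: `prefix_compare` and `wildcard_compare` are hypothetical stand-ins for the two term comparators, written only to show that a prefix check is a single `startswith` test, while a general wildcard check has to handle `*` and `?` with backtracking.

```python
def prefix_compare(term: str, prefix: str) -> bool:
    """Prefix check: one startswith test per enumerated term."""
    return term.startswith(prefix)


def wildcard_compare(term: str, pattern: str) -> bool:
    """General wildcard check: '*' matches any run, '?' matches one character."""
    t = p = 0
    star = -1  # index of the most recent '*' seen in the pattern
    mark = 0   # term position to resume from when backtracking to that '*'
    while t < len(term):
        if p < len(pattern) and pattern[p] == "*":
            star, mark = p, t
            p += 1
        elif p < len(pattern) and (pattern[p] == "?" or pattern[p] == term[t]):
            t += 1
            p += 1
        elif star != -1:
            # Mismatch after a '*': let the '*' absorb one more character.
            mark += 1
            t = mark
            p = star + 1
        else:
            return False
    # Any trailing '*' in the pattern can match the empty string.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


terms = ["solar", "solr", "solr cloud", "solve", "sort"]
by_prefix = [t for t in terms if prefix_compare(t, "sol")]
by_wildcard = [t for t in terms if wildcard_compare(t, "sol*")]
```

For a pattern of the form `foo*` both functions accept the same terms, which is why treating such a query as a prefix query loses nothing and skips the more general matching logic.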
Re: Implementing Autocomplete/Query Suggest using Solr
On Wed, Dec 23, 2009 at 10:52 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> On Thu, Dec 24, 2009 at 2:39 AM, Prasanna R plistma...@gmail.com wrote:
>
>> On Tue, Dec 22, 2009 at 11:49 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
>>
>>>> I am curious how an approach that simply uses the wildcard query functionality on an indexed field would work.
>>>
>>> It works fine as long as the terms are not repeated across documents.
>>
>> I do not follow why terms repeating across documents would be an issue. As long as you can differentiate between multiple matches and rank them properly it should work right?
>
> A prefix search would return documents. If a field X being used for auto-complete has the same value in two documents then the user will see the same value being suggested twice.

That is right. I will have to handle removing duplicate values from the results returned by the result handler.

>>> We do auto-complete through prefix searches on shingles.
>>
>> Just to confirm, do you mean using EdgeNgram filter to produce letter ngrams of the tokens in the chosen field?
>
> No, I'm talking about prefix search on tokens produced by a ShingleFilter.

I did not know about the Prefix query parser in Solr. Thanks a lot for pointing out the same. I find relatively little online material about the Solr/Lucene prefix query parser. Kindly point me to any useful resource that I might be missing.

Thanks again for all your help.

Regards,
Prasanna.
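One way to remove duplicate suggestions while still keeping every matching document ID is to group the hits by suggestion text on the client side. The sketch below assumes the hits arrive as `(doc_id, value)` pairs in rank order; that shape, and the name `group_suggestions`, are assumptions made for illustration and are not part of any Solr response format.

```python
def group_suggestions(hits):
    """Collapse duplicate suggestion values, keeping the doc IDs for each.

    hits: iterable of (doc_id, value) pairs in rank order (assumed shape).
    Returns a dict mapping each distinct value to its list of doc IDs,
    ordered by the first appearance of the value.
    """
    grouped = {}
    for doc_id, value in hits:
        grouped.setdefault(value, []).append(doc_id)
    return grouped


hits = [("d1", "solr cloud"), ("d2", "solr search"), ("d3", "solr cloud")]
suggestions = group_suggestions(hits)
```

Each distinct value is shown to the user once, in the order it first appeared, and the list of document IDs behind it remains available.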
RE: Implementing Autocomplete/Query Suggest using Solr
In addition to what Shalin said, you could use the TermsComponent. However you will be better off using the Dismax request handler

Ankit

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, December 23, 2009 2:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Implementing Autocomplete/Query Suggest using Solr

On Wed, Dec 23, 2009 at 6:14 AM, Prasanna R plistma...@gmail.com wrote:

> I am curious how an approach that simply uses the wildcard query functionality on an indexed field would work.

It works fine as long as the terms are not repeated across documents.

> While Solr does not support wildcard queries out of the box currently, it will definitely be included in the future and I believe the edismax parser already lets you do that.

Solr supports prefix queries and there's a reverse wild card filter in trunk too.

We do auto-complete through prefix searches on shingles.

--
Regards,
Shalin Shekhar Mangar.
Re: Implementing Autocomplete/Query Suggest using Solr
On Tue, Dec 22, 2009 at 11:49 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

>> I am curious how an approach that simply uses the wildcard query functionality on an indexed field would work.
>
> It works fine as long as the terms are not repeated across documents.

I do not follow why terms repeating across documents would be an issue. As long as you can differentiate between multiple matches and rank them properly it should work right?

>> While Solr does not support wildcard queries out of the box currently, it will definitely be included in the future and I believe the edismax parser already lets you do that.
>
> Solr supports prefix queries and there's a reverse wild card filter in trunk too.

Are you referring to facet prefix queries as prefix queries? I looked at reversed wild card filter but think that the regular wild card matching as opposed to leading wild card matching is better suited for an auto-completion feature.

> We do auto-complete through prefix searches on shingles.

Just to confirm, do you mean using EdgeNgram filter to produce letter ngrams of the tokens in the chosen field?

Assuming the regular wild card query would also work, any thoughts on how it compares to the EdgeNGram approach in terms of added indexing cost, performance, etc.?

Thanks a lot for your valuable inputs/comments.

Prasanna.
Re: Implementing Autocomplete/Query Suggest using Solr
On Thu, Dec 24, 2009 at 2:39 AM, Prasanna R plistma...@gmail.com wrote:

> On Tue, Dec 22, 2009 at 11:49 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
>
>>> I am curious how an approach that simply uses the wildcard query functionality on an indexed field would work.
>>
>> It works fine as long as the terms are not repeated across documents.
>
> I do not follow why terms repeating across documents would be an issue. As long as you can differentiate between multiple matches and rank them properly it should work right?

A prefix search would return documents. If a field X being used for auto-complete has the same value in two documents then the user will see the same value being suggested twice.

>>> While Solr does not support wildcard queries out of the box currently, it will definitely be included in the future and I believe the edismax parser already lets you do that.
>>
>> Solr supports prefix queries and there's a reverse wild card filter in trunk too.
>
> Are you referring to facet prefix queries as prefix queries? I looked at reversed wild card filter but think that the regular wild card matching as opposed to leading wild card matching is better suited for an auto-completion feature.

No, I'm talking about regular prefix search e.g. field:val*

>> We do auto-complete through prefix searches on shingles.
>
> Just to confirm, do you mean using EdgeNgram filter to produce letter ngrams of the tokens in the chosen field?

No, I'm talking about prefix search on tokens produced by a ShingleFilter.

> Assuming the regular wild card query would also work, any thoughts on how it compares to the EdgeNGram approach in terms of added indexing cost, performance, etc.?

With EdgeNGram, you can do phrase (exact) matches which are faster. But if you have a big corpus of terms then EdgeNGramFilter can produce too many tokens. In some places we are using phrase search on n-gram, in other places (with more terms) we opted for prefix search on shingles.

--
Regards,
Shalin Shekhar Mangar.
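The trade-off between the two indexing approaches can be made concrete with a rough sketch of the tokens each one produces. The functions below are simplified stand-ins written for illustration; they are not the Solr filters themselves, and the parameter names `min_gram`, `max_gram` and `max_size` are assumptions.

```python
def edge_ngrams(token, min_gram=1, max_gram=None):
    """Leading-edge character n-grams of a single token."""
    limit = len(token) if max_gram is None else min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, limit + 1)]


def shingles(tokens, max_size=2):
    """Word n-grams (shingles) of up to max_size words, unigrams included."""
    out = []
    for i in range(len(tokens)):
        for size in range(1, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


grams = edge_ngrams("solr")
phrases = shingles(["apache", "solr", "search"])

# Edge n-grams: the typed text "sol" is itself an indexed token (exact match).
exact_hit = "sol" in grams
# Shingles: the typed text "solr s" needs a prefix match over indexed terms.
prefix_hits = [s for s in phrases if s.startswith("solr s")]
```

With edge n-grams the index holds one token per leading substring of every word, so the typed text can be matched exactly at the cost of many more indexed tokens. With shingles the index holds far fewer terms, and the typed text is matched as a prefix of those terms at query time.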
Implementing Autocomplete/Query Suggest using Solr
There seem to be a couple of approaches that people have adopted in implementing a query suggestion / auto completion feature using Solr. Depending on the situation, one might use the terms component or go the way of using EdgeNGramFilters and then querying the index on the ngrammed field. I also found that there is an issue currently open in JIRA (http://issues.apache.org/jira/browse/SOLR-1316) for creating an auto suggest component.

I am curious how an approach that simply uses the wildcard query functionality on an indexed field would work. While Solr does not support wildcard queries out of the box currently, it will definitely be included in the future and I believe the edismax parser already lets you do that. Would using the wildcard query to implement autocomplete have high overhead and be less efficient than the other approaches? Am I missing anything here?

Kindly comment and provide some guidance.

Thanks,
Prasanna.
Re: Implementing Autocomplete/Query Suggest using Solr
On Wed, Dec 23, 2009 at 6:14 AM, Prasanna R plistma...@gmail.com wrote:

> I am curious how an approach that simply uses the wildcard query functionality on an indexed field would work.

It works fine as long as the terms are not repeated across documents.

> While Solr does not support wildcard queries out of the box currently, it will definitely be included in the future and I believe the edismax parser already lets you do that.

Solr supports prefix queries and there's a reverse wild card filter in trunk too.

We do auto-complete through prefix searches on shingles.

--
Regards,
Shalin Shekhar Mangar.
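A prefix search over indexed terms can be pictured as a seek into a sorted term list followed by a short scan. The sketch below assumes the terms are held in a sorted Python list; `prefix_scan` is a hypothetical helper for illustration and does not reflect how Solr or Lucene implement term enumeration internally.

```python
import bisect


def prefix_scan(sorted_terms, prefix):
    """Return all terms starting with prefix from an already sorted list."""
    # Seek to the first term that is >= the prefix.
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    # Matching terms are contiguous in sorted order, so stop at the first miss.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


terms = sorted(["apache solr", "solr", "solr cloud", "solr search", "sort"])
found = prefix_scan(terms, "solr")
```

Because the matching terms sit next to each other in sorted order, the scan touches only the matches plus one extra term, regardless of how many other terms the list contains.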