RE: Why do FQs make my spelling suggestions so slow?
It has been in the wiki, more or less. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the following sections.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, May 29, 2013 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

James, this is very useful information. Can you please add this to the wiki?
Re: Why do FQs make my spelling suggestions so slow?
James, this is very useful information. Can you please add this to the wiki?

On Wed, May 29, 2013 at 10:36 PM, Dyer, James wrote:

> Instead of "maxCollationTries=0", use a value greater than zero. Zero
> means not to check if the collation will return hits. 1 means to test 1
> possible combination against the index and return it only if it returns
> hits. 2 tries up to 2 possibilities, etc. As you have
> "spellcheck.maxCollations=8", you'll probably want maxCollationTries at
> least that large. Maybe 10-20 would be better. Make it as low as possible
> to get generally good results, or as high as possible before the
> performance on a query with many misspelled words gets too bad.
>
> Also, use a spellcheck.count greater than 2. This is as many corrections
> per misspelled term you want it to consider. If using
> DirectSolrSpellChecker, you can have it set low, 5-10 might be good. If
> using IndexBased- or FileBased spell checkers, use at least 10.
>
> Also, do not use "onlyMorePopular" unless you indeed want every term in
> the user's query to be replaced with higher-frequency terms (even
> correctly-spelled terms get replaced). If you want it to suggest even for
> words that are in the dictionary, try "spellcheck.alternativeTermCount"
> instead. Try setting it to about half of "spellcheck.count" (but at least
> 10 if using IndexBased- or FileBased spell checkers).
>
> James Dyer
> Ingram Content Group
> (615) 213-4311

--
Regards,
Shalin Shekhar Mangar.
RE: Why do FQs make my spelling suggestions so slow?
Instead of "maxCollationTries=0", use a value greater than zero. Zero means the collation is not checked against the index to see whether it returns hits. 1 means test one possible combination against the index and return it only if it returns hits. 2 tries up to two possibilities, and so on. As you have "spellcheck.maxCollations=8", you'll probably want maxCollationTries at least that large. Maybe 10-20 would be better. Make it as low as possible while still getting generally good results, or as high as possible before performance on a query with many misspelled words gets too bad.

Also, use a spellcheck.count greater than 2. This is the number of corrections per misspelled term you want it to consider. If using DirectSolrSpellChecker, you can set it low; 5-10 might be good. If using the IndexBased or FileBased spell checkers, use at least 10.

Also, do not use "onlyMorePopular" unless you really want every term in the user's query to be replaced with higher-frequency terms (even correctly spelled terms get replaced). If you want it to suggest even for words that are in the dictionary, try "spellcheck.alternativeTermCount" instead. Try setting it to about half of "spellcheck.count" (but at least 10 if using the IndexBased or FileBased spell checkers).

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Nicholas Fellows [mailto:n...@djdownload.com]
Sent: Wednesday, May 29, 2013 11:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

I also have problems getting the solrspellchecker to utilise existing FQ params correctly. We have some fairly monster queries, e.g. http://pastebin.com/4XzGpfeC

I cannot seem to get our FQ parameters to be honored when generating results. In essence I am getting collations that yield no results when the filter query is applied.

We have items that are by default not shown when out of stock or forthcoming. The user can select whether to show these or not.

Is there something wrong with my query, or is my use case perhaps not supported?

I'm using nested queries and local params etc.

Would very much appreciate some assistance on this one, as 2 days' worth of hacking and pestering people on IRC have not yet yielded a solution for me. I'm not even sure what I am trying is even possible! Some sort of clarification on this would really help!

Cheers

Nick...
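The parameter advice at the top of this message can be sketched as a small set of request parameters. This is only an illustration in Python: the helper name `spellcheck_params` and the default numbers are assumptions picked from the ranges mentioned in the message (and assume DirectSolrSpellChecker), not recommended values.

```python
from urllib.parse import urlencode

def spellcheck_params(max_collations=8, max_collation_tries=10, count=10):
    """Hypothetical helper: build spellcheck parameters along the lines suggested above."""
    if max_collation_tries < max_collations:
        # The message suggests maxCollationTries should be at least maxCollations.
        raise ValueError("maxCollationTries should be at least maxCollations")
    if count <= 2:
        # The message suggests a spellcheck.count greater than 2.
        raise ValueError("spellcheck.count should be greater than 2")
    return {
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.maxCollations": max_collations,
        "spellcheck.maxCollationTries": max_collation_tries,
        "spellcheck.count": count,
        # About half of spellcheck.count; onlyMorePopular is deliberately left out.
        "spellcheck.alternativeTermCount": max(1, count // 2),
    }

params = spellcheck_params()
print(urlencode(params))
```

The two checks simply encode the rules of thumb from the message, so a misconfiguration such as maxCollationTries=0 fails early instead of silently returning collations that were never tested against the index.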
Re: Why do FQs make my spelling suggestions so slow?
I also have problems getting the solrspellchecker to utilise existing FQ params correctly. We have some fairly monster queries, e.g. http://pastebin.com/4XzGpfeC

I cannot seem to get our FQ parameters to be honored when generating results. In essence I am getting collations that yield no results when the filter query is applied.

We have items that are by default not shown when out of stock or forthcoming. The user can select whether to show these or not.

Is there something wrong with my query, or is my use case perhaps not supported?

I'm using nested queries and local params etc.

Would very much appreciate some assistance on this one, as 2 days' worth of hacking and pestering people on IRC have not yet yielded a solution for me. I'm not even sure what I am trying is even possible! Some sort of clarification on this would really help!

Cheers

Nick...

On 29 May 2013 15:57, Andy Lester wrote:

> On May 29, 2013, at 9:46 AM, "Dyer, James" wrote:
>
> > Just a sanity check, I see I had misspelled "maxCollations" as
> > "maxCollation" in my prior response. When you tested with this set the
> > same as "maxCollationTries", did you correct my spelling?
>
> Yes, definitely.
>
> Thanks for the ticket. I am looking at the effects of turning on
> spellcheck.onlyMorePopular to true, which reduces the number of collations
> it seems to do, but doesn't affect the underlying question of "is the
> spellchecker doing FQs properly?"
>
> Thanks,
> Andy
>
> --
> Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance

--
Nick Fellows
DJdownload.com
---
10 Greenland Street
London
NW10ND
United Kingdom
---
n...@djdownload.com (E)
---
www.djdownload.com
Re: Why do FQs make my spelling suggestions so slow?
On May 29, 2013, at 9:46 AM, "Dyer, James" wrote:

> Just a sanity check, I see I had misspelled "maxCollations" as
> "maxCollation" in my prior response. When you tested with this set the same
> as "maxCollationTries", did you correct my spelling?

Yes, definitely.

Thanks for the ticket. I am looking at the effects of setting spellcheck.onlyMorePopular to true, which seems to reduce the number of collations it does, but doesn't affect the underlying question of "is the spellchecker doing FQs properly?"

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
RE: Why do FQs make my spelling suggestions so slow?
Andy,

I opened this ticket so that someone can eventually investigate: https://issues.apache.org/jira/browse/SOLR-4874

Just a sanity check: I see I had misspelled "maxCollations" as "maxCollation" in my prior response. When you tested with this set the same as "maxCollationTries", did you correct my spelling? The thought is that by requiring it to return this many collations, you are guaranteed to make it try the maximum number of times every time, giving yourself a cleaner test. I am trying to isolate whether spellcheck is not running the queries properly or whether the queries just naturally take that long to run over and over again.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Andy Lester [mailto:a...@petdance.com]
Sent: Tuesday, May 28, 2013 4:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Why do FQs make my spelling suggestions so slow?

Thanks for looking at this.

> What are the QTimes for the 0fq, 1fq, 2fq, 3fq & 4fq cases with spellcheck
> entirely turned off? Is it about (or a little more than) half the total when
> maxCollationTries=1 ?

With spellcheck off I get 8ms for the 4fq query.

> Also, with the varying # of fq's, how many collation tries does it take to
> get 10 collations?

I don't know. How can I tell?

> Possibly, a better way to test this is to set maxCollations =
> maxCollationTries. The reason is that it quits "trying" once it finds
> "maxCollations", so if with 0fq's, lots of combinations can generate hits and
> it doesn't need to try very many to get to 10. But with more fq's, fewer
> collations will pan out so now it is trying more up to 100 before (if ever)
> it gets to 10.

It does just fine doing 100 collations so long as there are no FQs. It seems to me that the FQs are taking an inordinate amount of extra time. 100 collations in (roughly) the same amount of time as a single collation, so long as there are no FQs. Why are the FQs such a drag on the collation process?

> (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.

> So say with 2fq's it takes 10ms for the query to complete with spellcheck
> off, and 20ms with "maxCollation = maxCollationTries = 1", then it will take
> about 110ms with "maxCollation = maxCollationTries = 10".

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so long as I have FQs off. Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so long as I have FQs off. Add a single FQ and it becomes 62038ms.

> But I think you're just setting maxCollationTries too high. You're asking it
> to do too much work in trying teens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 tries. That's a big difference to the user where it's trying to figure out misspelled phrases.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
Re: Why do FQs make my spelling suggestions so slow?
Thanks for looking at this.

> What are the QTimes for the 0fq, 1fq, 2fq, 3fq & 4fq cases with spellcheck
> entirely turned off? Is it about (or a little more than) half the total when
> maxCollationTries=1 ?

With spellcheck off I get 8ms for the 4fq query.

> Also, with the varying # of fq's, how many collation tries does it take to
> get 10 collations?

I don't know. How can I tell?

> Possibly, a better way to test this is to set maxCollations =
> maxCollationTries. The reason is that it quits "trying" once it finds
> "maxCollations", so if with 0fq's, lots of combinations can generate hits and
> it doesn't need to try very many to get to 10. But with more fq's, fewer
> collations will pan out so now it is trying more up to 100 before (if ever)
> it gets to 10.

It does just fine doing 100 collations so long as there are no FQs. It seems to me that the FQs are taking an inordinate amount of extra time. 100 collations in (roughly) the same amount of time as a single collation, so long as there are no FQs. Why are the FQs such a drag on the collation process?

> (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.

> So say with 2fq's it takes 10ms for the query to complete with spellcheck
> off, and 20ms with "maxCollation = maxCollationTries = 1", then it will take
> about 110ms with "maxCollation = maxCollationTries = 10".

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so long as I have FQs off. Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so long as I have FQs off. Add a single FQ and it becomes 62038ms.

> But I think you're just setting maxCollationTries too high. You're asking it
> to do too much work in trying teens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 tries. That's a big difference to the user where it's trying to figure out misspelled phrases.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
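The linear estimate quoted in this message, and the measurements given in reply, can be compared with a few lines of arithmetic. This is a rough sketch in Python; the function name `predicted_qtime_ms` is made up for illustration, and the model is only the simple "one extra query per try" reading of the quoted estimate.

```python
def predicted_qtime_ms(base_ms, one_try_ms, tries):
    """Linear model from the quoted estimate: every collation try adds a fixed cost."""
    per_try_ms = one_try_ms - base_ms
    return base_ms + per_try_ms * tries

# Quoted example: 10 ms with spellcheck off, 20 ms with a single try.
print(predicted_qtime_ms(10, 20, 10))  # 110, matching the quoted figure

# Measurements from this message with a single FQ: (tries, QTime in ms).
measured_one_fq = [(100, 13499), (1000, 62038)]
for tries, qtime_ms in measured_one_fq:
    # Upper bound on the average cost of one try, ignoring the base query time.
    print(tries, round(qtime_ms / tries, 1))
```

Under this reading, each try with one FQ averages far more than the 8ms reported for the plain 4fq query with spellcheck off, which is the gap the thread is trying to explain.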
RE: Why do FQs make my spelling suggestions so slow?
Andy,

What are the QTimes for the 0fq, 1fq, 2fq, 3fq & 4fq cases with spellcheck entirely turned off? Is it about (or a little more than) half the total when maxCollationTries=1?

Also, with the varying # of fq's, how many collation tries does it take to get 10 collations?

Possibly, a better way to test this is to set maxCollations = maxCollationTries. The reason is that it quits "trying" once it finds "maxCollations", so with 0 fq's, lots of combinations can generate hits and it doesn't need to try very many to get to 10. But with more fq's, fewer collations will pan out, so now it is trying more, up to 100, before (if ever) it gets to 10.

I would predict that for each "try" it has to do (and you can force this by setting maxCollations = maxCollationTries), qtime will grow linearly per try. (I'm assuming you have all non-search components like faceting turned off.) So say with 2 fq's it takes 10ms for the query to complete with spellcheck off, and 20ms with "maxCollation = maxCollationTries = 1"; then it will take about 110ms with "maxCollation = maxCollationTries = 10".

Now if you are finding that with a certain # of fq's, qtime with spellcheck off is, for instance, 2ms, 1 try is 10ms, 2 tries is 19ms, etc., then this is more than linear growth. In this case we would need to look at how spellcheck applies fq's and see if there is a bug with it using the cache correctly.

But I think you're just setting maxCollationTries too high. You're asking it to do too much work in trying teens of combinations. Really, this feature was designed to spellcheck and not suggest. But see https://issues.apache.org/jira/browse/SOLR-3240, which is committed to the 4x branch for inclusion in an eventual 4.4 release. This will make the time to do collation tries grow less than linearly, possibly making it more suitable for suggest.
James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Andy Lester [mailto:a...@petdance.com]
Sent: Tuesday, May 28, 2013 2:29 PM
To: solr-user@lucene.apache.org
Subject: Why do FQs make my spelling suggestions so slow?

I'm working on using spellcheck for giving suggestions, and collations are giving me good results, but they turn out to be very slow if my original query has any FQs in it. We can do 100 maxCollationTries in no time at all, but if there are FQs in the query, things get very slow. As maxCollationTries and the count of FQs increase, things get very slow very quickly.

           1     10     20     50    100   MaxCollationTries
0FQs       8      9     10     11     10
1FQ       11    160    599   1597   1668
2FQs      20    346   1163   3360   3361
3FQs      29    474   1852   5039   5095
4FQs      36    589   2463   6797   6807

All times are QTimes in ms.

See that top row? With no FQs, 50 MaxCollationTries comes back instantly. Add just one FQ, though, and things go bad, and they get worse as I add more of the FQs. Also note that things seem to level off at 100 MaxCollationTries.
Here's a query that I've been using as a test:

df=title_tracings_t&
fl=flrid,nodeid,title_tracings_t&
q=bagdad+AND+diaries+AND+-parent_tracings:(bagdad+AND+diaries)&
spellcheck.q=bagdad+AND+diaries&
rows=4&
wt=xml&
sort=popular_score+desc,+grouping+asc,+copyrightyear+desc,+flrid+asc&
spellcheck=true&
spellcheck.dictionary=direct&
spellcheck.onlyMorePopular=false&
spellcheck.count=15&
spellcheck.extendedResults=false&
spellcheck.collate=true&
spellcheck.maxCollations=10&
spellcheck.maxCollationTries=50&
spellcheck.collateExtendedResults=true&
spellcheck.alternativeTermCount=5&
spellcheck.maxResultsForSuggest=10&
debugQuery=off&
fq=((grouping:"1"+OR+grouping:"2"+OR+grouping:"3")+OR+solrtype:"N")&
fq=((item_source:"F"+OR+item_source:"B"+OR+item_source:"M")+OR+solrtype:"N")&
fq={!tag%3Dgrouping}((grouping:"1"+OR+grouping:"2")+OR+solrtype:"N")&
fq={!tag%3Dlanguagecode}(languagecode:"eng"+OR+solrtype:"N")&

The only thing that changes between tests is the value of spellcheck.maxCollationTries and how many FQs are at the end.

Am I doing something wrong? Do the collation internals not handle FQs correctly? The lookup/hit counts on filterCache seem to be increasing just fine. It will do N lookups, N hits, so I'm not thinking that caching is the problem.

We'd really like to be able to use the spellchecker but the results with only 10-20 maxCollationTries aren't nearly as good as if we can bump that up to 100, but we can't afford the slow response time. We also can't do without the FQs.

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
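The timing figures in this message can be turned into a rough per-try cost. The sketch below is in Python and rests on an assumption: that the columns of the timing table are 1, 10, 20, 50 and 100 maxCollationTries, in that order. The names `TRIES`, `QTIMES_MS` and `extra_ms_per_try` are invented for this illustration.

```python
# QTimes in ms from the message, keyed by number of FQs.
# Assumed column order: 1, 10, 20, 50, 100 maxCollationTries.
TRIES = [1, 10, 20, 50, 100]
QTIMES_MS = {
    0: [8, 9, 10, 11, 10],
    1: [11, 160, 599, 1597, 1668],
    2: [20, 346, 1163, 3360, 3361],
    3: [29, 474, 1852, 5039, 5095],
    4: [36, 589, 2463, 6797, 6807],
}

def extra_ms_per_try(row, lo=0, hi=3):
    """Average extra QTime per additional try between two columns (default: 1 and 50 tries)."""
    return (row[hi] - row[lo]) / (TRIES[hi] - TRIES[lo])

for fq_count, row in QTIMES_MS.items():
    print(f"{fq_count} FQs: {extra_ms_per_try(row):.1f} ms per extra try")
```

The 100-try column is left out of the default range because the message notes that times level off there. If the column reading is right, the average cost of an additional try grows by a broadly similar amount with each added FQ, while the no-FQ row stays essentially flat.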