RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
It has been in the wiki, more or less.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and following 
sections.

James Dyer
Ingram Content Group
(615) 213-4311




Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Shalin Shekhar Mangar
James, this is very useful information. Can you please add this to the wiki?



-- 
Regards,
Shalin Shekhar Mangar.


RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
Instead of "maxCollationTries=0", use a value greater than zero.  Zero means 
not to check if the collation will return hits.  1 means to test 1 possible 
combination against the index and return it only if it returns hits.  2 tries 
up to 2 possibilities, etc.  As you have "spellcheck.maxCollations=8", you'll 
probably want maxCollationTries at least that large.  Maybe 10-20 would be 
better.  Make it as low as possible to get generally good results, or as high 
as possible before the performance on a query with many misspelled words gets 
too bad.

Also, use a spellcheck.count greater than 2.  This is the number of corrections 
per misspelled term that you want it to consider.  With DirectSolrSpellChecker 
you can set it low; 5-10 might be good.  With the IndexBased- or FileBased 
spell checkers, use at least 10.

Also, do not use "onlyMorePopular" unless you indeed want every term in the 
user's query to be replaced with higher-frequency terms (even correctly-spelled 
terms get replaced).  If you want it to suggest even for words that are in the 
dictionary, try "spellcheck.alternativeTermCount" instead.  Try setting it to 
about half of "spellcheck.count" (but at least 10 if using IndexBased- or 
FileBased spell checkers).
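
As a rough illustration of the settings above, here is a minimal Python sketch 
that assembles the spellcheck part of a request.  The specific values (10 
tries, count of 10, alternativeTermCount of 5) are only assumptions picked from 
the ranges suggested above for DirectSolrSpellChecker, not tuned 
recommendations, and "onlyMorePopular" is deliberately left out.

```python
from urllib.parse import urlencode

# Illustrative values only, chosen from the ranges discussed above.
params = [
    ("spellcheck", "true"),
    ("spellcheck.collate", "true"),
    ("spellcheck.maxCollations", 8),
    # At least as large as maxCollations; 10-20 was suggested.
    ("spellcheck.maxCollationTries", 10),
    # 5-10 for DirectSolrSpellChecker, at least 10 for the other checkers.
    ("spellcheck.count", 10),
    # About half of spellcheck.count.
    ("spellcheck.alternativeTermCount", 5),
]

query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to whatever q and fq parameters the 
request already carries.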

James Dyer
Ingram Content Group
(615) 213-4311




Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Nicholas Fellows
I also have problems getting the Solr spellchecker to utilise existing FQ
params correctly.
We have some fairly monster queries,

e.g.: http://pastebin.com/4XzGpfeC

I cannot seem to get our FQ parameters to be honored when generating
results.
In essence, I am getting collations that yield no results when the filter
query is applied.

We have items that are by default not shown when out of stock or
forthcoming. The user can select whether to show these or not.

Is there something wrong with my query, or perhaps my use case is not
supported?

I'm using nested queries and local params, etc.

Would very much appreciate some assistance on this one, as two days' worth of
hacking and pestering people on IRC have not yet yielded a solution for me.
I'm not even sure what I am trying is even possible! Some sort of
clarification on this would really help!

Cheers

Nick...






-- 
Nick Fellows
DJdownload.com
---
10 Greenland Street
London
NW10ND
United Kingdom
---
n...@djdownload.com (E)

---
www.djdownload.com


Re: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Andy Lester

On May 29, 2013, at 9:46 AM, "Dyer, James"  wrote:

> Just a sanity check, I see I had misspelled "maxCollations" as 
> "maxCollation" in my prior response.  When you tested with this set the same 
> as "maxCollationTries", did you correct my spelling?

Yes, definitely.

Thanks for the ticket.  I am looking at the effects of setting 
spellcheck.onlyMorePopular to true, which seems to reduce the number of 
collations it does, but that doesn't affect the underlying question of "is the 
spellchecker doing FQs properly?"

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



RE: Why do FQs make my spelling suggestions so slow?

2013-05-29 Thread Dyer, James
Andy,

I opened this ticket so that someone can eventually investigate: 
https://issues.apache.org/jira/browse/SOLR-4874

Just a sanity check: I see I had misspelled "maxCollations" as 
"maxCollation" in my prior response.  When you tested with this set the same as 
"maxCollationTries", did you correct my spelling?  The thought is that by 
requiring it to return this many collations, you are guaranteed to make it try 
the maximum number of times every time, giving yourself a cleaner test.  I am 
trying to isolate here whether spellcheck is not running the queries properly 
or whether the queries just naturally take that long to run over and over again.

James Dyer
Ingram Content Group
(615) 213-4311






Re: Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Andy Lester
Thanks for looking at this.

> What are the QTimes for the 0fq,1fq,2fq,3fq & 4fq cases with spellcheck 
> entirely turned off?  Is it about (or a little more than) half the total when 
> maxCollationTries=1 ?

With spellcheck off I get 8ms for 4fq query.


>  Also, with the varying # of fq's, how many collation tries does it take to 
> get 10 collations?

I don't know.  How can I tell?


> Possibly, a better way to test this is to set maxCollations = 
> maxCollationTries.  The reason is that it quits "trying" once it finds 
> "maxCollations", so if with 0fq's, lots of combinations can generate hits and 
> it doesn't need to try very many to get to 10.  But with more fq's, fewer 
> collations will pan out so now it is trying more up to 100 before (if ever) 
> it gets to 10.

It does just fine doing 100 collations so long as there are no FQs.  It seems 
to me that the FQs are taking an inordinate amount of extra time.  100 
collations in (roughly) the same amount of time as a single collation, so long 
as there are no FQs.  Why are the FQs such a drag on the collation process?


> (I'm assuming you have all non-search components like faceting turned off).

Yes, definitely.


>  So say with 2fq's it takes 10ms for the query to complete with spellcheck 
> off, and 20ms with "maxCollation = maxCollationTries = 1", then it will take 
> about 110ms with "maxCollation = maxCollationTries = 10".

I can do maxCollation = maxCollationTries = 100 and it comes back in 14ms, so 
long as I have FQs off.  Add a single FQ and it becomes 13499ms.

I can do maxCollation = maxCollationTries = 1000 and it comes back in 45ms, so 
long as I have FQs off.  Add a single FQ and it becomes 62038ms.


> But I think you're just setting maxCollationTries too high.  You're asking it 
> to do too much work in trying tens of combinations.

The results I get back with 100 tries are about twice as many as I get with 10 
tries.  That's a big difference to the user when it's trying to figure out 
misspelled phrases.

Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



RE: Why do FQs make my spelling suggestions so slow?

2013-05-28 Thread Dyer, James
Andy,

What are the QTimes for the 0fq,1fq,2fq,3fq & 4fq cases with spellcheck 
entirely turned off?  Is it about (or a little more than) half the total when 
maxCollationTries=1 ?  Also, with the varying # of fq's, how many collation 
tries does it take to get 10 collations?

Possibly, a better way to test this is to set maxCollations = 
maxCollationTries.  The reason is that it quits "trying" once it finds 
"maxCollations", so if with 0fq's, lots of combinations can generate hits and 
it doesn't need to try very many to get to 10.  But with more fq's, fewer 
collations will pan out so now it is trying more up to 100 before (if ever) it 
gets to 10.

I would predict that for each "try" it has to do (and you can force this by 
setting maxCollations = maxCollationTries), qtime will grow linearly per try.  
(I'm assuming you have all non-search components like faceting turned off).  So 
say with 2fq's it takes 10ms for the query to complete with spellcheck off, and 
20ms with "maxCollation = maxCollationTries = 1", then it will take about 110ms 
with "maxCollation = maxCollationTries = 10".
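
The arithmetic behind that estimate can be sketched in a few lines of Python.  
This is only an illustration of the linear model described above; the function 
name and its arguments are made up for the example, and it assumes every 
collation try costs the same as the first one.

```python
def predicted_qtime_ms(base_ms, one_try_ms, tries):
    """Linear model: base query time plus a constant cost per collation try."""
    # Cost of a single try, taken as the difference between the QTime with
    # one try and the QTime with spellcheck off.
    per_try_ms = one_try_ms - base_ms
    return base_ms + per_try_ms * tries


# 10ms with spellcheck off, 20ms with one try, so 10 tries gives 110ms.
print(predicted_qtime_ms(10, 20, 10))
```

Measured QTimes well above this prediction would point to something other 
than plain per-try query cost.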

Now if you are finding that with a certain # of fq's, qtime with spellcheck off 
is, for instance, 2ms, 1 try is 10ms, 2 tries is 19ms, etc, then this is more 
than linear growth.  In this case we would need to look at how spell check 
applies fq's and see if there is a bug with it using the cache correctly.

But I think you're just setting maxCollationTries too high.  You're asking it 
to do too much work in trying tens of combinations.  Really, this feature was 
designed to spellcheck and not suggest.  But see 
https://issues.apache.org/jira/browse/SOLR-3240 , which is committed to the 4x 
branch for inclusion in an eventual 4.4 release.  This will make the time to do 
collation tries grow less than linearly, possibly making it more suitable for 
suggest.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Andy Lester [mailto:a...@petdance.com] 
Sent: Tuesday, May 28, 2013 2:29 PM
To: solr-user@lucene.apache.org
Subject: Why do FQs make my spelling suggestions so slow?

I'm working on using spellcheck for giving suggestions, and collations
are giving me good results, but they turn out to be very slow if
my original query has any FQs in it.  We can do 100 maxCollationTries
in no time at all, but if there are FQs in the query, things get
very slow.  As maxCollationTries and the count of FQs increase,
things get very slow very quickly.

         1    10    20    50   100   MaxCollationTries
0FQs     8     9    10    11    10
1FQ     11   160   599  1597  1668
2FQs    20   346  1163  3360  3361
3FQs    29   474  1852  5039  5095
4FQs    36   589  2463  6797  6807

All times are QTimes in ms.

See that top row?  With no FQs, 50 MaxCollationTries comes back
instantly.  Add just one FQ, though, and things go bad, and they
get worse as I add more of the FQs.  Also note that things seem to
level off at 100 MaxCollationTries.
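
To put numbers on that, here is a small Python sketch that works out the 
average cost of each additional collation try from the table.  It assumes the 
columns are 1, 10, 20, 50 and 100 MaxCollationTries; the helper name is made up 
for the example.

```python
# QTimes in ms, keyed by number of FQs; columns follow `tries`.
tries = [1, 10, 20, 50, 100]
qtimes = {
    0: [8, 9, 10, 11, 10],
    1: [11, 160, 599, 1597, 1668],
    2: [20, 346, 1163, 3360, 3361],
    3: [29, 474, 1852, 5039, 5095],
    4: [36, 589, 2463, 6797, 6807],
}


def cost_per_extra_try(row, i, j):
    """Average extra ms per additional try between columns i and j."""
    return (row[j] - row[i]) / (tries[j] - tries[i])


for fqs, row in qtimes.items():
    up_to_50 = cost_per_extra_try(row, 0, 3)   # from 1 to 50 tries
    beyond_50 = cost_per_extra_try(row, 3, 4)  # from 50 to 100 tries
    print(f"{fqs} FQs: {up_to_50:.1f} ms/try up to 50, {beyond_50:.1f} ms/try after")
```

With one FQ this comes to roughly 32 ms per extra try up to 50 tries and 
about 1.4 ms per extra try between 50 and 100, which is the levelling off 
mentioned above.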

Here's a query that I've been using as a test:

df=title_tracings_t&
fl=flrid,nodeid,title_tracings_t&
q=bagdad+AND+diaries+AND+-parent_tracings:(bagdad+AND+diaries)&
spellcheck.q=bagdad+AND+diaries&
rows=4&
wt=xml&
sort=popular_score+desc,+grouping+asc,+copyrightyear+desc,+flrid+asc&
spellcheck=true&
spellcheck.dictionary=direct&
spellcheck.onlyMorePopular=false&
spellcheck.count=15&
spellcheck.extendedResults=false&
spellcheck.collate=true&
spellcheck.maxCollations=10&
spellcheck.maxCollationTries=50&
spellcheck.collateExtendedResults=true&
spellcheck.alternativeTermCount=5&
spellcheck.maxResultsForSuggest=10&
debugQuery=off&
fq=((grouping:"1"+OR+grouping:"2"+OR+grouping:"3")+OR+solrtype:"N")&
fq=((item_source:"F"+OR+item_source:"B"+OR+item_source:"M")+OR+solrtype:"N")&
fq={!tag%3Dgrouping}((grouping:"1"+OR+grouping:"2")+OR+solrtype:"N")&
fq={!tag%3Dlanguagecode}(languagecode:"eng"+OR+solrtype:"N")&

The only thing that changes between tests is the value of
spellcheck.maxCollationTries and how many FQs are at the end.
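
For anyone who wants to reproduce the grid of tests, here is a rough Python 
sketch that generates one query string per table cell.  It only builds the 
strings and does not send anything to Solr; the function name is made up for 
the example, the FQs are the URL-decoded ones from the query above, and most 
of the other parameters are left out for brevity.

```python
from itertools import product
from urllib.parse import urlencode

# Filter queries from the test request above, URL-decoded.
FQS = [
    '((grouping:"1" OR grouping:"2" OR grouping:"3") OR solrtype:"N")',
    '((item_source:"F" OR item_source:"B" OR item_source:"M") OR solrtype:"N")',
    '{!tag=grouping}((grouping:"1" OR grouping:"2") OR solrtype:"N")',
    '{!tag=languagecode}(languagecode:"eng" OR solrtype:"N")',
]
TRIES = [1, 10, 20, 50, 100]


def build_query(num_fqs, max_tries):
    """Return the query string for one cell of the timing table."""
    params = [
        ("q", "bagdad AND diaries AND -parent_tracings:(bagdad AND diaries)"),
        ("spellcheck.q", "bagdad AND diaries"),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        ("spellcheck.maxCollations", 10),
        ("spellcheck.maxCollationTries", max_tries),
        # Remaining parameters (df, fl, sort, ...) omitted for brevity.
    ]
    # Only the first num_fqs filter queries are appended.
    params += [("fq", fq) for fq in FQS[:num_fqs]]
    return urlencode(params)


# One entry per (number of FQs, maxCollationTries) combination.
matrix = {
    (n, t): build_query(n, t)
    for n, t in product(range(len(FQS) + 1), TRIES)
}
```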

Am I doing something wrong?  Do the collation internals not handle
FQs correctly?  The lookup/hit counts on filterCache seem to be
increasing just fine.  It will do N lookups, N hits, so I'm not
thinking that caching is the problem.

We'd really like to be able to use the spellchecker but the results
with only 10-20 maxCollationTries aren't nearly as good as if we
can bump that up to 100, but we can't afford the slow response time.
We also can't do without the FQs.

Thanks,
Andy


--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance