Re: queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-29 Thread Tom Burton-West
Thanks for your help Yonik and Tomas,

I had several mistaken assumptions about how caching worked which were
resolved by walking through the code in the debugger after reading your
replies.

Tom


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-26 Thread Tom Burton-West
Hi Yonik,

I'm still confused.

I suspect I don't understand how paging and caching interact.  I probably need
to walk through the code.  Is there a unit test that exercises
SolrIndexSearcher.getDocListC
or a good unit test to use as a base to write one?


Part of what confuses me is whether what gets cached always starts at row 1
of results.  I did not think this was true, but your example of a request
for docs 1 through 10010 triggering the queryResultMaxDocsCached limit of 200
makes it sound like the cache always starts at row 1.  I would have thought
that a request for start=10,000 rows=10,010 would result in Solr caching rows
10,000-10,020.

I'll try to explain my confusion.
Using the defaults in the solrconfig example:
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

If I query for start=0, rows=10, Solr fetches 20 results and caches them.
If I query for start=11, rows=10, Solr reads rows 11-20 from cache.
What happens when I query for start=21, rows=10?
I thought that Solr would then fetch rows 21-40 into the queryResultCache.
Is this wrong?

If I query for start=195, rows=10, does Solr cache rows 195-200 but go to
disk for rows over 200 (queryResultMaxDocsCached=200)?  Or does Solr skip
caching altogether for rows over 200?



Tom





Re: queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-26 Thread Yonik Seeley
On Fri, Sep 26, 2014 at 4:38 PM, Tom Burton-West tburt...@umich.edu wrote:
 Hi Yonik,

 I'm still confused.

 I suspect I don't understand how paging and caching interact.  I probably need
 to walk through the code.  Is there a unit test that exercises
 SolrIndexSearcher.getDocListC or a good unit test to use as a base to write
 one?


 Part of what confuses me is whether what gets cached always starts at row 1
 of results.

Yes, we always cache from the first row.
Asking for rows 91-100 requires collecting 1-100 (and it's the latter
we cache - ignoring deep paging).
It's also just ids (and optionally scores) that are cached... so
either 4 bytes or 8 bytes per document cached, depending on if you ask
for scores back.

queryResultWindowSize just rounds up the upper bound.
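
For readers following along, the rounding and the per-entry size mentioned above can be sketched roughly as below. This is a minimal illustration, not the Solr source: the class and method names are hypothetical, it assumes a positive window size and a zero-based start, and it ignores integer overflow.

```java
// Sketch only: hypothetical names, not Solr API.
final class QueryResultWindowSketch {

    /**
     * Rounds the highest requested row (start + rows) up to a multiple
     * of windowSize. Caching always begins at the first row, so this is
     * the number of rows collected on a cache miss.
     */
    static int supersetSize(int start, int rows, int windowSize) {
        int maxDocRequested = start + rows;
        if (maxDocRequested < windowSize) {
            return windowSize; // also covers a request for zero rows
        }
        return ((maxDocRequested - 1) / windowSize + 1) * windowSize;
    }

    /**
     * Approximate payload of one cached entry: 4 bytes per document id,
     * plus 4 bytes per score when scores are requested.
     */
    static long approxEntryBytes(int cachedDocs, boolean withScores) {
        return (long) cachedDocs * (withScores ? 8 : 4);
    }

    public static void main(String[] args) {
        int window = 20;
        System.out.println(supersetSize(0, 10, window));   // 20
        System.out.println(supersetSize(21, 10, window));  // 40
        System.out.println(supersetSize(90, 10, window));  // 100
        System.out.println(approxEntryBytes(100, true));   // 800
    }
}
```

With a window of 20, a request for start=21, rows=10 rounds 31 up to 40, which matches the "collect 0-40" answer further down in this message.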

 I'll try to explain my confusion.
 Using the defaults in the solrconfig example:
 <queryResultWindowSize>20</queryResultWindowSize>
 <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

 If I query for start=0, rows=10, Solr fetches 20 results and caches them.
 If I query for start=11, rows=10, Solr reads rows 11-20 from cache.

Correct.

 What happens when I query for start=21, rows=10?
 I thought that Solr would then fetch rows 21-40 into the queryResultCache.
 Is this wrong?

It will result in a cache miss and we'll collect 0-40 and cache that.

 If I query for start=195, rows=10, does Solr cache rows 195-200 but go to
 disk for rows over 200 (queryResultMaxDocsCached=200)?  Or does Solr skip
 caching altogether for rows over 200?

Probably the latter... it's an edge case so I'd have to check the code
to know for sure if the check is pre or post rounding up.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data




queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-24 Thread Tom Burton-West
Hello,

No response on the Solr user list so I thought I would try the dev list.


queryResultWindowSize sets the number of documents to cache for each query
in the queryResult cache.  So if you normally output 10 results per page,
and users don't go beyond page 3 of results, you could set
queryResultWindowSize to 30 and the second and third page requests will
read from cache, not from disk.  This is well documented in both the Solr
example solrconfig.xml file and the Solr documentation.

However, the example in solrconfig.xml and the documentation in the
reference manual for Solr 4.10 say that queryResultMaxDocsCached:

sets the maximum number of documents to cache for any entry in the
queryResultCache.

Looking at the code, it appears that the queryResultMaxDocsCached parameter
actually tells Solr not to cache any result list whose size is over
queryResultMaxDocsCached.

From: SolrIndexSearcher.getDocListC
// lastly, put the superset in the cache if the size is less than or equal
// to queryResultMaxDocsCached
if (key != null && superset.size() <= queryResultMaxDocsCached
    && !qr.isPartialResults()) {
  queryResultCache.put(key, superset);
}

Deciding whether or not to cache a DocList if its size is over N (where N =
queryResultMaxDocsCached) is very different than caching only N items from
the DocList which is what the current documentation (and the variable name)
implies.
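
To make the difference concrete, here is a small sketch of the two readings side by side. It is illustrative only: a plain Map stands in for the queryResultCache, lists of Integer stand in for a DocList, all names are hypothetical, and the partial-results condition from the snippet above is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names are Solr API.
final class MaxDocsCachedSemantics {

    /** Reading implied by the documentation: keep at most maxDocs ids per entry. */
    static void putTruncated(Map<String, List<Integer>> cache, String key,
                             List<Integer> superset, int maxDocs) {
        int keep = Math.min(superset.size(), maxDocs);
        cache.put(key, new ArrayList<>(superset.subList(0, keep)));
    }

    /** Reading suggested by the snippet above: cache the whole list or nothing. */
    static void putIfSmallEnough(Map<String, List<Integer>> cache, String key,
                                 List<Integer> superset, int maxDocs) {
        if (superset.size() <= maxDocs) {
            cache.put(key, superset);
        }
    }

    /** Builds a dummy result list of n document ids. */
    static List<Integer> ids(int n) {
        List<Integer> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> superset = ids(220); // larger than the limit of 200
        Map<String, List<Integer>> truncating = new HashMap<>();
        Map<String, List<Integer>> allOrNothing = new HashMap<>();

        putTruncated(truncating, "q", superset, 200);
        putIfSmallEnough(allOrNothing, "q", superset, 200);

        System.out.println(truncating.get("q").size());    // 200 ids kept
        System.out.println(allOrNothing.containsKey("q")); // false: nothing cached
    }
}
```

Under the first reading a 220-document result still leaves a 200-document entry behind; under the second it leaves no entry at all, so the next request for the same query is a miss.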

Looking at the JIRA issue https://issues.apache.org/jira/browse/SOLR-291,
the original intent was to control memory use, and the variable name
originally suggested was noCacheIfLarger.

Can someone please let me know if it is true that the
queryResultMaxDocsCached parameter actually tells Solr not to cache any
result list that contains more than queryResultMaxDocsCached documents?

If so, I will add a comment to the Cwiki doc and open a JIRA and submit a
patch to the example file.

I tried to find a test case that exercises SolrIndexSearcher.getDocListC
so I could see how queryResultWindowSize or queryResultMaxDocsCached
actually work in the debugger, but could not find one.  Could
someone please point me to a good test case that either exercises
SolrIndexSearcher.getDocListC or would be a good starting point for writing
one?


Tom



---

http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/solr/example/solr/collection1/conf/solrconfig.xml?revision=1624269&view=markup

<!-- Maximum number of documents to cache for any entry in the
     queryResultCache.
  -->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>


Re: queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-24 Thread Tomás Fernández Löbbe
I think you are right. I think it has this name because it considers a
series of queries paging through a result. The first X pages are going to be
cached, but once the limit is reached, no further pages are cached and the
last superset that fit remains in the cache. At least that’s my understanding.
After a quick look, I couldn’t find a test case for this either.

Tomás




Re: queryResultMaxDocsCached vs. queryResultWindowSize

2014-09-24 Thread Yonik Seeley
On Wed, Sep 24, 2014 at 5:27 PM, Tomás Fernández Löbbe
tomasflo...@gmail.com wrote:
 I think you are right. I think the name is this because it’s considering a
 series of queries paging a result. The first X pages are going to be cached,
 but once the limit is reached, no further pages are and the last superset
 that fitted remains in cache.

I was confused about the confusion ;-)  But your summary seems correct.

queryResultWindowSize rounds up to a multiple of the window size for
caching purposes.
So if you ask for top 10, and the queryResultWindowSize is 20, then
the top 20 will be cached (so if a user hits next to get to the next
10, it will still result in a cache hit).

queryResultMaxDocsCached sets a limit beyond which the resulting docs
aren't cached (so if a user asks for docs 1 through 10010, we skip
caching logic).
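
The paging behavior summarized above can be simulated in a few lines. This is a rough model under stated assumptions, not Solr code: it tracks a single query, assumes a zero-based start and a result set larger than anything requested, ignores eviction, and uses hypothetical class and method names.

```java
// Rough model of one query's cache entry while a user pages through results.
// Hypothetical names; not Solr API.
final class PagingCacheSketch {
    private final int windowSize;
    private final int maxDocsCached;
    private int cachedSupersetSize = 0; // 0 means nothing is cached yet

    PagingCacheSketch(int windowSize, int maxDocsCached) {
        this.windowSize = windowSize;
        this.maxDocsCached = maxDocsCached;
    }

    /** Returns true if rows [start, start + rows) are served from the cache. */
    boolean request(int start, int rows) {
        int maxDocRequested = start + rows;
        if (maxDocRequested <= cachedSupersetSize) {
            return true; // the cached superset already covers this page
        }
        // Miss: collect from the first row, rounded up to the window size.
        int windows = (maxDocRequested + windowSize - 1) / windowSize;
        int superset = Math.max(1, windows) * windowSize;
        // Only cache the new superset if it is within the limit; otherwise
        // the previously cached superset is left in place.
        if (superset <= maxDocsCached) {
            cachedSupersetSize = superset;
        }
        return false;
    }

    int cachedSupersetSize() {
        return cachedSupersetSize;
    }

    public static void main(String[] args) {
        PagingCacheSketch cache = new PagingCacheSketch(20, 200);
        for (int start = 0; start <= 210; start += 10) {
            boolean hit = cache.request(start, 10);
            System.out.println("start=" + start + " hit=" + hit
                    + " cachedSuperset=" + cache.cachedSupersetSize());
        }
    }
}
```

With a window of 20 and a limit of 200, pages of 10 alternate between miss and hit up to row 200. Requests past that point miss every time, and the 200-row superset stays in the cache.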

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data
