Re: queryResultMaxDocsCached vs. queryResultWindowSize
Thanks for your help, Yonik and Tomas. I had several mistaken assumptions about how caching worked, which were resolved by walking through the code in the debugger after reading your replies.

Tom

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: queryResultMaxDocsCached vs. queryResultWindowSize
Hi Yonik,

I'm still confused. I suspect I don't understand how paging and caching interact, and I probably need to walk through the code. Is there a unit test that exercises SolrIndexSearcher.getDocListC, or a good unit test to use as a base for writing one?

Part of what confuses me is whether what gets cached always starts at row 1 of the results. I did not think this was true, but your example of asking for rows 1 through 10010 triggering the queryResultMaxDocsCached limit of 200 makes it sound like the cache always starts at row 1. I would have thought that a request for start=10,000 rows=10,010 would result in Solr caching rows 10,000-10,020.

I'll try to explain my confusion. Using the defaults in the example solrconfig.xml:

<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

If I query for start=0, rows=10, Solr fetches 20 results and caches them. If I query for start=11, rows=10, Solr reads rows 11-20 from the cache.

What happens when I query for start=21, rows=10? I thought that Solr would then fetch rows 21-40 into the queryResultCache. Is this wrong?

If I query for start=195, rows=10, does Solr cache rows 195-200 but go to disk for rows over 200 (queryResultMaxDocsCached=200)? Or does Solr skip caching altogether for rows over 200?

Tom
Re: queryResultMaxDocsCached vs. queryResultWindowSize
On Fri, Sep 26, 2014 at 4:38 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> Hi Yonik,
> I'm still confused. I suspect I don't understand how paging and caching interact. I probably need to walk through the code. Is there a unit test that exercises SolrIndexSearcher.getDocListC, or a good unit test to use as a base for writing one?
> Part of what confuses me is whether what gets cached always starts at row 1 of the results.

Yes, we always cache from the first row. Asking for rows 91-100 requires collecting rows 1-100 (and it's the latter we cache, ignoring deep paging). It's also just ids (and optionally scores) that are cached, so either 4 or 8 bytes per cached document, depending on whether you ask for scores back. queryResultWindowSize just rounds up the upper bound.

> I'll try to explain my confusion. Using the defaults in the example solrconfig.xml:
> <queryResultWindowSize>20</queryResultWindowSize>
> <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
> If I query for start=0, rows=10, Solr fetches 20 results and caches them. If I query for start=11, rows=10, Solr reads rows 11-20 from the cache.

Correct.

> What happens when I query for start=21, rows=10? I thought that Solr would then fetch rows 21-40 into the queryResultCache. Is this wrong?

It will result in a cache miss, and we'll collect rows 0-40 and cache that.

> If I query for start=195, rows=10, does Solr cache rows 195-200 but go to disk for rows over 200 (queryResultMaxDocsCached=200)? Or does Solr skip caching altogether for rows over 200?

Probably the latter. It's an edge case, so I'd have to check the code to know for sure whether the check is done before or after rounding up.

-Yonik
http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
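A minimal Java sketch of the cache-hit behavior described in the reply above, under the assumption (taken from this thread, not from reading Solr's source) that a cache entry always holds the first N results of a query, so a later request is a hit only when start + rows fits within N. The sketch uses 0-based offsets, so the second page of 10 is start=10. The memory estimate uses the 4-bytes-per-id / 8-bytes-with-scores figure from the reply and ignores object overhead; it also ignores queries with fewer total matches than the entry holds. The class and method names are hypothetical.

```java
/** Hypothetical model of one queryResultCache entry; not Solr's actual classes. */
class CachedSuperset {
    final int cachedDocs;      // the entry holds rows 0 .. cachedDocs - 1
    final boolean hasScores;   // whether scores were cached alongside the ids

    CachedSuperset(int cachedDocs, boolean hasScores) {
        this.cachedDocs = cachedDocs;
        this.hasScores = hasScores;
    }

    /** True if rows start .. start + rows - 1 (0-based) all fall inside the entry. */
    boolean covers(int start, int rows) {
        return start + rows <= cachedDocs;
    }

    /** Rough payload: 4 bytes per doc id, 4 more per doc when scores are kept. */
    long approxBytes() {
        return (long) cachedDocs * (hasScores ? 8 : 4);
    }
}

class CacheHitDemo {
    public static void main(String[] args) {
        // start=0, rows=10 with queryResultWindowSize=20 leaves rows 0-19 cached.
        CachedSuperset entry = new CachedSuperset(20, false);
        System.out.println(entry.covers(10, 10)); // second page: true (hit)
        System.out.println(entry.covers(21, 10)); // needs rows 21-30: false (miss)
        System.out.println(new CachedSuperset(200, true).approxBytes()); // 1600
    }
}
```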
queryResultMaxDocsCached vs. queryResultWindowSize
Hello,

No response on the Solr user list, so I thought I would try the dev list.

queryResultWindowSize sets the number of documents to cache for each query in the queryResultCache. So if you normally output 10 results per page, and users don't go beyond page 3 of results, you could set queryResultWindowSize to 30, and the second and third page requests will read from the cache, not from disk. This is well documented in both the Solr example solrconfig.xml file and the Solr documentation.

However, the example in solrconfig.xml and the documentation in the reference manual for Solr 4.10 say that queryResultMaxDocsCached:

  sets the maximum number of documents to cache for any entry in the queryResultCache.

Looking at the code, it appears that the queryResultMaxDocsCached parameter actually tells Solr not to cache any results list whose size exceeds queryResultMaxDocsCached. From SolrIndexSearcher.getDocListC:

  // lastly, put the superset in the cache if the size is less than or equal
  // to queryResultMaxDocsCached
  if (key != null && superset.size() <= queryResultMaxDocsCached && !qr.isPartialResults()) {
    queryResultCache.put(key, superset);
  }

Deciding whether or not to cache a DocList based on whether its size exceeds N (where N = queryResultMaxDocsCached) is very different from caching only N items from the DocList, which is what the current documentation (and the variable name) implies.

Looking at the JIRA issue https://issues.apache.org/jira/browse/SOLR-291, the original intent was to control memory use, and the variable name originally suggested was noCacheIfLarger.

Can someone please let me know whether it is true that the queryResultMaxDocsCached parameter actually tells Solr not to cache any results list that contains more than queryResultMaxDocsCached documents? If so, I will add a comment to the Cwiki doc, open a JIRA, and submit a patch to the example file.
I tried to find a test case that exercises SolrIndexSearcher.getDocListC so I could see how queryResultWindowSize and queryResultMaxDocsCached actually work in the debugger, but could not find one. Could someone please point me to a good test case that either exercises SolrIndexSearcher.getDocListC or would be a good starting point for writing one?

Tom

---
http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/solr/example/solr/collection1/conf/solrconfig.xml?revision=1624269&view=markup

<!-- Maximum number of documents to cache for any entry in the
     queryResultCache.
  -->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
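To make the distinction above concrete, here is a small Java sketch of the two readings. `shouldCache` mirrors the condition quoted from SolrIndexSearcher.getDocListC, with the null-key check dropped and partial results reduced to a boolean; `truncateToMax` models the reading the documentation suggests and is hypothetical, not something Solr is known to do. Both method names and the class are illustrative only.

```java
import java.util.Arrays;

class MaxDocsCachedReadings {
    /**
     * Reading taken from the quoted code: the whole superset is cached only if it
     * has at most maxDocsCached entries; otherwise nothing is cached.
     */
    static boolean shouldCache(int supersetSize, int maxDocsCached, boolean partialResults) {
        return supersetSize <= maxDocsCached && !partialResults;
    }

    /**
     * Hypothetical reading implied by the documentation ("maximum number of
     * documents to cache for any entry"): keep only the first maxDocsCached ids.
     */
    static int[] truncateToMax(int[] supersetIds, int maxDocsCached) {
        return Arrays.copyOf(supersetIds, Math.min(supersetIds.length, maxDocsCached));
    }

    public static void main(String[] args) {
        int[] superset = new int[220]; // e.g. a 220-document superset
        System.out.println(shouldCache(superset.length, 200, false)); // false: nothing cached
        System.out.println(truncateToMax(superset, 200).length);      // 200: first 200 kept
    }
}
```

Under the first reading, a large request leaves the cache untouched; under the second, it would always store a bounded entry. That difference is what the naming question above is about.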
Re: queryResultMaxDocsCached vs. queryResultWindowSize
I think you are right. I think it's named this way because it considers a series of queries paging through a result: the first X pages are cached, but once the limit is reached, no further pages are, and the last superset that fitted remains in the cache. At least that's my understanding.

After a quick look, I couldn't find a test case for this either.

Tomás
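A short Java simulation of the paging pattern Tomás describes. It assumes, as the replies in this thread state, that every cache miss collects rows from 0 up to start + rows rounded up to a multiple of queryResultWindowSize, that storing a new superset replaces the previous entry for the same query, and that a superset larger than queryResultMaxDocsCached is not stored at all. It also assumes the query matches more documents than the user ever pages through. The class and method names are hypothetical.

```java
class PagingSimulation {
    /** Round start + rows up to the next multiple of windowSize (rows, windowSize >= 1). */
    static int supersetSize(int start, int rows, int windowSize) {
        int maxDocRequested = start + rows;
        return ((maxDocRequested + windowSize - 1) / windowSize) * windowSize;
    }

    /**
     * Pages through results, rows at a time, and returns the size of the superset
     * left in the cache after the given number of pages (0 if nothing was cached).
     */
    static int cachedAfterPaging(int pages, int rows, int windowSize, int maxDocsCached) {
        int cached = 0;
        for (int page = 0; page < pages; page++) {
            int start = page * rows;
            if (start + rows <= cached) {
                continue; // cache hit: the existing entry already covers this page
            }
            int superset = supersetSize(start, rows, windowSize);
            if (superset <= maxDocsCached) {
                cached = superset; // the put replaces the smaller entry for this query
            }
            // otherwise the page is computed but not cached; the old entry stays
        }
        return cached;
    }

    public static void main(String[] args) {
        // 10 rows per page, queryResultWindowSize=20, queryResultMaxDocsCached=200
        System.out.println(cachedAfterPaging(3, 10, 20, 200));  // 40
        System.out.println(cachedAfterPaging(25, 10, 20, 200)); // 200
    }
}
```

After 25 pages, the last superset that fitted (200 documents) is still the cached entry, and pages 21 onward are recomputed on every request.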
Re: queryResultMaxDocsCached vs. queryResultWindowSize
On Wed, Sep 24, 2014 at 5:27 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
> I think you are right. I think the name is this because it's considering a series of queries paging a result. The first X pages are going to be cached, but once the limit is reached, no further pages are and the last superset that fitted remains in cache.

I was confused about the confusion ;-) But your summary seems correct.

queryResultWindowSize rounds the requested upper bound up to a multiple of the window size for caching purposes. So if you ask for the top 10, and queryResultWindowSize is 20, then the top 20 will be cached (so if a user hits "next" to get to the next 10, it will still result in a cache hit).

queryResultMaxDocsCached sets a limit beyond which the resulting docs aren't cached (so if a user asks for docs 1 through 10010, we skip the caching logic).

-Yonik
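Yonik's two examples can be checked with a small Java sketch of the rounding rule. It assumes a 0-based start, so the requested upper bound is start + rows, and it is written from the description in this thread rather than copied from Solr's source. It ignores the case where the query has fewer matches than the rounded bound, in which the real superset would be smaller. The class and method names are hypothetical.

```java
class WindowRounding {
    /**
     * Upper bound of the superset collected for a request: start + rows rounded
     * up to the next multiple of windowSize. Assumes rows >= 1 and windowSize >= 1.
     */
    static int supersetMaxDoc(int start, int rows, int windowSize) {
        int maxDocRequested = start + rows;
        return ((maxDocRequested + windowSize - 1) / windowSize) * windowSize;
    }

    /** True if the rounded-up superset is small enough to be cached. */
    static boolean cacheable(int start, int rows, int windowSize, int maxDocsCached) {
        return supersetMaxDoc(start, rows, windowSize) <= maxDocsCached;
    }

    public static void main(String[] args) {
        System.out.println(supersetMaxDoc(0, 10, 20));    // top 10 -> top 20 collected
        System.out.println(cacheable(0, 10010, 20, 200)); // docs 1..10010 -> not cached
    }
}
```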