[ 
https://issues.apache.org/jira/browse/SOLR-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-5444:
---------------------------------

    Description: 
h5. Setup

We have a 6-Solr-node (release 4.4.0) setup with 12 billion "small" documents 
loaded across 3 collections. The documents have the following fields
* a_dlng_doc_sto (docvalue long)
* b_dlng_doc_sto (docvalue long)
* c_dstr_doc_sto (docvalue string)
* timestamp_lng_ind_sto  (indexed long)
* d_lng_ind_sto (indexed long)
From schema.xml:
{code}
    <dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false" stored="true" required="true" docValues="true"/>
    <dynamicField name="*_lng_ind_sto" type="long" indexed="true" stored="true"/>
    <dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false" stored="true" required="true" docValues="true"/>
...
    <fieldType name="dstring" class="solr.StrField" sortMissingLast="true" docValuesFormat="Disk"/>
    <fieldType name="dlng" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>
{code}
The value of timestamp_lng_ind_sto decides which collection a document goes into.

We execute queries of the following form:
* q=timestamp_lng_ind_sto:\[x TO y\] AND d_lng_ind_sto:(a OR b OR ... OR n)
* facet=true&facet.field=a_dlng_doc_sto&facet.zeros=false&facet.mincount=1&facet.limit=<asked-for-facets>&rows=0&start=0
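
To make the query form above concrete, here is a minimal sketch that assembles such a request string with plain Java (10 or later). The class and method names are hypothetical and not part of Solr; the field names and facet parameters are copied from the description. Note that the backslashes before the brackets in the description are only JIRA markup escapes, so the sketch emits plain brackets before URL-encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of Solr or SolrJ.
final class FacetQuerySketch {

    /** Builds the raw (unencoded) q parameter: a timestamp range AND'ed with a disjunction of values. */
    static String buildQ(long from, long to, List<Long> values) {
        String disjunction = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR "));
        return "timestamp_lng_ind_sto:[" + from + " TO " + to + "]"
                + " AND d_lng_ind_sto:(" + disjunction + ")";
    }

    /** Builds the full query string; askedForFacets becomes facet.limit. */
    static String buildQueryString(long from, long to, List<Long> values, int askedForFacets) {
        String q = URLEncoder.encode(buildQ(from, to, values), StandardCharsets.UTF_8);
        return "q=" + q
                + "&facet=true&facet.field=a_dlng_doc_sto"
                + "&facet.zeros=false&facet.mincount=1"
                + "&facet.limit=" + askedForFacets
                + "&rows=0&start=0";
    }

    public static void main(String[] args) {
        // Made-up values for x, y and a..n; the real ones are not given in this issue.
        System.out.println(buildQueryString(100L, 200L, List.of(1L, 2L, 3L), 2000));
    }
}
```

Only facet.limit varies between the measurements reported below; everything else in the request stays fixed.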

h5. Problem 

We see very slow response times when a query hits a large number of documents 
spanning lots of facets, but we only ask for "a few" of those facets

h5. Concrete example of query to get some concrete numbers to look at

With x and y plus a, b ... n set to values such that
* The timestamp_lng_ind_sto:\[x TO y\] part of the search criteria alone hits 
about 1.7 billion documents (all of them in one of the three collections, which 
contains 4.5 billion docs, but that is not important)
* The d_lng_ind_sto:(a OR b OR ... OR n) part of the search criteria alone hits 
about 500000 documents
* The combined search criteria (timestamp_lng_ind_sto AND'ed with 
d_lng_ind_sto) hit about 200000 documents

The following graph shows response time as a function of <asked-for-facets> 
(in the query)
!Responsetime_func_of_facets_asked_for.png!
Note that response time is high even for "low" <asked-for-facets>, and that it 
increases fast (but linearly) with <asked-for-facets> until <asked-for-facets> 
is somewhere between 5000 (where response time is close to 1000 secs) and 
10000 (where response time is about 5 secs). For values of <asked-for-facets> 
above 10000, response time stays "low" at 1-10 secs

Looking at the code and the profiling, it is clear that the change to better 
response time occurs when SimpleFacets.getFacetFieldCounts changes from using 
getListedTermCounts to using getTermCounts.

The following image shows profiling information during a request with 
<asked-for-facets> at about 2000.
!Profiiling_SimpleFacets_getListedTermCounts_path.png!
Note that
* SimpleFacets.getListedTermCounts is used (green box)
* 91% of the time spent performing the query is spent in the 
DocSetCollector constructor (red box). During this particular query 125000 
DocSetCollector objects are created, spending 710 secs in total. Additional 
investigation shows that the time is spent allocating huge int arrays for the 
"scratch" array. Several thousand of those DocSetCollector constructors 
create int arrays of size above 1 million. That takes time, and also leaves 
plenty of work for the GC afterwards.
* The actual search-part of the query takes only 0.5% (4 secs) of the combined 
time executing the query (blue box)
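
As a rough sanity check of the figures in this list, the following sketch derives the average constructor cost and the total request time implied by each percentage. The class and method names are hypothetical; the inputs are the numbers reported above, nothing is measured here.

```java
// Hypothetical helper for back-of-the-envelope arithmetic on the reported profiling numbers.
final class ProfilingArithmetic {

    /** Average time per constructor call in milliseconds. */
    static double averageMillisPerCollector(double totalSeconds, long collectors) {
        return totalSeconds * 1000.0 / collectors;
    }

    /** Total request time implied by "partSeconds is percentOfTotal % of the request". */
    static double impliedTotalSeconds(double partSeconds, double percentOfTotal) {
        return partSeconds * 100.0 / percentOfTotal;
    }

    public static void main(String[] args) {
        // 710 secs spread over 125000 constructor calls
        System.out.printf("avg per collector: %.2f ms%n", averageMillisPerCollector(710.0, 125_000L));
        // 710 secs reported as 91% of the request
        System.out.printf("total implied by constructor share: %.0f s%n", impliedTotalSeconds(710.0, 91.0));
        // 4 secs reported as 0.5% of the request
        System.out.printf("total implied by search share: %.0f s%n", impliedTotalSeconds(4.0, 0.5));
    }
}
```

The average comes out at roughly 5.7 ms per constructor call, and the two implied totals (about 780 secs and 800 secs) agree with each other within the rounding of the reported percentages.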

The following image shows profiling information during a request with 
<asked-for-facets> at about 10000
!Profiling_SimpleFacets_getTermCounts_path.png!
Note that
* SimpleFacets.getTermCounts is used (green box)
* The actual search-part of the query now takes 70% (11 secs) of the combined 
time executing the query (blue box)

h5. What to do about this?

* I am not sure why there are two paths that SimpleFacets.getFacetFieldCounts 
can take (getListedTermCounts or getTermCounts), but I am pretty sure there is 
a good reason. It seems that getListedTermCounts is used when 
<asked-for-facets> is noticeably lower than the total number of facets hit 
(I believe it is when <asked-for-facets> * 1.5 + 10 is below the actual number 
of facets hit)
* *One solution* could be to just drop the getListedTermCounts path and always 
use getTermCounts, but that is probably not a good idea, because 
getListedTermCounts is probably there for a performance reason (in other 
scenarios)
* The comment above DocSetCollector.scratch says
{code}
  // in case there aren't that many hits, we may not want a very sparse
  // bit array.  Optimistically collect the first few docs in an array
  // in case there are only a few.
  final int[] scratch;
{code}
The comment seems reasonable. But when we look at what values are used as 
"smallSetSize" for the DocSetCollector constructor, it is always "maxDoc >> 6" 
(basically dividing by 64). This value depends on maxDoc and will be high if 
maxDoc is high. In my case maxDoc is often 50+ million, resulting in 
"smallSetSize"s of 1+ million (that is not "a few"). I very much doubt 
that "smallSetSize" should increase as maxDoc increases. Why not always use 
a low (fixed or similar) value for "smallSetSize"? Is it ever a good idea to 
have huge int arrays for the "scratch" array?
* *Another solution* would be to never create "scratch"-arrays with size above 
e.g. 50
* *There are probably several other potential solutions*
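
To put numbers on the sizing rule and the capping idea from the list above, here is a minimal sketch. It is not Solr's code: the class, its methods and the simplified collector are hypothetical stand-ins, and java.util.BitSet is used purely for illustration. Only the "maxDoc >> 6" rule and the example cap of 50 come from the description.

```java
import java.util.BitSet;

// Hypothetical sketch; names are illustrative and do not exist in Solr.
final class ScratchSizingSketch {

    /** Sizing rule quoted in the issue: smallSetSize = maxDoc >> 6, i.e. maxDoc / 64. */
    static int proportionalSmallSetSize(int maxDoc) {
        return maxDoc >> 6;
    }

    /** Assumed alternative: the same rule, but never above a fixed cap. */
    static int cappedSmallSetSize(int maxDoc, int cap) {
        return Math.min(maxDoc >> 6, cap);
    }

    /** Bytes of int payload allocated eagerly for one scratch array (array header ignored). */
    static long scratchBytes(int smallSetSize) {
        return (long) smallSetSize * Integer.BYTES;
    }

    /** Simplified stand-in for a collector: small array first, bit set only on overflow. */
    static final class SmallFirstCollector {
        private final int[] scratch;
        private final int maxDoc;
        private BitSet bits;   // allocated lazily, only when scratch overflows
        private int count;

        SmallFirstCollector(int smallSetSize, int maxDoc) {
            this.scratch = new int[smallSetSize];   // this eager allocation is the cost in question
            this.maxDoc = maxDoc;
        }

        void collect(int doc) {
            if (count < scratch.length) {
                scratch[count] = doc;
            } else {
                if (bits == null) {
                    bits = new BitSet(maxDoc);
                }
                bits.set(doc);
            }
            count++;
        }

        int size() { return count; }

        boolean usesBitSet() { return bits != null; }
    }

    public static void main(String[] args) {
        for (int maxDoc : new int[] {50_000_000, 64_000_000}) {
            int proportional = proportionalSmallSetSize(maxDoc);
            int capped = cappedSmallSetSize(maxDoc, 50);
            System.out.println("maxDoc=" + maxDoc
                    + " proportional=" + proportional + " (" + scratchBytes(proportional) + " bytes)"
                    + " capped=" + capped + " (" + scratchBytes(capped) + " bytes)");
        }
    }
}
```

With this rule a maxDoc of 50,000,000 gives 781,250 entries (about 3 MB of ints per collector), and 1,000,000 entries are reached at a maxDoc of 64,000,000. A cap of 50 reduces the eager allocation to 200 bytes, at the price of switching to the bit set as soon as more than 50 documents are collected. Whether that trade-off is acceptable for other workloads is exactly the open question raised here.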

I would really like your opinion on which solution to pursue, so that I do not 
unintentionally break good performance optimizations just because I missed 
some points explaining why the code is as it is today.

*Note* I have filed this as a 4.4 issue, because that is the platform I use 
for my tests etc. But I am sure the problem also exists on 4.5.1 (or whatever 
the latest 4.x release is)

  was:
We have a 6-Solr-node (release 4.4.0) setup with 12 billion "small" documents 
loaded across 3 collections. The documents have the following fields
* a_dlng_doc_sto (docvalue long)
* b_dlng_doc_sto (docvalue long)
* c_dstr_doc_sto (docvalue string)
* timestamp_lng_ind_sto  (indexed long)
* d_lng_ind_sto (indexed long)
From schema.xml:
{code}
    <dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false" stored="true" required="true" docValues="true"/>
    <dynamicField name="*_lng_ind_sto" type="long" indexed="true" stored="true"/>
    <dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false" stored="true" required="true" docValues="true"/>
...
    <fieldType name="dstring" class="solr.StrField" sortMissingLast="true" docValuesFormat="Disk"/>
    <fieldType name="dlng" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>
{code}
timestamp_lng_ind_sto decides which collection documents go into

We execute queries on the following format:
* q=timestamp_lng_ind_sto:\[x TO y\] AND d_lng_ind_sto:(a OR b OR ... OR n)
* facet=true&facet.field=a_dlng_doc_sto&facet.zeros=false&facet.mincount=1&facet.limit=<asked-for-facets>&rows=0&start=0

We see very slow response-time when hitting large number of rows, spanning lots 
of facets, but only ask for "a few" of those rows

Example
* With x and y plus a, b ... n set to values so that
* The timestamp_lng_ind_sto:\[x TO y\] part of the search-criteria alone hit 
about 1.7 billion documents
* The d_lng_ind_sto:(a OR b OR ... OR n) part of the search-criteria alone hit 
about 500.000 documents
* The combined search-criteria (timestamp_lng_ind_sto AND'ed with 
d_lng_ind_sto) hit about 200.000 documents
!Profiling_SimpleFacets_getListedTermCounts_path.png!



> Slow response on facet search, lots of facets, asking for few facets in 
> response
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-5444
>                 URL: https://issues.apache.org/jira/browse/SOLR-5444
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.4
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: docvalue, faceted-search, performance
>             Fix For: 4.7
>
>         Attachments: Profiiling_SimpleFacets_getListedTermCounts_path.png, 
> Profiling_SimpleFacets_getTermCounts_path.png, 
> Responsetime_func_of_facets_asked_for.png
>
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)

