[ 
https://issues.apache.org/jira/browse/SOLR-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-5444:
---------------------------------

    Description: 
We have a 6-node Solr (release 4.4.0) setup with 12 billion "small" documents 
loaded across 3 collections. The documents have the following fields:
* a_dlng_doc_sto (docvalue long)
* b_dlng_doc_sto (docvalue long)
* c_dstr_doc_sto (docvalue string)
* timestamp_lng_ind_sto  (indexed long)
* d_lng_ind_sto (indexed long)
From schema.xml:
{code}
    <dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false" stored="true" required="true" docValues="true"/>
    <dynamicField name="*_lng_ind_sto" type="long" indexed="true" stored="true"/>
    <dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false" stored="true" required="true" docValues="true"/>
...
    <fieldType name="dstring" class="solr.StrField" sortMissingLast="true" docValuesFormat="Disk"/>
    <fieldType name="dlng" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>
{code}
The value of timestamp_lng_ind_sto decides which collection a document goes into.

We execute queries in the following format:
* q=timestamp_lng_ind_sto:\[x TO y\] AND d_lng_ind_sto:(a OR b OR ... OR n)
* facet=true&facet.field=a_dlng_doc_sto&facet.zeros=false&facet.mincount=1&facet.limit=<asked-for-facets>&rows=0&start=0
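
As a minimal sketch of how a request in this format could be assembled, the Python snippet below builds the URL-encoded query string from the parameters listed above. The helper name build_facet_params and the example values are hypothetical and only for illustration; the parameter names and fixed values are the ones from the query format. It assumes the backslashes before the square brackets in the text are wiki-markup escapes and not part of the actual query.

```python
from urllib.parse import urlencode


def build_facet_params(x, y, d_values, facet_limit):
    """Build the request parameters for the facet query format described above.

    x, y        -- bounds of the timestamp_lng_ind_sto range
    d_values    -- values OR'ed together for d_lng_ind_sto (a, b, ..., n)
    facet_limit -- number of facets asked for (facet.limit)
    """
    d_clause = " OR ".join(str(v) for v in d_values)
    q = f"timestamp_lng_ind_sto:[{x} TO {y}] AND d_lng_ind_sto:({d_clause})"
    # A list of pairs keeps the parameter order stable in the encoded string.
    return [
        ("q", q),
        ("facet", "true"),
        ("facet.field", "a_dlng_doc_sto"),
        ("facet.zeros", "false"),
        ("facet.mincount", "1"),
        ("facet.limit", str(facet_limit)),
        ("rows", "0"),
        ("start", "0"),
    ]


# Hypothetical example values, not taken from the issue.
params = build_facet_params(1000, 2000, [1, 2, 3], facet_limit=10)
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select URL of whichever collection is being queried; that URL is not given in this issue.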

We see very slow response times when the query hits a large number of rows 
spanning lots of facets, but we only ask for "a few" of those facets.

Example
* With x and y plus a, b ... n set to values so that
** the timestamp_lng_ind_sto:\[x TO y\] part of the search criteria alone hits about 1.7 billion documents
** the d_lng_ind_sto:(a OR b OR ... OR n) part of the search criteria alone hits about 500,000 documents
** the combined search criteria (timestamp_lng_ind_sto AND'ed with d_lng_ind_sto) hit about 200,000 documents
!Profiling_SimpleFacets_getListedTermCounts_path.png!
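
To put the three hit counts of the example in proportion, here is a quick calculation that uses only the approximate figures quoted above (reading 500.000 and 200.000 as five hundred thousand and two hundred thousand). It is just arithmetic on those numbers, not a measurement.

```python
# Approximate hit counts quoted in the example above.
timestamp_hits = 1_700_000_000   # timestamp_lng_ind_sto:[x TO y] alone
d_hits = 500_000                 # d_lng_ind_sto:(a OR b OR ... OR n) alone
combined_hits = 200_000          # both criteria AND'ed together

# Fraction of each single-criterion result set that survives the AND.
share_of_timestamp = combined_hits / timestamp_hits
share_of_d = combined_hits / d_hits

print(f"combined / timestamp-only: {share_of_timestamp:.4%}")
print(f"combined / d-only:         {share_of_d:.0%}")
```

The combined result is roughly 0.012% of the documents matched by the timestamp range alone, and 40% of those matched by the d_lng_ind_sto clause alone.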


  was:TBD


> Slow response on facet search, lots of facets, asking for few facets in 
> response
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-5444
>                 URL: https://issues.apache.org/jira/browse/SOLR-5444
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.4
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: docvalue, faceted-search, performance
>             Fix For: 4.7
>
>         Attachments: Profiiling_SimpleFacets_getListedTermCounts_path.png, 
> Profiling_SimpleFacets_getTermCounts_path.png, 
> Responsetime_func_of_facets_asked_for.png
>


