Looking at thread dumps

It seems that one of the major differences in what is done for c_dstr_doc_sto vs a_dlng_doc_sto is in SimpleFacets.getFacetFieldCounts, where c_dstr_doc_sto takes the "getTermCounts" path and a_dlng_doc_sto takes the "getListedTermCounts" path:

    String termList = localParams == null ? null : localParams.get(CommonParams.TERMS);
    if (termList != null) {
      res.add(key, getListedTermCounts(facetValue, termList));
    } else {
      res.add(key, getTermCounts(facetValue));
    }

getTermCounts seems to do a lot more, and to be a lot more complex, than getListedTermCounts.
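To make the branch above concrete, here is a minimal sketch of how the "terms" local parameter (the string behind CommonParams.TERMS) decides which path is taken. The class and both helper methods are hypothetical and only mirror the null check in the snippet; they are not Solr code, and whether a given request actually carries this local parameter depends on how it was built.

```java
import java.util.Arrays;
import java.util.List;

public class FacetFieldParamSketch {

    // Hypothetical helper: builds a facet.field value carrying a "terms"
    // local parameter, e.g. {!terms=abc,def}c_dstr_doc_sto.
    // Assumes the terms contain no commas or closing braces (no escaping is done).
    static String withListedTerms(String field, List<String> terms) {
        return "{!terms=" + String.join(",", terms) + "}" + field;
    }

    // Mirrors the null check quoted above: a non-null term list selects the
    // listed-terms path, otherwise the general term-counting path is used.
    static String pathFor(String termList) {
        return termList != null ? "getListedTermCounts" : "getTermCounts";
    }

    public static void main(String[] args) {
        String param = withListedTerms("c_dstr_doc_sto", Arrays.asList("abc", "def"));
        System.out.println("facet.field=" + param);
        System.out.println("with terms:    " + pathFor("abc,def"));
        System.out.println("without terms: " + pathFor(null));
    }
}
```

If the facet.field value has no "terms" local parameter, the check falls through to getTermCounts, which is the path c_dstr_doc_sto appears to take in the thread dumps.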

On 11/5/13 11:47 AM, Per Steffensen wrote:
Hi

We have a 6-node Solr setup (release 4.4.0) with 12 billion "small" documents loaded. The documents have the following fields:
* a_dlng_doc_sto
* b_dlng_doc_sto
* c_dstr_doc_sto
* timestamp_lng_ind_sto
* d_lng_ind_sto
From schema.xml:
<dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false" stored="true" required="true" docValues="true"/>
<dynamicField name="*_lng_ind_sto" type="long" indexed="true" stored="true"/>
<dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false" stored="true" required="true" docValues="true"/>
...
<fieldType name="dstring" class="solr.StrField" sortMissingLast="true" docValuesFormat="Disk"/>
<fieldType name="dlng" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>

We execute queries in the following format:
* q=timestamp_lng_ind_sto:[x TO y] AND d_lng_ind_sto:(a OR b OR ... OR n)
* facet=true&facet.field=<field>&facet.zeros=false&facet.mincount=1
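The request format above can be sketched as a URL-encoded parameter string. This is only an illustration: the class name, the method, and the concrete values passed for x, y and the d_lng_ind_sto list are hypothetical placeholders, not values from our setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetQuerySketch {

    // Hypothetical helper: assembles the parameters described above into a
    // URL-encoded query string for a single facet field.
    static String buildParams(long from, long to, long[] dValues, String facetField) {
        StringBuilder ors = new StringBuilder();
        for (int i = 0; i < dValues.length; i++) {
            if (i > 0) {
                ors.append(" OR ");
            }
            ors.append(dValues[i]);
        }
        String q = "timestamp_lng_ind_sto:[" + from + " TO " + to + "]"
                + " AND d_lng_ind_sto:(" + ors + ")";
        try {
            return "q=" + URLEncoder.encode(q, "UTF-8")
                    + "&facet=true"
                    + "&facet.field=" + URLEncoder.encode(facetField, "UTF-8")
                    + "&facet.zeros=false"
                    + "&facet.mincount=1";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values standing in for x, y, a, b.
        System.out.println(buildParams(1L, 2L, new long[] {10L, 20L}, "c_dstr_doc_sto"));
    }
}
```

Switching the last argument between a_dlng_doc_sto and c_dstr_doc_sto is the only difference between the fast and the slow request.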

For example, executing a query with values for x, y, a, b ... and n that hits only 6 documents in total (out of the 12 billion):
* With <field>=a_dlng_doc_sto (long docvalue) the query responds fairly quickly (< 2 sec)
* With <field>=c_dstr_doc_sto (string docvalue) the query responds very slowly (> 100 sec), and only if we give the Solr nodes a lot of Xmx. If Xmx is too low, we experience OOM on the involved Solr nodes and never see a response

The c_dstr_doc_sto strings are all about 10-15 chars, so they are not very long strings.

Is it a known issue that there is such a big difference between facet searches on longs and strings? And that memory usage seems to be very different as well?
If yes, has it been optimized after 4.4.0?

Regards, Per Steffensen
