Looking at thread dumps, it seems like one of the major differences in
what is done for c_dstr_doc_sto vs a_dlng_doc_sto is in
SimpleFacets.getFacetFieldCounts, where c_dstr_doc_sto takes the
"getTermCounts" path and a_dlng_doc_sto takes the "getListedTermCounts"
path.
    String termList = localParams == null ? null :
        localParams.get(CommonParams.TERMS);
    if (termList != null) {
      res.add(key, getListedTermCounts(facetValue, termList));
    } else {
      res.add(key, getTermCounts(facetValue));
    }
getTermCounts seems to do a lot more, and to be a lot more complex, than
getListedTermCounts.
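
To make the branch easier to reason about, here is a minimal standalone
sketch of the same dispatch. It is not Solr code: FacetPathSketch and
choosePath are hypothetical names, a plain Map stands in for the local
params, and the TERMS constant is assumed to mirror CommonParams.TERMS
(the string "terms"). It only shows that a field request goes down the
"listed terms" path when a terms local param is present, and down the
general path otherwise.

```java
import java.util.HashMap;
import java.util.Map;

public class FacetPathSketch {
    // Assumption: stands in for CommonParams.TERMS, taken to be "terms".
    static final String TERMS = "terms";

    // Returns the name of the method the snippet above would call
    // for the given (possibly null) local params.
    static String choosePath(Map<String, String> localParams) {
        String termList = localParams == null ? null : localParams.get(TERMS);
        return termList != null ? "getListedTermCounts" : "getTermCounts";
    }

    public static void main(String[] args) {
        // Hypothetical local params carrying an explicit term list.
        Map<String, String> withTerms = new HashMap<>();
        withTerms.put(TERMS, "17,42");

        System.out.println(choosePath(withTerms));       // explicit terms given
        System.out.println(choosePath(new HashMap<>())); // local params, no terms
        System.out.println(choosePath(null));            // no local params at all
    }
}
```

If this reading is right, the path taken depends on whether the request
carries a terms local param, not directly on the field type, so it may be
worth checking what the two requests in the thread dumps looked like.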
On 11/5/13 11:47 AM, Per Steffensen wrote:
Hi
We have a 6-Solr-node (release 4.4.0) setup with 12 billion "small"
documents loaded. The documents have the following fields:
* a_dlng_doc_sto
* b_dlng_doc_sto
* c_dstr_doc_sto
* timestamp_lng_ind_sto
* d_lng_ind_sto
From schema.xml
<dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false"
stored="true" required="true" docValues="true"/>
<dynamicField name="*_lng_ind_sto" type="long" indexed="true"
stored="true"/>
<dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false"
stored="true" required="true" docValues="true"/>
...
<fieldType name="dstring" class="solr.StrField"
sortMissingLast="true" docValuesFormat="Disk"/>
<fieldType name="dlng" class="solr.TrieLongField"
precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>
We execute queries in the following format:
* q=timestamp_lng_ind_sto:[x TO y] AND d_lng_ind_sto:(a OR b OR ... OR n)
* facet=true&facet.field=<field>&facet.zeros=false&facet.mincount=1
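
For anyone who wants to reproduce the request, the two parts above can be
assembled into one parameter string roughly as follows. This is only a
sketch: FacetQuerySketch, enc and buildParams are hypothetical helper
names, the numeric values in main are placeholders rather than our real
data, and the host and collection part of the URL is left out on purpose.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.StringJoiner;

public class FacetQuerySketch {
    // URL-encodes one parameter value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds "q=...&facet=true&facet.field=..." in the format shown above.
    static String buildParams(long from, long to, long[] dValues, String facetField) {
        StringJoiner ors = new StringJoiner(" OR ", "(", ")");
        for (long v : dValues) {
            ors.add(Long.toString(v));
        }
        String q = "timestamp_lng_ind_sto:[" + from + " TO " + to + "]"
                 + " AND d_lng_ind_sto:" + ors;
        return "q=" + enc(q)
             + "&facet=true"
             + "&facet.field=" + enc(facetField)
             + "&facet.zeros=false"
             + "&facet.mincount=1";
    }

    public static void main(String[] args) {
        // Placeholder values for x, y and the a..n list.
        long[] ds = {7L, 8L, 9L};
        System.out.println(buildParams(1L, 2L, ds, "a_dlng_doc_sto"));
        System.out.println(buildParams(1L, 2L, ds, "c_dstr_doc_sto"));
    }
}
```

The two printed strings differ only in the facet.field value, which is
the only thing we change between the fast and the slow request.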
For example, executing a query with values for x, y, a, b ... and n that
hits only 6 documents (out of the 12 billion) in total:
* With <field>=a_dlng_doc_sto (long docvalue) the query responds
fairly quickly (< 2 sec)
* With <field>=c_dstr_doc_sto (string docvalue) the query responds
very slowly (> 100 sec) and only if we give the Solr-nodes a lot of
Xmx. If Xmx is too low we experience OOM on involved Solr-nodes and
never see a response
c_dstr_doc_sto strings are all about 10-15 chars, so they are not very
long strings.
Is it a known issue that there is such a big difference between facet
searches on longs and strings? And that memory usage seems to be very
different, too?
If yes, has it been optimized after 4.4.0?
Regards, Per Steffensen