[https://issues.apache.org/jira/browse/SOLR-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692678#comment-16692678]
Varun Thacker commented on SOLR-12632:
--------------------------------------
I did some basic benchmarking on my laptop against branch 7_6. The index has 25M documents in a two-shard collection.
Both top_level_pi and one_level_pi (as well as their trie counterparts top_level_ti and one_level_ti) have a cardinality of about 1M.
{code:java}
// x is the batch number, j the position within the batch, r a java.util.Random.
// TestUtil.nextInt(r, 0, 1000*1000) returns a value in [0, 1000000], both ends inclusive.
SolrInputDocument document = new SolrInputDocument();
document.addField("id", x*batchSize+j);
document.addField("top_level_pi", TestUtil.nextInt(r,0, 1000*1000));
document.addField("one_level_pi", TestUtil.nextInt(r,0, 1000*1000));
document.addField("top_level_ti", TestUtil.nextInt(r,0, 1000*1000));
document.addField("one_level_ti", TestUtil.nextInt(r,0, 1000*1000));
{code}
{code:xml}
<fieldType name="tint" class="solr.TrieIntField" docValues="true" precisionStep="8" positionIncrementGap="0"/>
<!-- assumed: pint as defined in the _default configset -->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<dynamicField name="*_ti" type="tint" indexed="true" stored="true" docValues="true"/>
<dynamicField name="*_pi" type="pint" indexed="true" stored="true" docValues="true"/>
{code}
I ran two types of queries: one with counts only, and one with a count plus a sum aggregation.
The totalRows (documents matching q="id:123*") is 111111.
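The totalRows value can be cross-checked with a little arithmetic. Assuming the ids are the contiguous integers 0 to 25M-1 (the x*batchSize+j scheme suggests this, but it is an assumption), the ids starting with "123" and having k extra digits form the range [123·10^k, 124·10^k), clipped to the number of documents. A minimal sketch of that count:

```java
class PrefixMatchCount {
    // Counts ids in [0, numDocs) whose decimal form starts with `prefix`,
    // i.e. what a prefix query like id:123* would match.
    // Assumption: ids are exactly the contiguous integers 0 .. numDocs-1.
    static long countPrefixMatches(long numDocs, long prefix) {
        if (prefix <= 0) {
            throw new IllegalArgumentException("prefix must be positive");
        }
        long count = 0;
        // Ids with the prefix followed by k more digits form [prefix*10^k, (prefix+1)*10^k).
        for (long scale = 1; prefix * scale < numDocs; scale *= 10) {
            long lo = prefix * scale;
            long hi = Math.min((prefix + 1) * scale, numDocs); // exclusive upper bound
            count += hi - lo;
        }
        return count;
    }

    public static void main(String[] args) {
        // 1 + 10 + 100 + 1000 + 10000 + 100000
        System.out.println(countPrefixMatches(25_000_000L, 123L)); // prints 111111
    }
}
```

The widest range, 12,300,000 to 12,399,999, still fits inside 25M documents, while the next one (starting at 123,000,000) does not, which is why the total stops at 111111.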
*Simple Count*
{code:java}
1.
time : 38-41s
filterCache inserts : 54303
facet(gettingstarted,q="id:123*",buckets="top_level_pi,one_level_pi",
bucketSorts="count(*) desc", bucketSizeLimit=-1, count(*))
2.
time : 6-9s
filterCache inserts : 54280
facet(gettingstarted,q="id:123*",buckets="top_level_ti,one_level_ti",
bucketSorts="count(*) desc", bucketSizeLimit=-1, count(*))
Underlying JSON Facet request that the stream expression generates (query 2):
{
  "top_level_ti": {
    "type": "terms",
    "field": "top_level_ti",
    "limit": 2147483647,
    "sort": {
      "count": "desc"
    },
    "facet": {
      "one_level_ti": {
        "type": "terms",
        "field": "one_level_ti",
        "limit": 2147483647,
        "sort": {
          "count": "desc"
        },
        "facet": {}
      }
    }
  }
}
{code}
*Count + Sum Aggregation*
{code:java}
3.
time : 110s
filterCache inserts : 110044
facet(gettingstarted,q="id:123*",buckets="top_level_pi,one_level_pi",
bucketSorts="count(*) desc", bucketSizeLimit=-1, count(*), sum(top_level_pi))
4.
time : 11-13s
filterCache inserts : 110021
facet(gettingstarted,q="id:123*",buckets="top_level_ti,one_level_ti",
bucketSorts="count(*) desc", bucketSizeLimit=-1, count(*), sum(top_level_ti))
Underlying JSON Facet request that the stream expression generates (query 3):
{
  "top_level_pi": {
    "type": "terms",
    "field": "top_level_pi",
    "limit": 2147483647,
    "sort": {
      "count": "desc"
    },
    "facet": {
      "one_level_pi": {
        "type": "terms",
        "field": "one_level_pi",
        "limit": 2147483647,
        "sort": {
          "count": "desc"
        },
        "facet": {
          "facet_0": "sum(top_level_pi)"
        }
      }
    }
  }
}
{code}
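The mapping from the stream expression to the nested JSON Facet request can be sketched as follows. This is a hypothetical helper, not Solr's FacetStream code: it nests one terms facet per entry in buckets, uses Integer.MAX_VALUE (2147483647) as the limit, matching what the requests above show for bucketSizeLimit=-1, hardcodes the count desc sort used in these queries, and names each non-count metric facet_0, facet_1, ... as in the sum example. count(*) gets no entry because bucket counts are returned anyway.

```java
import java.util.List;

class NestedFacetJson {
    // Hypothetical helper (not FacetStream): builds a nested JSON Facet request
    // shaped like the ones above, one "terms" facet per bucket field.
    static String build(List<String> bucketFields, List<String> metrics) {
        String inner = metricsJson(metrics); // innermost "facet" object holds the metrics
        for (int i = bucketFields.size() - 1; i >= 0; i--) {
            String field = bucketFields.get(i);
            inner = "{\"" + field + "\":{\"type\":\"terms\",\"field\":\"" + field + "\","
                    + "\"limit\":" + Integer.MAX_VALUE + ","  // bucketSizeLimit=-1, i.e. unlimited
                    + "\"sort\":{\"count\":\"desc\"},"        // assumed: bucketSorts="count(*) desc"
                    + "\"facet\":" + inner + "}}";
        }
        return inner;
    }

    // Non-count metrics become facet_0, facet_1, ...; count(*) is skipped.
    static String metricsJson(List<String> metrics) {
        StringBuilder sb = new StringBuilder("{");
        int n = 0;
        for (String metric : metrics) {
            if (metric.equals("count(*)")) {
                continue; // bucket counts are returned without an explicit facet
            }
            if (n > 0) {
                sb.append(',');
            }
            sb.append("\"facet_").append(n).append("\":\"").append(metric).append('"');
            n++;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("top_level_pi", "one_level_pi"),
                List.of("count(*)", "sum(top_level_pi)")));
    }
}
```

Each additional bucket field adds one more level of nesting, so every top-level bucket spawns its own sub-facet over one_level_*.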
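A few observations:
# For the same query, trie fields are much faster than point fields: 6-9s vs. 38-41s for the simple count, and 11-13s vs. 110s with the sum aggregation.
# Adding a sum aggregation to the same facet query roughly doubles the time for trie fields (6-9s to 11-13s) and nearly triples it for point fields (38-41s to 110s). It also roughly doubles the filterCache inserts.
# The filterCache inserts are extremely high. For nested facets on high-cardinality fields, this will simply nuke the filterCache without much reuse.
# When I reran the same queries with the filterCache disabled, only query 3 got faster (by 30-50%); the others took roughly the same time.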
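The cache-churn point can be illustrated with a toy model. The stack trace in this comment shows FacetFieldProcessor.fillBucket calling SolrIndexSearcher.getDocSet, i.e. a DocSet is looked up, and on a miss built and inserted into the filterCache, for each top-level bucket. The sketch below is a simplified simulation, not Solr's FastLRUCache: it assumes an LRU cache of 512 entries (the size in Solr's sample solrconfig.xml, assumed here) and about 54,000 top-level buckets visited in the same order on every request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FilterCacheChurn {
    // Toy LRU cache (NOT Solr's cache): access-ordered LinkedHashMap that evicts the eldest entry.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true gives LRU ordering
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    final LruCache<Integer, Boolean> cache;
    long inserts = 0;
    long hits = 0;

    FilterCacheChurn(int maxSize) {
        cache = new LruCache<>(maxSize);
    }

    // One facet request: for every top-level bucket value, reuse the cached entry
    // if present, otherwise "build" it and insert it (stands in for the bucket's DocSet).
    void facetPass(int[] topLevelBucketValues) {
        for (int value : topLevelBucketValues) {
            if (cache.get(value) != null) {
                hits++;
            } else {
                cache.put(value, Boolean.TRUE);
                inserts++;
            }
        }
    }

    static int[] range(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumptions: cache size 512, ~54,000 top-level buckets per request.
        FilterCacheChurn large = new FilterCacheChurn(512);
        large.facetPass(range(54_000));
        large.facetPass(range(54_000)); // identical request repeated
        System.out.println("many buckets: inserts=" + large.inserts + " hits=" + large.hits);

        FilterCacheChurn small = new FilterCacheChurn(512);
        small.facetPass(range(100));
        small.facetPass(range(100));
        System.out.println("few buckets:  inserts=" + small.inserts + " hits=" + small.hits);
    }
}
```

Under these assumptions the first case reports inserts=108000 hits=0 and the second inserts=100 hits=100: once the bucket count exceeds the cache size, a sequential scan evicts every entry before it can be reused, even when the identical request is repeated.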
For reference, I profiled the points query with Java Flight Recorder; here is the main stack trace from method profiling:
{code:java}
org.apache.lucene.util.bkd.BKDReader$PackedIndexTree.pushLeft(BKDReader.java:410)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:786)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:797)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:787)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:787)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:797)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:787)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:787)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:787)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:797)
org.apache.lucene.util.bkd.BKDReader.intersect(BKDReader.java:533)
org.apache.lucene.search.PointRangeQuery$1$4.get(PointRangeQuery.java:299)
org.apache.lucene.search.PointRangeQuery$1.scorer(PointRangeQuery.java:323)
org.apache.lucene.search.Weight.bulkScorer(Weight.java:177)
org.apache.lucene.search.IndexOrDocValuesQuery$1.bulkScorer(IndexOrDocValuesQuery.java:138)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:667)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471)
org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151)
org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140)
org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1178)
org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:1213)
org.apache.solr.search.facet.FacetFieldProcessor.fillBucket(FacetFieldProcessor.java:439)
org.apache.solr.search.facet.FacetFieldProcessor.findTopSlots(FacetFieldProcessor.java:381)
org.apache.solr.search.facet.FacetFieldProcessorByHashDV.calcFacets(FacetFieldProcessorByHashDV.java:249)
org.apache.solr.search.facet.FacetFieldProcessorByHashDV.process(FacetFieldProcessorByHashDV.java:214)
org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:368)
org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:472)
org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:429)
org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:368)
org.apache.solr.search.facet.FacetModule.process(FacetModule.java:139)
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298){code}
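The hot path above goes from FacetFieldProcessor.fillBucket to SolrIndexSearcher.getDocSet, which for point fields ends in a PointRangeQuery walking the BKD tree (BKDReader.intersect), once per top-level bucket. A rough estimate of how many top-level buckets there are, assuming the field values are uniformly random as in the indexing snippet: n documents drawing from m possible values produce on average m(1 - (1 - 1/m)^n) distinct values. A minimal sketch, with the per-shard figure additionally assuming an even split of the 111111 matching documents across the two shards:

```java
class ExpectedBuckets {
    // Expected number of distinct values when n values are drawn uniformly at random
    // (with replacement) from m possible values: m * (1 - (1 - 1/m)^n).
    static double expectedDistinct(long n, long m) {
        // log1p/expm1 keep the result accurate when 1/m is tiny.
        return m * -Math.expm1(n * Math.log1p(-1.0 / m));
    }

    public static void main(String[] args) {
        long m = 1_000_001L; // TestUtil.nextInt(r, 0, 1000*1000) is inclusive at both ends
        System.out.printf("whole result set: %.0f%n", expectedDistinct(111_111L, m));
        // Assumption: matching docs split evenly across the two shards.
        System.out.printf("per shard:        %.0f%n", expectedDistinct(55_556L, m));
    }
}
```

This gives roughly 105,000 top-level buckets for the whole result set and about 54,000 per shard, the same order of magnitude as the filterCache inserts reported above, so each query computes (and caches) tens of thousands of single-value DocSets.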
> Completely remove Trie fields
> -----------------------------
>
> Key: SOLR-12632
> URL: https://issues.apache.org/jira/browse/SOLR-12632
> Project: Solr
> Issue Type: Task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Steve Rowe
> Priority: Blocker
> Labels: numeric-tries-to-points
> Fix For: master (8.0)
>
>
> Trie fields were deprecated in Solr 7.0. We should remove them completely
> before we release Solr 8.0.
> Unresolved points-related issues:
> [https://jira.apache.org/jira/issues/?jql=project=SOLR+AND+labels=numeric-tries-to-points+AND+resolution=unresolved]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)