[ https://issues.apache.org/jira/browse/SOLR-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241920#comment-14241920 ]
Neil Ireson commented on SOLR-6803:
-----------------------------------

I ran JProfiler to see if it would cast any light on what's happening. Below is a snippet of the call tree:

100.0% - 482 s - 1 inv. org.apache.solr.handler.component.PivotFacetProcessor.processSingle
100.0% - 482 s - 1 inv. org.apache.solr.handler.component.PivotFacetProcessor.doPivots
 99.4% - 479 s - 180 inv. org.apache.solr.handler.component.PivotFacetProcessor.doPivots
 98.6% - 475 s - 179,920 inv. org.apache.solr.handler.component.PivotFacetProcessor.getSubset
 98.3% - 474 s - 179,920 inv. org.apache.solr.search.SolrIndexSearcher.getDocSet
 96.5% - 465 s - 153,059 inv. org.apache.solr.search.SolrIndexSearcher.getDocSetNC
 95.9% - 462 s - 153,059 inv. org.apache.lucene.search.IndexSearcher.search
 39.9% - 192 s - 1,523,680,993 inv. org.apache.solr.search.DocSetCollector.collect

The problem seems to be the call to getSubset in PivotFacetProcessor.doPivots. Although I'm not actually sure how doPivots works, it would seem that getSubset only needs to be called if subField != null (in my case subField is always null). I therefore think the performance issue may be fixed by simply moving the getSubset call to after the subField != null check, although it'd be nice if someone more au fait with the code could confirm.

> Pivot Performance
> -----------------
>
> Key: SOLR-6803
> URL: https://issues.apache.org/jira/browse/SOLR-6803
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.10.2
> Reporter: Neil Ireson
> Priority: Minor
> Attachments: PivotPerformanceTest.java
>
>
> I found that my pivot search for terms per day was taking an age, so I knocked
> up a quick test, using a collection of 1 million documents with a different
> number of random terms and times, to compare different ways of getting the
> counts.
> 1) Combined = combining the term and time in a single field.
> 2) Facet = for each term, set the query to the term and then get the time
> facet.
> 3) Pivot = use the term/time pivot facet.
> The following two tables present the results for version 4.9.1 vs 4.10.1, as
> an average of five runs.
>
> 4.9.1 (Processing time in ms)
> |Values (#) | Combined (ms)| Facet (ms)| Pivot (ms)|
> |100        |            22|         21|         52|
> |1000       |           178|         57|        115|
> |10000      |          1363|        211|        310|
> |100000     |          2592|       1009|        978|
> |500000     |          3125|       3753|       2476|
> |1000000    |          3957|       6789|       3725|
>
> 4.10.1 (Processing time in ms)
> |Values (#) | Combined (ms)| Facet (ms)| Pivot (ms)|
> |100        |            21|         21|         75|
> |1000       |           188|         60|        265|
> |10000      |          1438|        215|       1826|
> |100000     |          2768|       1073|      16594|
> |500000     |          3266|       3686|      99682|
> |1000000    |          4080|       6777|     208873|
>
> The results show that, as the number of pivot values increases (i.e. number
> of terms * number of times), pivot performance in 4.10.1 gets progressively
> worse.
> I tried to look at the code, but there were a lot of changes in pivoting
> between 4.9 and 4.10, and so it is not clear to me what has caused the
> performance issues. However, the results seem to indicate that if the pivot
> was simply a combined facet search, it could potentially produce better and
> more robust performance.
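
For anyone following the suggestion in the comment at the top of this message (only call getSubset once subField != null is known to hold), here is a rough toy sketch of the idea. It is not the Solr code: the class LazySubsetSketch, its simplified getSubset, the pivotEager/pivotLazy methods and the countCalls helper are all hypothetical stand-ins, and it assumes the subset is used only for the sub-field recursion, which has not been confirmed against the real doPivots.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the reordering suggested in the comment.
 * This is NOT Solr code: every name below is hypothetical.
 */
public class LazySubsetSketch {

    /** Number of times the simulated expensive call has run. */
    static int subsetCalls = 0;

    /** Hypothetical stand-in for an expensive per-value doc-set computation. */
    static List<Integer> getSubset(List<Integer> baseDocs, int value) {
        subsetCalls++;
        List<Integer> subset = new ArrayList<>();
        for (int doc : baseDocs) {
            if (doc % 10 == value) {
                subset.add(doc);
            }
        }
        return subset;
    }

    /** Eager variant: the subset is computed for every value, even with no sub-field. */
    static int pivotEager(List<Integer> baseDocs, int[] values, String subField) {
        int subDocs = 0;
        for (int value : values) {
            List<Integer> subset = getSubset(baseDocs, value);
            if (subField != null) {
                subDocs += subset.size(); // stand-in for recursing into the sub-field
            }
        }
        return subDocs;
    }

    /** Lazy variant: the subset is computed only after the subField != null check. */
    static int pivotLazy(List<Integer> baseDocs, int[] values, String subField) {
        int subDocs = 0;
        for (int value : values) {
            if (subField != null) {
                List<Integer> subset = getSubset(baseDocs, value);
                subDocs += subset.size(); // stand-in for recursing into the sub-field
            }
        }
        return subDocs;
    }

    /** Runs one variant over a fixed toy input and returns how often getSubset ran. */
    static int countCalls(boolean lazy, String subField) {
        List<Integer> baseDocs = new ArrayList<>();
        for (int doc = 0; doc < 100; doc++) {
            baseDocs.add(doc);
        }
        int[] values = {0, 1, 2, 3, 4};
        subsetCalls = 0;
        if (lazy) {
            pivotLazy(baseDocs, values, subField);
        } else {
            pivotEager(baseDocs, values, subField);
        }
        return subsetCalls;
    }

    public static void main(String[] args) {
        System.out.println("eager, no sub-field:   " + countCalls(false, null));
        System.out.println("lazy, no sub-field:    " + countCalls(true, null));
        System.out.println("lazy, with sub-field:  " + countCalls(true, "time"));
    }
}
```

With the five toy values used here, the eager variant calls getSubset five times even when subField is null, while the lazy variant makes no calls in that case and the same five calls when a sub-field is present. Whether the real doPivots can be reordered this way depends on whether the subset is needed anywhere else in the method.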