[ https://issues.apache.org/jira/browse/LUCENE-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572666#comment-13572666 ]
Michael McCandless commented on LUCENE-4757:
--------------------------------------------

I tested perf of last patch on wikibig, with 7 facet dims:

{noformat}
            Task  QPS base    StdDev  QPS comp    StdDev  Pct diff
          IntNRQ      4.15    (2.6%)      3.88    (2.8%)   -6.4% ( -11% -  -1%)
        HighTerm     22.40    (3.0%)     21.05    (3.3%)   -6.0% ( -11% -   0%)
         Prefix3     14.92    (2.3%)     14.15    (2.4%)   -5.2% (  -9% -   0%)
         MedTerm     53.74    (2.5%)     51.02    (2.9%)   -5.1% ( -10% -   0%)
       OrHighLow     19.23    (2.8%)     18.35    (3.0%)   -4.6% ( -10% -   1%)
       OrHighMed     18.62    (2.8%)     17.77    (3.0%)   -4.6% ( -10% -   1%)
      OrHighHigh      9.79    (3.0%)      9.35    (3.1%)   -4.5% ( -10% -   1%)
        Wildcard     30.48    (1.7%)     29.44    (2.1%)   -3.4% (  -7% -   0%)
         LowTerm    114.24    (1.6%)    112.06    (1.8%)   -1.9% (  -5% -   1%)
     AndHighHigh     23.91    (0.8%)     23.54    (1.3%)   -1.5% (  -3% -   0%)
          Fuzzy1     48.93    (2.0%)     48.30    (2.0%)   -1.3% (  -5% -   2%)
          Fuzzy2     56.09    (3.0%)     55.38    (2.4%)   -1.3% (  -6% -   4%)
         Respell     46.99    (3.7%)     46.39    (2.9%)   -1.3% (  -7% -   5%)
       MedPhrase    120.51    (5.7%)    119.16    (6.0%)   -1.1% ( -12% -  11%)
HighSloppyPhrase      0.94    (4.5%)      0.93    (6.1%)   -1.1% ( -11% -   9%)
 MedSloppyPhrase     26.59    (1.4%)     26.37    (2.4%)   -0.8% (  -4% -   3%)
       LowPhrase     21.67    (5.6%)     21.52    (6.1%)   -0.7% ( -11% -  11%)
      HighPhrase     17.80   (10.0%)     17.70   (10.7%)   -0.6% ( -19% -  22%)
      AndHighMed    108.97    (0.6%)    108.48    (0.9%)   -0.4% (  -1% -   1%)
 LowSloppyPhrase     20.81    (2.0%)     20.74    (2.2%)   -0.3% (  -4% -   3%)
     MedSpanNear     29.10    (1.3%)     29.03    (1.1%)   -0.2% (  -2% -   2%)
    HighSpanNear      3.57    (1.6%)      3.57    (1.3%)   -0.0% (  -2% -   2%)
     LowSpanNear      8.46    (2.2%)      8.46    (2.0%)    0.0% (  -4% -   4%)
      AndHighLow    665.03    (1.5%)    668.55    (2.0%)    0.5% (  -2% -   4%)
{noformat}

Looks like things got a bit slower ... not sure why.

> Cleanup FacetsAccumulator API path
> ----------------------------------
>
>                 Key: LUCENE-4757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4757
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4757.patch, LUCENE-4757.patch
>
>
> FacetsAccumulator and FacetRequest expose too many things to users, even when
> they are not needed, e.g. complements and partitions. Also, Aggregator is
> created per-FacetRequest, while in fact applied per category list. This is
> confusing, because if you want to do two aggregations, e.g. count and
> sum-score, you need to separate the two dimensions into two different
> category lists at indexing time.
> It's not so easy to refactor everything in one go, since there's a lot of
> code involved. So in this issue I will:
> * Remove complements from FacetRequest. It is only relevant to
> CountFacetRequest anyway. In the future, it should be a special Accumulator.
> * Make FacetsAccumulator concrete class, and StandardFacetsAccumulator extend
> it and handles all the stuff that's relevant to sampling, complements and
> partitions. Gradually, these things will be migrated to the new API, and
> hopefully StandardFacetsAccumulator will go away.
> * Aggregator is per-document. I could not break its API b/c some features
> (e.g. complement) depend on it. So rather I created a new FacetsAggregator,
> with a bulk, per-segment, API. So far migrated Counting and SumScore to that
> API.
> ** In the new API, you need to override FacetsAccumulator to define an
> Aggregator for use, the default is CountingFacetsAggregator.
> * Started to refactor FacetResultsHandler, which its API was guided by the
> use of partitions. I added a simple {{compute(FacetArrays)}} to it, which by
> default delegates to the nasty API, but overridden by specific classes. This
> will get cleaned further along too.
> * FacetRequest has a .getValueOf() which resolves an ordinal to its value
> (i.e. which of the two arrays to use). I added FacetRequest.FacetArraysSource
> and specialize when they are INT or FLOAT, creating a special
> FacetResultsHandler which does not go back to FR.getValueOf for every
> ordinal. I think that we can migrate other FacetResultsHandlers to behave
> like that ... at the expense of code duplication.
> ** I also added a TODO to get rid of getValueOf entirely .. will be done
> separately.
> * Got rid of CountingFacetsCollector and StandardFacetsCollector in favor of
> a single FacetsCollector which collects matching documents, and optionally
> scores, per-segment. I wrote a migration class from these per-segment
> MatchingDocs to ScoredDocIDs (which is global), so that the rest of the code
> works, but the new code works w/ the optimized per-segment API. I hope
> performance is still roughly the same w/ these changes too.
> There will be follow-on issues to migrate more features to the new API, and
> more cleanups ...
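
For readers unfamiliar with the distinction the issue text draws between a per-document Aggregator and a bulk, per-segment FacetsAggregator, here is a minimal sketch of the two calling styles. Everything in it is an assumption for illustration: the interface names, method signatures, and the in-memory ordinal arrays are hypothetical and are not the Lucene facet API or the code in the attached patch.

```java
import java.util.Arrays;

// Hypothetical illustration only: these types are NOT Lucene classes.
class BulkAggregationSketch {

    /** Hypothetical per-document callback: invoked once for every matching document. */
    interface PerDocAggregator {
        void aggregate(int docId, int[] ordinals);
    }

    /** Hypothetical bulk callback: handed all matching documents of one segment at once. */
    interface PerSegmentAggregator {
        void aggregate(int[] matchingDocs, int[][] ordinalsByDoc, int[] counts);
    }

    /** Counts category ordinals with one callback invocation per matching document. */
    public static int[] countPerDocument(int[] matchingDocs, int[][] ordinalsByDoc, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        PerDocAggregator agg = (docId, ordinals) -> {
            for (int ord : ordinals) {
                counts[ord]++;
            }
        };
        for (int doc : matchingDocs) {
            agg.aggregate(doc, ordinalsByDoc[doc]);
        }
        return counts;
    }

    /** Counts category ordinals with a single callback invocation for the whole segment. */
    public static int[] countPerSegment(int[] matchingDocs, int[][] ordinalsByDoc, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        PerSegmentAggregator agg = (docs, ordsByDoc, out) -> {
            for (int doc : docs) {
                for (int ord : ordsByDoc[doc]) {
                    out[ord]++;
                }
            }
        };
        agg.aggregate(matchingDocs, ordinalsByDoc, counts);
        return counts;
    }

    public static void main(String[] args) {
        // Toy segment: ordinalsByDoc[doc] lists the category ordinals of that document.
        int[][] ordinalsByDoc = { {0, 2}, {1}, {0, 1, 2}, {2} };
        int[] matchingDocs = {0, 2, 3}; // documents that matched the query

        System.out.println(Arrays.toString(countPerDocument(matchingDocs, ordinalsByDoc, 3))); // prints [2, 1, 3]
        System.out.println(Arrays.toString(countPerSegment(matchingDocs, ordinalsByDoc, 3)));  // prints [2, 1, 3]
    }
}
```

Both variants produce the same counts; the only difference is how often the aggregator is invoked. The sketch says nothing about where the slowdown in the benchmark above comes from.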