[ https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640 ]
Yuting Gan edited comment on LUCENE-10325 at 3/15/22, 12:22 AM:
----------------------------------------------------------------

Thanks [~gsmiller] for creating this issue!

I provided a default implementation of _getTopDims(int topNDims, int topNChildren)_ in the Facets class that calls the existing _getAllDims(topNChildren)_ function and returns the _FacetResult_ for each of the requested _topNDims_ along with their _topNChildren_.

So far, I have only experimented with one overridden implementation of _getTopDims_, in _SortedSetDocValuesFacetCounts_, which aims to populate _dimCount_ more efficiently. It avoids resolving all child paths and creating a _FacetResult_ for every dim when _getTopDims_ is called.

I created #747 for this change and would appreciate any feedback. Since this change involves a lot of code refactoring in SSDVFacetCounts, if it is worthwhile and the PR is approved, I can also extend it to ConcurrentSSDVFacetCounts and explore other possible optimized implementations in faceting. Thanks!
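To make the difference between the two approaches concrete, here is a minimal sketch. It is not the code in the PR and does not use Lucene's classes: _DimResult_, _TopDimsSketch_, _getTopDimsDefault_, _getTopDimsOptimized_ and the _dimsResolved_ counter are hypothetical stand-ins, and a dim's count is simplified to the sum of its child counts, which only holds for single-valued dims.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-dimension result (not Lucene's FacetResult). */
final class DimResult {
  final String dim;
  final int count;
  final List<String> topChildren;

  DimResult(String dim, int count, List<String> topChildren) {
    this.dim = dim;
    this.count = count;
    this.topChildren = topChildren;
  }
}

/** Toy model of a facets implementation; every name here is an illustrative assumption. */
class TopDimsSketch {
  private final Map<String, Map<String, Integer>> counts; // dim -> (child label -> count)
  int dimsResolved = 0; // number of dims whose children were resolved

  TopDimsSketch(Map<String, Map<String, Integer>> counts) {
    this.counts = counts;
  }

  // Simplification: treats the dim count as the sum of its child counts.
  private int dimCount(String dim) {
    int sum = 0;
    for (int c : counts.get(dim).values()) sum += c;
    return sum;
  }

  // The expensive step: sort a dim's children and keep the top N labels.
  private DimResult resolve(String dim, int topNChildren) {
    dimsResolved++;
    List<Map.Entry<String, Integer>> children = new ArrayList<>(counts.get(dim).entrySet());
    children.sort((a, b) -> {
      int c = Integer.compare(b.getValue(), a.getValue());
      return c != 0 ? c : a.getKey().compareTo(b.getKey());
    });
    List<String> top = new ArrayList<>();
    for (int i = 0; i < Math.min(topNChildren, children.size()); i++) {
      top.add(children.get(i).getKey());
    }
    return new DimResult(dim, dimCount(dim), top);
  }

  // Resolves children for every dim, then sorts dims by count (descending).
  List<DimResult> getAllDims(int topNChildren) {
    List<DimResult> all = new ArrayList<>();
    for (String dim : counts.keySet()) all.add(resolve(dim, topNChildren));
    all.sort((a, b) -> {
      int c = Integer.compare(b.count, a.count);
      return c != 0 ? c : a.dim.compareTo(b.dim);
    });
    return all;
  }

  // Default-style approach: delegate to getAllDims and truncate the result.
  List<DimResult> getTopDimsDefault(int topNDims, int topNChildren) {
    List<DimResult> all = getAllDims(topNChildren);
    return new ArrayList<>(all.subList(0, Math.min(topNDims, all.size())));
  }

  // Optimized-style approach: rank dims by count first, resolve only the winners.
  List<DimResult> getTopDimsOptimized(int topNDims, int topNChildren) {
    List<String> dims = new ArrayList<>(counts.keySet());
    dims.sort((a, b) -> {
      int c = Integer.compare(dimCount(b), dimCount(a));
      return c != 0 ? c : a.compareTo(b);
    });
    List<DimResult> top = new ArrayList<>();
    for (int i = 0; i < Math.min(topNDims, dims.size()); i++) {
      top.add(resolve(dims.get(i), topNChildren));
    }
    return top;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
    counts.put("author", Map.of("a", 5, "b", 3));
    counts.put("year", Map.of("2020", 10, "2021", 1));
    counts.put("tag", Map.of("x", 1));

    TopDimsSketch viaAllDims = new TopDimsSketch(counts);
    viaAllDims.getTopDimsDefault(2, 1);
    TopDimsSketch direct = new TopDimsSketch(counts);
    direct.getTopDimsOptimized(2, 1);
    System.out.println("dims resolved via getAllDims: " + viaAllDims.dimsResolved);
    System.out.println("dims resolved directly: " + direct.dimsResolved);
  }
}
```

Both methods return the same dims in the same order. The only difference the sketch is meant to show is how many dims have their children resolved along the way.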
> Add getTopDims functionality to Facets
> --------------------------------------
>
>                 Key: LUCENE-10325
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10325
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The current {{getAllDims}} functionality is really the only way for users to
> determine the "top" dimensions in a faceting field (i.e., get the top dims by
> count along with their top-n children), but it has the unfortunate
> side-effect of resolving all child paths for every dim, even if the user
> doesn't intend to use those dims. For example, if a match set contains docs
> relating to 100 different dims (and various values under each), but the user
> only wants the top 10 dims with their top 5 children, they can call
> getAllDims(5) then just grab the first 10 results, but a lot of wasted work
> has been done for the other 90 dims.
> It would be nice to implement something like {{getTopDims(int numDims, int
> numChildren)}} that would only do the work necessary to resolve {{numDims}}
> dims instead of all dims.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)