[ https://issues.apache.org/jira/browse/LUCENE-10325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506640#comment-17506640 ]

Yuting Gan edited comment on LUCENE-10325 at 3/29/22, 12:17 AM:
----------------------------------------------------------------

Thanks [~gsmiller] for creating this issue.

I provided a default implementation of getTopDims(int topNDims, int
topNChildren) in the Facets class. It calls the existing
getAllDims(topNChildren) method and returns the FacetResults for the top
topNDims dims, each with up to topNChildren children.
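
A minimal sketch of the default behaviour described above, under stated assumptions: SimpleFacets, DimResult and InMemoryFacets are hypothetical, simplified stand-ins for Lucene's Facets, FacetResult and a concrete counting class (this is not the code in the PR), and a dim's count is assumed to be the sum of its child counts. The default resolves every dim through getAllDims and then keeps only the first topNDims.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a default getTopDims; not Lucene's actual Facets code.
public class TopDimsDefault {

  /** Simplified stand-in for FacetResult: dim name, dim count, top child labels. */
  public record DimResult(String dim, int value, List<String> topChildren) {}

  /** Simplified stand-in for the Facets base class. */
  public abstract static class SimpleFacets {
    /** Returns every dim sorted by count (descending), each with up to topNChildren children. */
    public abstract List<DimResult> getAllDims(int topNChildren);

    /** Default: resolve all dims via getAllDims, then keep only the first topNDims. */
    public List<DimResult> getTopDims(int topNDims, int topNChildren) {
      List<DimResult> all = getAllDims(topNChildren);
      return all.subList(0, Math.min(topNDims, all.size()));
    }
  }

  /** Hypothetical in-memory facets: dim -> (child label -> count). */
  public static class InMemoryFacets extends SimpleFacets {
    private final Map<String, Map<String, Integer>> counts;

    public InMemoryFacets(Map<String, Map<String, Integer>> counts) {
      this.counts = counts;
    }

    @Override
    public List<DimResult> getAllDims(int topNChildren) {
      List<DimResult> results = new ArrayList<>();
      for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
        // Assumption: the dim count is the sum of its child counts.
        int dimCount = e.getValue().values().stream().mapToInt(Integer::intValue).sum();
        // Resolving top children for every dim is the work getTopDims wants to avoid.
        List<String> children = e.getValue().entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(topNChildren)
            .map(Map.Entry::getKey)
            .toList();
        results.add(new DimResult(e.getKey(), dimCount, children));
      }
      // Highest count first; ties broken by dim name.
      results.sort(Comparator.comparingInt(DimResult::value).reversed()
          .thenComparing(DimResult::dim));
      return results;
    }
  }
}
```

Because this default only trims the output of getAllDims, it still pays for resolving the children of every dim; its main value is giving callers a single getTopDims entry point that concrete classes can later override with something cheaper.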

So far, I have experimented with only one override of getTopDims, in
SortedSetDocValuesFacetCounts, which aims to populate dimCount more
efficiently. When getTopDims is called, it avoids resolving all child paths
and creating a FacetResult for every dim.
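
The idea behind the optimized override can be illustrated with a two-phase sketch: first compute only the per-dim counts and keep the best topNDims in a bounded min-heap, then resolve top children only for the dims that survive. This is an assumed, simplified model over an in-memory map (TwoPhaseFacets, DimResult and the resolvedDims counter are hypothetical, and a dim's count is again assumed to be the sum of its child counts), not the actual SortedSetDocValuesFacetCounts code from #747.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical two-phase sketch; not the SortedSetDocValuesFacetCounts code from #747.
public class TopDimsTwoPhase {

  /** Simplified stand-in for FacetResult: dim name, dim count, top child labels. */
  public record DimResult(String dim, int value, List<String> topChildren) {}

  /** Hypothetical in-memory facets: dim -> (child label -> count). */
  public static class TwoPhaseFacets {
    private final Map<String, Map<String, Integer>> counts;
    // Number of dims whose children were resolved, to make the saved work visible.
    private int resolvedDims = 0;

    public TwoPhaseFacets(Map<String, Map<String, Integer>> counts) {
      this.counts = counts;
    }

    public int resolvedDims() {
      return resolvedDims;
    }

    public List<DimResult> getTopDims(int topNDims, int topNChildren) {
      // Heap order puts the weakest dim at the head: lowest count, ties -> larger name.
      Comparator<Map.Entry<String, Integer>> weakestFirst =
          Map.Entry.<String, Integer>comparingByValue()
              .thenComparing(Map.Entry.<String, Integer>comparingByKey().reversed());
      PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);

      // Phase 1: compute only per-dim counts (assumed: sum of child counts).
      for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
        int dimCount = e.getValue().values().stream().mapToInt(Integer::intValue).sum();
        heap.offer(Map.entry(e.getKey(), dimCount));
        if (heap.size() > topNDims) {
          heap.poll(); // evict the weakest dim; its children are never resolved
        }
      }

      // Drain weakest-first, then reverse so the strongest dim comes first.
      List<Map.Entry<String, Integer>> winners = new ArrayList<>();
      while (!heap.isEmpty()) {
        winners.add(heap.poll());
      }
      Collections.reverse(winners);

      // Phase 2: resolve top children only for the surviving dims.
      List<DimResult> results = new ArrayList<>();
      for (Map.Entry<String, Integer> w : winners) {
        resolvedDims++;
        List<String> children = counts.get(w.getKey()).entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(topNChildren)
            .map(Map.Entry::getKey)
            .toList();
        results.add(new DimResult(w.getKey(), w.getValue(), children));
      }
      return results;
    }
  }
}
```

With D dims, the selection step costs O(D log topNDims) and child resolution runs for at most topNDims dims, which is the kind of saving the issue asks for when, say, only 10 of 100 dims are needed.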

I opened PR #747 for this change and would appreciate any feedback. Since
this change involves a lot of code refactoring in SSDVFacetCounts, if it is
worthwhile and the PR is approved, I can also extend it to
ConcurrentSSDVFacetCounts and explore other possible optimized
implementations in faceting. Thanks!



> Add getTopDims functionality to Facets
> --------------------------------------
>
>                 Key: LUCENE-10325
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10325
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Greg Miller
>            Priority: Major
>             Fix For: 9.2
>
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> The current {{getAllDims}} functionality is really the only way for users to 
> determine the "top" dimensions in a faceting field (i.e., get the top dims by 
> count along with their top-n children), but it has the unfortunate 
> side-effect of resolving all child paths for every dim, even if the user 
> doesn't intend to use those dims. For example, if a match set contains docs 
> relating to 100 different dims (and various values under each), but the user 
> only wants the top 10 dims with their top 5 children, they can call 
> getAllDims(5) then just grab the first 10 results, but a lot of wasted work 
> has been done for the other 90 dims.
> It would be nice to implement something like {{getTopDims(int numDims, int 
> numChildren)}} that would only do the work necessary to resolve {{numDims}} 
> dims instead of all dims.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
