[ https://issues.apache.org/jira/browse/LUCENE-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354735#comment-17354735 ]
Alexander L commented on LUCENE-9950:
-------------------------------------
Thank you for adding the new facet implementation, [~gsmiller]!
??It seems like the only advantage it might offer over a taxonomy-based
approach is not requiring the side-car index??
We found a couple of advantages of SortedSetDocValuesFacetField (SSDVFF). The
first is fast index merges: the facet data lives in the regular index and does
not require [global ordinals translation logic|https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyMergeUtils.java]
(in our tests, a regular index merge with HardlinkCopyDirectoryWrapper takes 3
minutes, while merging the main+taxonomy index pairs takes about 85 minutes for
an index of ~200 GB). The second is that SSDVFF indexing is faster and, unlike
the taxonomy approach, scales with added threads. These advantages tipped the
scales in favor of SSDVFF in our case, although the taxonomy approach provides
slightly better query performance and supports hierarchical faceting.
??there may still be some use-cases for "packing" multiple "dimensions" into
one field??
I wonder what use cases you have in mind for that. Or maybe you have a
performance comparison with the SortedSetDocValuesFacetField implementation
available? I remember reading somewhere that storing facet dimensions in a
single field can provide better performance (e.g., due to CPU cache locality),
but I'm not sure how big the difference can be.
> Support both single- and multi-value string fields in facet counting
> (non-taxonomy based approaches)
> ----------------------------------------------------------------------------------------------------
>
> Key: LUCENE-9950
> URL: https://issues.apache.org/jira/browse/LUCENE-9950
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: main (9.0)
> Reporter: Greg Miller
> Priority: Minor
> Fix For: main (9.0), 8.9
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Users wanting to facet count string-based fields using a non-taxonomy-based
> approach can use {{SortedSetDocValuesFacetCounts}}, which accumulates facet
> counts based on a {{SortedSetDocValues}} field. This requires the stored doc
> values to be multi-valued (i.e., {{SORTED_SET}}), and doesn't work on
> single-valued fields (i.e., {{SORTED}}). In contrast, if a user wants to facet
> count on a stored numeric field, they can use {{LongValueFacetCounts}}, which
> supports both single- and multi-valued fields (and in LUCENE-9948, we now
> auto-detect which kind it is instead of asking the user to specify).
> Let's update {{SortedSetDocValuesFacetCounts}} to also support, and
> automatically detect, single- and multi-valued fields. Note that this is a
> spin-off issue from LUCENE-9946, where [~rcmuir] points out that this can
> essentially be a one-line change, but we may want to do some class renaming
> at the same time. We should also make the same change in
> {{ConcurrentSortedSetDocValuesFacetCounts}} while we're at it.
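A minimal, self-contained sketch of the auto-detection idea described above, assuming a simplified in-memory ordinal model rather than Lucene's real doc values classes. Every name in it ({{FacetCountSketch}}, {{MultiOrds}}, {{singleton}}, {{view}}, {{count}}) is hypothetical. A single-valued field (one ordinal per document, -1 when missing) is wrapped as a multi-valued view with at most one ordinal per document, so a single counting loop serves both field shapes. This loosely mirrors the singleton-wrapping idea behind Lucene's {{DocValues.singleton}}, but it is an illustration only, not the change made for this issue.

```java
import java.util.Arrays;

// Hypothetical toy model (not Lucene's API): ordinals are plain int arrays.
public class FacetCountSketch {

  /** Multi-valued view: the ordinals of each document (possibly none). */
  interface MultiOrds {
    int maxDoc();
    int[] ordsFor(int doc);
  }

  /** Wraps a single-valued field (one ordinal per doc, -1 = missing) as a multi-valued view. */
  static MultiOrds singleton(int[] singleOrds) {
    return new MultiOrds() {
      public int maxDoc() { return singleOrds.length; }
      public int[] ordsFor(int doc) {
        int ord = singleOrds[doc];
        return ord < 0 ? new int[0] : new int[] {ord};
      }
    };
  }

  /** Exposes a multi-valued field stored as one ordinal array per document. */
  static MultiOrds multi(int[][] perDocOrds) {
    return new MultiOrds() {
      public int maxDoc() { return perDocOrds.length; }
      public int[] ordsFor(int doc) { return perDocOrds[doc]; }
    };
  }

  /** Detects which shape the field has and returns a uniform multi-valued view. */
  public static MultiOrds view(Object field) {
    if (field instanceof int[]) return singleton((int[]) field);
    if (field instanceof int[][]) return multi((int[][]) field);
    throw new IllegalArgumentException("unsupported field: " + field);
  }

  /** One counting loop handles both single- and multi-valued fields. */
  public static int[] count(Object field, int valueCount) {
    MultiOrds ords = view(field);
    int[] counts = new int[valueCount];
    for (int doc = 0; doc < ords.maxDoc(); doc++) {
      for (int ord : ords.ordsFor(doc)) {
        counts[ord]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] single = {0, 2, -1, 2}; // doc 2 has no value
    int[][] multiValued = {{0, 1}, {2}, {}, {1, 2}};
    System.out.println(Arrays.toString(count(single, 3)));      // prints [1, 0, 2]
    System.out.println(Arrays.toString(count(multiValued, 3))); // prints [1, 2, 2]
  }
}
```

The design choice is to normalize once at the boundary (the {{view}} step) so the hot counting loop never branches on field type.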
--
This message was sent by Atlassian Jira
(v8.3.4#803005)