[
https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-7590:
-------------------------------
Attachment: LUCENE-7590.patch
This patch implements a {{DocValuesStatsCollector}}. Some key design decisions:
A {{DocValuesStats}} is responsible for providing the specific
{{DocValuesIterator}} for a {{LeafReaderContext}}. It then accumulates the
values and computes the statistics. The base class computes {{missing}} and
{{count}} itself, leaving {{min}} and {{max}} to the actual implementation. It
also does not define a {{mean}}, as at least for now I'm not sure how the mean
value of a {{SortedSetDocValues}} is defined.
{{NumericDocValuesStats}} is an abstract implementation for single-valued
numeric DV fields. It adds a {{mean}} statistic and has two concrete
implementations: {{LongNumericDocValuesStats}} and
{{DoubleNumericDocValuesStats}}.
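To make the split between the base class and the numeric subclass concrete, here is a minimal standalone sketch. It is not the code in the patch: it does not use any Lucene classes, and the names {{FieldStatsSketch}}, {{LongFieldStatsSketch}}, {{accumulate}} and {{doAccumulate}} are hypothetical, chosen only for illustration. A missing value is modelled with an empty {{OptionalLong}}, which is also an assumption.

```java
import java.util.OptionalLong;

/**
 * Hypothetical base class: tracks only count and missing,
 * and leaves min/max (and any mean) to subclasses.
 */
abstract class FieldStatsSketch {
    private final String field;
    private int count;
    private int missing;

    FieldStatsSketch(String field) {
        this.field = field;
    }

    /** Called once per collected document; an empty value means "no value for this doc". */
    final void accumulate(OptionalLong value) {
        if (value.isPresent()) {
            count++;
            doAccumulate(value.getAsLong());
        } else {
            missing++;
        }
    }

    /** Subclasses update their own statistics here; count() already includes this value. */
    protected abstract void doAccumulate(long value);

    String field() { return field; }
    int count() { return count; }
    int missing() { return missing; }
}

/** Hypothetical single-valued long implementation that adds min, max and mean. */
final class LongFieldStatsSketch extends FieldStatsSketch {
    // With no values accumulated, min/max keep these sentinel values.
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private double mean = 0.0;

    LongFieldStatsSketch(String field) {
        super(field);
    }

    @Override
    protected void doAccumulate(long value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
        // Incremental mean, avoids keeping a potentially overflowing sum.
        mean += (value - mean) / count();
    }

    long min() { return min; }
    long max() { return max; }
    double mean() { return mean; }
}
```

The incremental mean is a choice made for this sketch, not something the patch is known to do.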
This hierarchy should allow us to add further statistics for {{SortedSet}} and
{{SortedNumeric}} DV fields. I did not implement them yet, as I'm not sure
about some of the statistics (e.g. should the {{mean}} stat of a
{{SortedNumeric}} be the mean across all values, or the minimum per document or
...). Let's discuss that separately.
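To illustrate why the {{mean}} of a multi-valued field is ambiguous, here is a small standalone sketch that computes two of the candidate definitions over the same data. It is plain Java with no Lucene dependency, the class and method names are hypothetical, and each document is modelled simply as a {{long[]}} of its values.

```java
import java.util.List;

/** Hypothetical helper comparing two possible "mean" definitions for multi-valued fields. */
final class MultiValuedMeanSketch {
    private MultiValuedMeanSketch() {}

    /** Mean across every value of every document. */
    static double meanOfAllValues(List<long[]> docs) {
        long sum = 0;
        long n = 0;
        for (long[] values : docs) {
            for (long v : values) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? Double.NaN : (double) sum / n;
    }

    /** Mean of each document's minimum value; documents without values are skipped. */
    static double meanOfPerDocMin(List<long[]> docs) {
        long sum = 0;
        long n = 0;
        for (long[] values : docs) {
            if (values.length == 0) {
                continue;
            }
            long min = values[0];
            for (long v : values) {
                min = Math.min(min, v);
            }
            sum += min;
            n++;
        }
        return n == 0 ? Double.NaN : (double) sum / n;
    }
}
```

For two documents holding the values (1, 9) and (2), the first definition gives 4.0 and the second gives 1.5, so the choice changes the reported statistic.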
Also, note that I had to make {{DocValuesIterator}} public in order to declare
it in this collector.
If you're OK with the design and implementation, I want to move
{{DocValuesStats}} into its own file, for clarity. I have not done that yet.
> Add DocValues statistics helpers
> --------------------------------
>
> Key: LUCENE-7590
> URL: https://issues.apache.org/jira/browse/LUCENE-7590
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/misc
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch,
> LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers, that can allow
> users to query for the min/max/avg etc. stats of a DV field. In this issue
> I'd like to cover numeric DV, but there's no reason not to add it to other DV
> types too.