EdsonPetry opened a new pull request, #22985:
URL: https://github.com/apache/datafusion/pull/22985

   ## Which issue does this PR close?
   
   - Closes #22819.
   
   ## Rationale for this change
   
   `approx_distinct`'s grouped HyperLogLog fast path encoded its supported 
input types in two independent places: the `is_hll_groups_type` predicate 
(backing `groups_accumulator_supported`) and the `match` in 
`create_groups_accumulator`. The two share a contract — if 
`groups_accumulator_supported` returns `true`, `create_groups_accumulator` must 
succeed — but nothing enforced it. A future type addition or edit could update 
one path and not the other, either silently dropping a type to the slow 
`GroupsAccumulatorAdapter` path or producing a runtime `not_impl_err` for an 
advertised-supported type.
   
   ## What changes are included in this PR?
   
   - Introduce `create_hll_groups_accumulator(&DataType) -> Option<Box<dyn 
GroupsAccumulator>>` as the single source of truth for grouped HLL dispatch 
(`Some` = supported, `None` = falls back to the per-group `Accumulator` path).
   - Rewire `groups_accumulator_supported` to 
`create_hll_groups_accumulator(..).is_some()` and `create_groups_accumulator` 
to wrap the helper, re-attaching the existing `not_impl_err!` message for 
unsupported types.
   - Remove the now-redundant `is_hll_groups_type` predicate; its rationale 
moves onto the helper's doc comment.
   
   No change to which types are supported or to runtime behavior — this is a 
pure consolidation.
   
   ## Are these changes tested?
   
   Yes. A new unit test, `grouped_support_predicate_matches_constructor`, 
drives a table of representative supported and unsupported `DataType`s and 
asserts both `groups_accumulator_supported` and 
`create_groups_accumulator(..).is_ok()` agree with the expected support for 
each — pinning both halves of the contract. It includes the unsupported 
time/unit combinations called out in the issue (`Time32(Microsecond)`, 
`Time32(Nanosecond)`, `Time64(Second)`, `Time64(Millisecond)`) as regression 
guards. All existing `approx_distinct` grouped tests continue to pass.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to