FearfulTomcat27 opened a new pull request, #15570: URL: https://github.com/apache/iotdb/pull/15570
This pull request introduces approximate most-frequent-value aggregation (`approx_most_frequent`) in Apache IoTDB. It adds support for the function in the query engine, new test cases, and a dependency update. The key changes are:

### Feature Implementation: Approximate Most Frequent Aggregation

* **New Abstract Class for Accumulators**: Added `AbstractApproxMostFrequentAccumulator` to hold the logic shared by the approximate most-frequent accumulators. This covers intermediate and final evaluation, reset, and handling of unsupported statistics input. (`iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/AbstractApproxMostFrequentAccumulator.java`)
* **Specific Accumulators**: Introduced type-specific accumulator classes for binary and blob data, such as `BinaryApproxMostFrequentAccumulator` and `BlobApproxMostFrequentAccumulator`. (`iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/BinaryApproxMostFrequentAccumulator.java`, `iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/BlobApproxMostFrequentAccumulator.java`)
* **Accumulator Factory Enhancements**: Updated `AccumulatorFactory` to support the `APPROX_MOST_FREQUENT` aggregation type. It adds methods that create grouped and table accumulators for the supported data types. (`iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/AccumulatorFactory.java`)

### Dependency Updates

* **Added Dependency for Stream Library**: Added `com.clearspring.analytics:stream` (version 2.9.8), which provides the Space-Saving data structures used by the new aggregation. (`iotdb-core/datanode/pom.xml`)

### Testing

* **Integration Tests**: Added a test case, `approxMostFrequentTest`, for the `approx_most_frequent` aggregation. It runs queries covering several scenarios and checks the results against expected outputs. (`integration-test/src/test/java/org/apache/iotdb/relational/it/query/recent/IoTDBTableAggregationIT.java`)

### Refactoring and Adjustments

* **Refactored Approximation Logic**: Adjusted imports and dependencies in `ApproxCountDistinctAccumulator` to align with the new structure for approximate aggregations. (`iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/ApproxCountDistinctAccumulator.java`)
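The Space-Saving algorithm keeps a fixed number of counters, so memory stays bounded no matter how many distinct values a column holds. This is the idea behind the `com.clearspring.analytics:stream` dependency.

The Java sketch below is a hand-written illustration of that idea. It is not the PR's code and not stream-lib's own `StreamSummary`. The names `SpaceSavingSketch`, `SpaceSaving`, `offer`, `merge`, `top`, and `reset` are hypothetical. Combining partial results by re-offering counters is also an assumption, chosen to loosely mirror how an intermediate result could be folded into another accumulator before final evaluation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Space-Saving summary; hypothetical, not the IoTDB implementation. */
public class SpaceSavingSketch {

    /** Tracks approximate counts for at most `capacity` distinct items. */
    static final class SpaceSaving<T> {
        private final int capacity;
        private final Map<T, Long> counts = new HashMap<>();

        SpaceSaving(int capacity) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            this.capacity = capacity;
        }

        /** Records `increment` occurrences of `item`. */
        void offer(T item, long increment) {
            Long current = counts.get(item);
            if (current != null) {
                counts.put(item, current + increment);
                return;
            }
            if (counts.size() < capacity) {
                counts.put(item, increment);
                return;
            }
            // Full: evict the smallest counter. The new item inherits its count,
            // so its estimate can exceed the true frequency by at most `min`.
            T minItem = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<T, Long> e : counts.entrySet()) {
                if (e.getValue() < min) {
                    min = e.getValue();
                    minItem = e.getKey();
                }
            }
            counts.remove(minItem);
            counts.put(item, min + increment);
        }

        /** Folds another partial summary into this one by re-offering its counters. */
        void merge(SpaceSaving<T> other) {
            for (Map.Entry<T, Long> e : other.counts.entrySet()) {
                offer(e.getKey(), e.getValue());
            }
        }

        /** Returns up to k (item, estimated count) pairs, highest count first. */
        List<Map.Entry<T, Long>> top(int k) {
            List<Map.Entry<T, Long>> entries = new ArrayList<>();
            for (Map.Entry<T, Long> e : counts.entrySet()) {
                entries.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            }
            entries.sort(Map.Entry.<T, Long>comparingByValue().reversed());
            return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
        }

        /** Clears all counters so the summary can be reused. */
        void reset() {
            counts.clear();
        }
    }

    public static void main(String[] args) {
        // Two partial summaries, e.g. built over different chunks of a column.
        SpaceSaving<String> partialA = new SpaceSaving<>(3);
        for (String s : new String[] {"a", "a", "b", "a", "c", "b", "a", "b", "a"}) {
            partialA.offer(s, 1);
        }
        SpaceSaving<String> partialB = new SpaceSaving<>(3);
        for (String s : new String[] {"d", "a", "d", "d", "a", "d"}) {
            partialB.offer(s, 1);
        }
        partialA.merge(partialB);
        System.out.println(partialA.top(2)); // prints [a=7, d=5]
    }
}
```

In this run, `d` truly occurs 4 times but is reported as 5, because it inherited the count of the evicted `c`. That overestimate, bounded by the smallest counter, is the price of fixed memory.

The eviction step here scans every counter, which costs O(capacity) per eviction. The Stream-Summary structure described by Metwally et al. instead keeps counters in linked buckets so that each update runs in constant time.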

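The summary does not say how `BinaryApproxMostFrequentAccumulator` and `BlobApproxMostFrequentAccumulator` key their counters. One pitfall any frequency counter over binary or blob values must avoid arises if values are handled as raw Java `byte[]` arrays. Arrays use identity-based `equals`/`hashCode`, so identical contents would be counted as different items.

The hypothetical `BinaryKeySketch` below is not code from the PR. It contrasts raw-array keys with `ByteBuffer.wrap` keys, whose `equals` and `hashCode` depend on the buffer's remaining content.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of keying frequency counts on binary values. */
public class BinaryKeySketch {

    /** Counts distinct keys when raw byte[] is used directly (identity equality). */
    static int distinctWithRawArrays(byte[][] values) {
        Map<byte[], Integer> counts = new HashMap<>();
        for (byte[] v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts.size();
    }

    /** Counts distinct keys when each value is wrapped in a ByteBuffer (content equality). */
    static int distinctWithContentKeys(byte[][] values) {
        Map<ByteBuffer, Integer> counts = new HashMap<>();
        for (byte[] v : values) {
            // Copy first so later mutation of the caller's array cannot change the key's hash.
            counts.merge(ByteBuffer.wrap(v.clone()), 1, Integer::sum);
        }
        return counts.size();
    }

    public static void main(String[] args) {
        byte[][] values = {
            "x".getBytes(StandardCharsets.UTF_8),
            "x".getBytes(StandardCharsets.UTF_8),
            "y".getBytes(StandardCharsets.UTF_8),
        };
        System.out.println(distinctWithRawArrays(values));   // prints 3
        System.out.println(distinctWithContentKeys(values)); // prints 2
    }
}
```

A dedicated key type with content-based equality would serve the same purpose. `ByteBuffer` is used here only because it ships with the JDK.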