priyen opened a new pull request, #9097:
URL: https://github.com/apache/pinot/pull/9097
This adds support for distinct count HLL pre-aggregation. It introduces a
new property on the fieldSpec, fixedLength (in bytes), so that the BYTES data
type can be treated as fixed length and can use the
FixedByteSVMutableForwardIndex.
When used for HyperLogLog values, fixedLength should be the size in bytes of
the serialized HyperLogLog object.
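To illustrate why a known fixedLength matters, here is a minimal sketch of fixed-width byte storage addressed by docId. It is not the FixedByteSVMutableForwardIndex implementation; the class name FixedLengthBytesStore, its constructor, and the plain byte[] backing array are hypothetical and chosen only to show the offset arithmetic.

```java
import java.util.Arrays;

// Hypothetical illustration only, not Pinot code: every value occupies exactly
// fixedLength bytes, so the slot for a docId is a simple multiplication.
final class FixedLengthBytesStore {
    private final int fixedLength;
    private final byte[] data;

    FixedLengthBytesStore(int fixedLength, int capacity) {
        this.fixedLength = fixedLength;
        this.data = new byte[fixedLength * capacity];
    }

    // Overwrites the slot for docId; rejects values of any other length.
    void setBytes(int docId, byte[] value) {
        if (value.length != fixedLength) {
            throw new IllegalArgumentException(
                "Expected " + fixedLength + " bytes, got " + value.length);
        }
        System.arraycopy(value, 0, data, docId * fixedLength, fixedLength);
    }

    // Returns a copy of the slot for docId.
    byte[] getBytes(int docId) {
        int start = docId * fixedLength;
        return Arrays.copyOfRange(data, start, start + fixedLength);
    }
}
```

Because every slot has the same width, a value can be overwritten in place, which is what updating a stored HyperLogLog during pre-aggregation would need.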
A HyperLogLog with a log2m of 8 serializes to 180 bytes; with a log2m of 12
it serializes to 2740 bytes. I unit tested with a log2m of 12 because that is
the size one of our use cases requires.
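The two sizes above can be reproduced with a little arithmetic. The sketch below assumes a serialized layout of two 4-byte header ints followed by 32-bit words that each pack six 5-bit registers, with one extra word unless the word count is a multiple of 32. That layout is my assumption about the HyperLogLog serialization and is not stated in this PR; the class and method names are hypothetical.

```java
// Hypothetical helper: computes a fixedLength value from log2m under the
// assumed layout described above.
final class HllSizeSketch {
    // Assumption: six 5-bit registers are packed into each 32-bit word.
    private static final int REGISTERS_PER_WORD = 6;
    // Assumption: the header is two ints (log2m and payload length in bytes).
    private static final int HEADER_BYTES = 2 * Integer.BYTES;

    static int wordCount(int log2m) {
        int registers = 1 << log2m;
        int words = registers / REGISTERS_PER_WORD;
        if (words == 0) {
            return 1;
        }
        // Assumed rounding rule: add one word unless already a multiple of 32.
        return (words % Integer.SIZE == 0) ? words : words + 1;
    }

    static int serializedSize(int log2m) {
        return HEADER_BYTES + wordCount(log2m) * Integer.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(8));  // 180
        System.out.println(serializedSize(12)); // 2740
    }
}
```

For log2m = 8 this gives 43 words (172 bytes) plus 8 header bytes, and for log2m = 12 it gives 683 words (2732 bytes) plus 8 header bytes, matching the figures quoted above.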
Tests:
- Unit tests for the fixed-byte mutable forward indexes' new getBytes() and
  setBytes() implementations
- Unit tests that aggregate rows and assert on the resulting HyperLogLog
  objects
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]