andygrove opened a new pull request, #4390:
URL: https://github.com/apache/datafusion-comet/pull/4390

   ## Which issue does this PR close?
   
   Closes #2889.
   
   Stacks on #4015.
   
   ## Rationale for this change
   
   Comet's `SparkBloomFilter::state_as_bytes` emitted only the raw bit array in 
native endianness, with no header. Spark's `BloomFilterImpl.writeTo` emits a 
header (version, num_hash_functions, optional seed for V2, num_words) followed 
by big-endian `u64` bit words. The mismatch crashed mixed Spark-Partial / 
Comet-Final (and the inverse) execution with `Cannot merge SparkBloomFilters 
with different lengths.` (#2889).
   
   #4015 hides the crash by tagging `BloomFilterAggregate` as not 
`supportsMixedPartialFinal`. This PR fixes the root cause: Comet's intermediate 
buffer now matches Spark's `writeTo` output byte-for-byte, and `merge_filter` 
parses + validates the header before merging. With buffers wire-compatible, 
`BloomFilterAggregate` joins the `supportsMixedPartialFinal = true` set.
   
   ## What changes are included in this PR?
   
   - `SparkBloomFilter::state_as_bytes` delegates to the existing 
`spark_serialization()`.
   - `SparkBloomFilter::merge_filter` parses the Spark header, asserts 
`version` / `num_hash_functions` / `seed` / `num_words` match `self`, and 
OR-merges big-endian `u64` words.
   - `CometBloomFilterAggregate` overrides `supportsMixedPartialFinal = true`.
   - `CometObjectHashAggregateExec` gains a `getSupportLevel` override that 
mirrors `CometHashAggregateExec`'s 
`COMET_ENABLE_{PARTIAL,FINAL}_HASH_AGGREGATE` test-knob handling, so 
`ObjectHashAggregateExec` plans can be exercised in the same way.
   - Native unit tests for V1 / V2 round-trip and explicit mismatch rejection 
cases.
   - `CometExecRuleSuite` cases for both mixed-direction BloomFilter plans, 
mirroring the MIN/MAX coverage added in #4015.
   
   This PR was scaffolded with the project's `superpowers:writing-plans` / 
`superpowers:subagent-driven-development` skills.
   
   ## How are these changes tested?
   
   - Native: `cargo test -p datafusion-comet-spark-expr bloom_filter::` covers 
V1 / V2 round-trip and four mismatch-rejection cases.
   - JVM: `CometExecRuleSuite` mixed BloomFilter tests (both directions).
   - End-to-end: Spark's `BloomFilterAggregateQuerySuite` now exercises the 
mixed path via CI (previously surfaced #2889).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to