Shekharrajak opened a new pull request, #3003: URL: https://github.com/apache/datafusion-comet/pull/3003
Handle Spark's full serialization format (12-byte header + bits) in merge_filter() to support Spark partial / Comet final execution. The fix automatically detects the format and extracts the bits data accordingly.

Fixes #2889

## Rationale for this change

- Spark's serialize() returns the full format: a 12-byte header (version + numHashFunctions + numWords) followed by the bits data.
- Comet's state_as_bytes() returns the bits data only.
- When a Spark partial aggregate sends the full format, Comet's merge_filter() expects bits only, which causes a mismatch.

Ref https://github.com/apache/spark/blob/master/common/sketch/src/main/java/org/apache/spark/util/sketch/BitArray.java#L99

Ref https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L219

Spark format: BloomFilterImpl.writeTo() (4 + 4 bytes) + BitArray.writeTo() (4 bytes + bits)

## What changes are included in this PR?

- Detects the Spark format (buffer size = 12 + expected_bits_size).
- Extracts the bits data by skipping the 12-byte header if the buffer is in Spark format.
- Returns the bits as-is if the buffer is in Comet format.

## How are these changes tested?
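The size-based detection described under "What changes are included in this PR?" can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the helper name `extract_bits`, the constant `SPARK_HEADER_LEN`, and the error returned for an unrecognized size are all assumptions made for the example, and `expected_bits_len` stands for the byte length of the bit array that the receiving filter already knows.

```rust
/// Length of the header Spark writes before the bit data:
/// version (4 bytes) + numHashFunctions (4 bytes) + numWords (4 bytes).
/// The constant name is hypothetical.
const SPARK_HEADER_LEN: usize = 12;

/// Hypothetical helper: returns the bits portion of a serialized Bloom filter,
/// accepting either the bits-only layout or the layout with a 12-byte header.
fn extract_bits(buf: &[u8], expected_bits_len: usize) -> Result<&[u8], String> {
    if buf.len() == expected_bits_len {
        // Bits-only (Comet) layout: use the buffer as-is.
        Ok(buf)
    } else if buf.len() == SPARK_HEADER_LEN + expected_bits_len {
        // Header + bits (Spark) layout: skip the 12-byte header.
        Ok(&buf[SPARK_HEADER_LEN..])
    } else {
        // Assumption for this sketch: any other size is reported as an error.
        Err(format!(
            "unexpected buffer length {} (expected {} or {})",
            buf.len(),
            expected_bits_len,
            SPARK_HEADER_LEN + expected_bits_len
        ))
    }
}
```

Under these assumptions, a 28-byte buffer passed with `expected_bits_len = 16` is treated as Spark format and yields the trailing 16 bytes, while a 16-byte buffer is returned unchanged.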
