mbutrovich opened a new issue, #2051:
URL: https://github.com/apache/datafusion-comet/issues/2051

   ### Describe the bug
   
   ### Background
   We encountered a [new 
issue](https://github.com/apache/datafusion-comet/pull/2040#issuecomment-3111563959)
 while trying to upgrade to DF49 where a test using the `first` or `last` 
aggregations when ignoring nulls returns incorrect results. DF49 did have some 
changes to these code paths, but I think this is just exposing a Comet issue, 
rather than a DF issue. After digging into it, the data coming out of the 
partial aggregation is correct on the native side: one of the partitions 
produces a single row with 3 columns, `[2, null, false]`. When we import the 
vector on the Spark side, however, the value changes to `[2, null, true]`, 
which breaks the final aggregation and produces wrong results. When I dug into 
the FFI snapshot of the problematic batches, the only difference I found was 
that `offset` was non-zero (1 in this case).
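
   To illustrate how a non-zero `offset` alone could produce this kind of flip, 
here is a minimal sketch that does not use Arrow itself. `BitSlice` and 
`get_ignoring_offset` are hypothetical names, and the assumption that a reader 
drops the offset is only one possible explanation, not something confirmed 
above. It models a bit-packed boolean buffer (LSB-first, as in Arrow) viewed 
through an offset:

```rust
/// Read bit `i` from an LSB-first bit-packed buffer.
fn get_bit(bits: &[u8], i: usize) -> bool {
    (bits[i / 8] >> (i % 8)) & 1 == 1
}

/// Hypothetical logical slice over a shared bit-packed buffer.
struct BitSlice<'a> {
    bits: &'a [u8],
    offset: usize,
    len: usize,
}

impl BitSlice<'_> {
    /// Correct read: logical index `i` lives at physical bit `offset + i`.
    fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        get_bit(self.bits, self.offset + i)
    }

    /// Assumed faulty read: treats the slice as if it started at physical bit 0.
    fn get_ignoring_offset(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        get_bit(self.bits, i)
    }
}
```

   With a buffer whose bit 0 is set and bit 1 is clear, a one-element slice at 
`offset = 1` holds `false`, while a reader that ignores the offset sees the 
neighbouring `true`.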
   
   More discussion on a proposed DataFusion workaround: 
https://github.com/apache/datafusion/pull/16918.
   
   Digging into past issues, a non-zero `offset` crossing the Arrow FFI (C Data 
Interface) boundary between Rust and Java seems to have been problematic before...
   https://github.com/apache/arrow-rs/issues/3671
   https://github.com/apache/arrow-rs/pull/3675
   https://github.com/apache/arrow-rs/issues/5959
   ...and perhaps most importantly...
   https://github.com/apache/arrow-java/issues/88
   
   ### Proposed fix
   
   I have a local test branch that does a `take` on any array with a non-zero 
offset before sending it over the JNI boundary with FFI. Running `make test` 
locally with a debug print for non-zero offset arrays suggests this is a very 
rare code path. I will likely open a PR later today with this workaround, but 
I wanted an issue to reference in a code comment and to collect discussion.
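
   The idea behind the workaround can be sketched without Arrow as follows. 
This is a toy model, not the code in my branch: `BoolArray` and `compact` are 
hypothetical names, and the real change would call `take` on Arrow arrays 
rather than copy bits by hand. The point is only that the data is rewritten 
into a fresh buffer so the exported array has `offset == 0`:

```rust
/// Hypothetical stand-in for a sliced boolean array: a bit-packed buffer
/// plus a logical window described by `offset` and `len`.
#[derive(Debug, Clone, PartialEq)]
struct BoolArray {
    bits: Vec<u8>,
    offset: usize,
    len: usize,
}

impl BoolArray {
    /// Read logical element `i`, honouring the offset.
    fn value(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        let p = self.offset + i;
        (self.bits[p / 8] >> (p % 8)) & 1 == 1
    }

    /// Copy the logical window into a fresh buffer that starts at bit 0.
    /// Arrays that already have a zero offset are returned unchanged, so the
    /// copy is only paid on the (apparently rare) non-zero-offset path.
    fn compact(self) -> BoolArray {
        if self.offset == 0 {
            return self;
        }
        let mut bits = vec![0u8; (self.len + 7) / 8];
        for i in 0..self.len {
            if self.value(i) {
                bits[i / 8] |= 1u8 << (i % 8);
            }
        }
        BoolArray { bits, offset: 0, len: self.len }
    }
}
```

   After `compact`, a reader that honours the offset and one that ignores it 
see the same values, at the cost of one copy for the affected arrays.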
   
   ### Steps to reproduce
   
   It's slightly non-deterministic due to the first/last function behavior, but 
this test with DF49 reproduces almost every time for me:
   
   https://github.com/apache/datafusion-comet/pull/2040#issuecomment-3111563959
   
   ### Expected behavior
   
   A `false` should not flip to `true` when crossing the JNI boundary with Arrow 
FFI.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
