adriangb commented on PR #16501:
URL: https://github.com/apache/datafusion/pull/16501#issuecomment-3016782508

   I've pushed a series of commits that simplify the test down to a trivial 
example:
   
   | u64 | u32 |
   |-----|-----|
   | 1   | 1   |
   | 2   | 1   |
   
   ```sql
   SELECT * FROM t ORDER BY u32 LIMIT 1
   ```
   
   Now it's clear what is happening: previously each partition had `ORDER BY 
u32 LIMIT 1` applied internally and then they got combined globally, so results 
were always deterministic.
   
   With the addition of pre-filtering rows in the TopK operator, since the 
filter is updated across partitions, as soon as 1 partition sees the value for 
`u32` and has filled up it's limit the other partitions will drop the data on 
the floor. But which partition sees the value first and "wins" is 
non-deterministic, thus which value for `u64` comes out is also non 
deterministic. This isn't necessarily a bad thing, I believe many other 
database systems work like this (probably for similar reasons), but we do need 
to do something about this. Even if we had a shared TopK heap I think we'd have 
the same issue because which row ends up in the TopK heap will still be 
non-deterministic.
   
   Options I can see:
   
   1. Modify the fuzzer to account for the fact that this non-determinism is 
okay.
   2. Make the filters be updated per-partition, which probably looses some 
performance (I think not too much?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to