adriangb commented on PR #16501: URL: https://github.com/apache/datafusion/pull/16501#issuecomment-3016782508
I've pushed a series of commits that simplify the test down to a trivial example: | u64 | u32 | |-----|-----| | 1 | 1 | | 2 | 1 | ```sql SELECT * FROM t ORDER BY u32 LIMIT 1 ``` Now it's clear what is happening: previously each partition had `ORDER BY u32 LIMIT 1` applied internally and then they got combined globally, so results were always deterministic. With the addition of pre-filtering rows in the TopK operator, since the filter is updated across partitions, as soon as 1 partition sees the value for `u32` and has filled up it's limit the other partitions will drop the data on the floor. But which partition sees the value first and "wins" is non-deterministic, thus which value for `u64` comes out is also non deterministic. This isn't necessarily a bad thing, I believe many other database systems work like this (probably for similar reasons), but we do need to do something about this. Even if we had a shared TopK heap I think we'd have the same issue because which row ends up in the TopK heap will still be non-deterministic. Options I can see: 1. Modify the fuzzer to account for the fact that this non-determinism is okay. 2. Make the filters be updated per-partition, which probably looses some performance (I think not too much?). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org