Re: [PR] Remove unnecessary bit counting code from spark `bit_count` [datafusion]

via GitHub Thu, 20 Nov 2025 22:47:29 -0800


pepijnve commented on PR #18841:
URL: https://github.com/apache/datafusion/pull/18841#issuecomment-3561659194


   > Thanks @pepijnve in Spark/JVM and Rust sometimes there are discrepancies, 
like treating decimals, regexp, etc.
   
   Yep, I understand that. What was a bit puzzling initially was that there was 
no description of what was actually different and why the port of the Java 
“count ones” implementation was being added.
   
   The difference was that the original DataFusion implementation was operating 
on the native size of the signed integer input values, while Spark always 
operates on Java long (i.e. i64). For unsigned and non-negative signed integers 
that's not an issue since the answer is the same. For negative integers though 
you get a different result, since those are sign-extended (padded with `1`s) 
when widened.
   
   There’s absolutely no need for a custom popcount implementation to fix this. 
Just widen to `i64` and use `count_ones`.
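
To illustrate the point above, here is a minimal sketch of the two behaviours. The function names `bit_count_native_i8` and `bit_count_widened` are hypothetical and exist only for this example; this is not the actual DataFusion code. It only relies on Rust's standard `count_ones` method and on the lossless `Into<i64>` conversions for the narrower integer types.

```rust
/// Hypothetical helper: counts set bits at the value's native width,
/// i.e. the behaviour described above for the original implementation.
fn bit_count_native_i8(value: i8) -> u32 {
    value.count_ones()
}

/// Hypothetical helper: widens to i64 first (sign-extending negative
/// signed values), then counts set bits across all 64 bits.
fn bit_count_widened<T: Into<i64>>(value: T) -> u32 {
    let widened: i64 = value.into();
    widened.count_ones()
}
```

For a non-negative input such as `5i8` both helpers return 2. For `-1i8` the native count is 8, while the widened count is 64, because sign extension fills the upper 56 bits with ones.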
   
   > please add tests for booleans T/F/null
   
   That code path was not touched in this PR at all. Not sure why I should add 
tests for code that’s not being added or modified.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


