c21 commented on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-672288194


   @agrawaldevesh - thanks for the notes. I totally agree. Just to point out: 
in the current approach, I already use an unsafe row boolean field to store the 
matched bit in `BytesToBytesMap`.
   
   For CPU usage, I think the current approach works better, as it does not 
need an extra lookup in the key array when iterating over all values of the map 
(my gut feeling is that this lookup is not very efficient). For memory usage, 
the newly proposed approach works better, as it only stores information for 
matched rows rather than for all rows. So there is a trade-off here. Side note: 
our internal workloads were originally CPU bound rather than memory bound, but 
we are gradually becoming more memory bound with newer types of machines. Not 
sure whether caring more about memory usage is a trend for others as well.
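
   To make the trade-off concrete, here is a rough sketch in Python. It is not the actual Spark code: `FlagPerRowMap` and `MatchedSetMap` are hypothetical names, plain dicts stand in for `BytesToBytesMap`, and the layout of the side structure in the second class is an assumption about the proposed approach. The first class keeps one matched flag per stored row; the second records only the positions that were matched and has to consult that side structure while iterating all values.

```python
class FlagPerRowMap:
    """Hypothetical stand-in for the current approach: each row carries a flag."""

    def __init__(self):
        self._rows = {}  # key -> list of [value, matched_flag]

    def insert(self, key, value):
        self._rows.setdefault(key, []).append([value, False])

    def probe(self, key):
        matched_values = []
        for row in self._rows.get(key, []):
            row[1] = True  # flip the flag stored next to the value
            matched_values.append(row[0])
        return matched_values

    def unmatched(self):
        # No extra lookup: the flag is read together with the row.
        return [row[0] for rows in self._rows.values() for row in rows if not row[1]]

    def tracking_entries(self):
        # One flag per stored row, matched or not.
        return sum(len(rows) for rows in self._rows.values())


class MatchedSetMap:
    """Hypothetical stand-in for the proposed approach: track matched rows only."""

    def __init__(self):
        self._rows = {}        # key -> list of values
        self._matched = set()  # (key, value index) pairs, an assumed layout

    def insert(self, key, value):
        self._rows.setdefault(key, []).append(value)

    def probe(self, key):
        values = self._rows.get(key, [])
        for index in range(len(values)):
            self._matched.add((key, index))
        return list(values)

    def unmatched(self):
        # Extra lookup per row in the side structure while iterating all values.
        return [
            value
            for key, values in self._rows.items()
            for index, value in enumerate(values)
            if (key, index) not in self._matched
        ]

    def tracking_entries(self):
        # Only matched rows cost tracking memory.
        return len(self._matched)
```

   With four build-side rows of which two are matched, both classes return the same unmatched rows, but the first holds four tracking entries and the second holds two, at the price of one set lookup per row during the final iteration.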

