c21 commented on pull request #29342: URL: https://github.com/apache/spark/pull/29342#issuecomment-672288194
@agrawaldevesh - thanks for the notes. I totally agree. Just to point out: in the current approach, I already use a boolean field in the unsafe row to store the matched bit in `BytesToBytesMap`. For CPU usage, I think the current approach works better, as it does not need an extra lookup in the key array when iterating over all values of the map (my gut feeling is that this lookup is not very efficient). For memory usage, the newly proposed approach works better, as it only stores information for matched rows rather than for all rows. So there is a trade-off here.

Side note: our internal workload was originally CPU bound rather than memory bound, and we are gradually becoming more memory bound with new types of machines. I am not sure whether caring more about memory usage is a trend for others as well.
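
To make the trade-off concrete, here is a minimal Java sketch of the two ways of tracking matched rows. It is an illustration only: the class and method names (`MatchTrackingSketch`, `InlineFlagTable`, `SideSetTable`) are hypothetical and are not Spark APIs, a plain list stands in for `BytesToBytesMap`, and strings stand in for unsafe rows.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Spark.
final class MatchTrackingSketch {

    // A stored row that carries its own matched bit.
    static final class FlaggedRow {
        final String value;
        boolean matched; // one flag per stored row, matched or not
        FlaggedRow(String value) { this.value = value; }
    }

    // Approach 1: the matched bit lives inside every stored row.
    static final class InlineFlagTable {
        private final List<FlaggedRow> rows = new ArrayList<>();

        int add(String value) { rows.add(new FlaggedRow(value)); return rows.size() - 1; }
        void markMatched(int rowId) { rows.get(rowId).matched = true; }

        // Iteration reads the flag directly from the row: no extra lookup.
        List<String> unmatchedRows() {
            List<String> out = new ArrayList<>();
            for (FlaggedRow r : rows) {
                if (!r.matched) out.add(r.value);
            }
            return out;
        }

        // Tracking state grows with the total number of rows.
        int trackedEntries() { return rows.size(); }
    }

    // Approach 2: only matched rows are recorded, in a separate set.
    static final class SideSetTable {
        private final List<String> rows = new ArrayList<>();
        private final Set<Integer> matchedIds = new HashSet<>();

        int add(String value) { rows.add(value); return rows.size() - 1; }
        void markMatched(int rowId) { matchedIds.add(rowId); }

        // Iteration needs one set lookup per row.
        List<String> unmatchedRows() {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < rows.size(); i++) {
                if (!matchedIds.contains(i)) out.add(rows.get(i));
            }
            return out;
        }

        // Tracking state grows only with the number of matched rows.
        int trackedEntries() { return matchedIds.size(); }
    }
}
```

Both tables return the same unmatched rows; they differ in where the cost lands. The inline flag costs one bit of state per stored row but is read for free during iteration, while the side set stores nothing for unmatched rows but pays a lookup per row when iterating.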