Github user lemire commented on the pull request:

https://github.com/apache/spark/pull/9243#issuecomment-150668521

@rxin There are definitely cases where attempting to use compressed bitmaps is wasteful, for example when the universe size is small: your bitmaps represent sets of integers from [0,n) where n is small (e.g., n=64 or n=128).

It is generally true that compression is not always a good idea. The fact that you are able to use an uncompressed BitSet without blowing up memory usage tells me that you might be in a scenario where compression is not useful.

Techniques like Roaring or Concise do not make the uncompressed BitSet obsolete. Rather, they are there to help when regular BitSets would fail you due to excessive memory usage.

How can this happen? Suppose that you are trying to index a column containing 1000 distinct integer values. If you try to do it with a BitSet, each row will use 125 bytes (1000 bits) just to index this column. If you have 10,000 distinct values, then you use over 1 kB per row just to index this one column. And so forth.

But if your BitSets are tiny, then compressing them could definitely be wasteful.
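The per-row figures above follow from one bit per distinct value per row. As a rough sketch of that arithmetic (the function name `uncompressed_bytes_per_row` is made up for illustration and is not part of Spark, Roaring, or Concise):

```python
def uncompressed_bytes_per_row(distinct_values: int) -> int:
    """Bytes per row for an uncompressed bitmap index.

    Assumes one bit per distinct value per row, rounded up to whole bytes.
    """
    return (distinct_values + 7) // 8


# Small universes (64, 128) cost only a few bytes per row;
# the cost grows linearly with the number of distinct values.
for n in (64, 128, 1_000, 10_000):
    print(f"{n:>6} distinct values -> {uncompressed_bytes_per_row(n):>5} bytes per row")
```

With n=64 the whole bitmap fits in a single 64-bit word, which is the kind of case where there is little left for compression to save.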