[
https://issues.apache.org/jira/browse/KYLIN-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900444#comment-14900444
]
liyang commented on KYLIN-1034:
-------------------------------
How is the performance difference between in-mem and mem mapping? Got any
benchmark around that?
And yes, for OLAP, the bitmaps are immutable once built.
> Faster bitmap indexes with Roaring bitmaps
> ------------------------------------------
>
> Key: KYLIN-1034
> URL: https://issues.apache.org/jira/browse/KYLIN-1034
> Project: Kylin
> Issue Type: Improvement
> Components: General
> Affects Versions: Future
> Reporter: Daniel Lemire
> Labels: performance
>
> Kylin is using ConciseSet for bitmap indexing. It was found that Roaring
> bitmaps are often much better than ConciseSet (e.g., see experimental section
> in http://arxiv.org/pdf/1402.6407.pdf ). The compression is often better and
> the speed difference can be substantial. This is even more so with version
> 0.5 of Roaring.
> There is a high quality Java implementation that is used by Apache Spark and
> Druid.io. The Druid people found that switching to Roaring bitmaps could
> improve real-world performance by 30% or more.
> Source code:
> https://github.com/lemire/RoaringBitmap/
> Importing from Maven:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/
> <dependencies>
> <dependency>
> <groupId>org.roaringbitmap</groupId>
> <artifactId>RoaringBitmap</artifactId>
> <version>[0.5,)</version>
> </dependency>
> </dependencies>
> Online API :
> http://lemire.me/docs/RoaringBitmap/
> JavaDoc Jar:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/RoaringBitmap-0.5.3-javadoc.jar
> When desired, the library supports memory file mapping, so that out-of-JVM
> heap memory is used instead. This can greatly improve IO issues. The library
> is available under the Apache license and is patent-free.
> There is an extensive real-data benchmark framework which you can run for
> yourself to compare Roaring with competitive alternatives such as ConciseSet :
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
> Running such a benchmark can be as simple as launching a script.
> For Druid, the bitmap format was made "pluggable" so you can switch from one
> format to the other using a configuration flag. This is implemented through
> simple wrappers, e.g., see
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> So it can be really easy to make it possible to switch the format while
> preserving backward compatibility if needed...
> It is probably not difficult work to integrate Roaring in Kylin (maybe a day
> or so of programming) and it could potentially improve performance while
> reducing memory storage.
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
> their own implementation, see
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)