[
https://issues.apache.org/jira/browse/KYLIN-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944889#comment-14944889
]
liyang commented on KYLIN-1034:
-------------------------------
All unit tests pass successfully. Integration test will run later tonight, but
I don't think that will have problem.
> Faster bitmap indexes with Roaring bitmaps
> ------------------------------------------
>
> Key: KYLIN-1034
> URL: https://issues.apache.org/jira/browse/KYLIN-1034
> Project: Kylin
> Issue Type: Improvement
> Components: General
> Affects Versions: Future
> Reporter: Daniel Lemire
> Assignee: liyang
> Labels: performance
> Fix For: v1.1
>
> Attachments: roaring-0001.patch
>
>
> Kylin is using ConciseSet for bitmap indexing. It was found that Roaring
> bitmaps are often much better than ConciseSet (e.g., see experimental section
> in http://arxiv.org/pdf/1402.6407.pdf ). The compression is often better and
> the speed difference can be substantial. This is even more so with version
> 0.5 of Roaring.
> There is a high quality Java implementation that is used by Apache Spark and
> Druid.io. The Druid people found that switching to Roaring bitmaps could
> improve real-world performance by 30% or more.
> Source code:
> https://github.com/lemire/RoaringBitmap/
> Importing from Maven:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/
> <dependencies>
> <dependency>
> <groupId>org.roaringbitmap</groupId>
> <artifactId>RoaringBitmap</artifactId>
> <version>[0.5,)</version>
> </dependency>
> </dependencies>
> Online API :
> http://lemire.me/docs/RoaringBitmap/
> JavaDoc Jar:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/RoaringBitmap-0.5.3-javadoc.jar
> When desired, the library supports memory file mapping, so that out-of-JVM
> heap memory is used instead. This can greatly improve IO issues. The library
> is available under the Apache license and is patent-free.
> There is an extensive real-data benchmark framework which you can run for
> yourself to compare Roaring with competitive alternatives such as ConciseSet :
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
> Running such a benchmark can be as simple as launching a script.
> For Druid, the bitmap format was made "pluggable" so you can switch from one
> format to the other using a configuration flag. This is implemented through
> simple wrappers, e.g., see
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> So it can be really easy to make it possible to switch the format while
> preserving backward compatibility if needed...
> It is probably not difficult work to integrate Roaring in Kylin (maybe a day
> or so of programming) and it could potentially improve performance while
> reducing memory storage.
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
> their own implementation, see
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)