[ 
https://issues.apache.org/jira/browse/KYLIN-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Lemire updated KYLIN-1034:
---------------------------------
    Attachment: roaring-0001.patch

We have simply substituted Roaring for Concise throughout the code. The patch 
is straight-forward. 

Patch corresponding to this pull request :

https://github.com/apache/incubator-kylin/pull/12

and to this commit : 

https://github.com/lemire/incubator-kylin/commit/01988012fa9c114130479ac9640957e825486769

> Faster bitmap indexes with Roaring bitmaps
> ------------------------------------------
>
>                 Key: KYLIN-1034
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1034
>             Project: Kylin
>          Issue Type: Improvement
>          Components: General
>    Affects Versions: Future
>            Reporter: Daniel Lemire
>              Labels: performance
>         Attachments: roaring-0001.patch
>
>
> Kylin is using ConciseSet for bitmap indexing. It was found that Roaring 
> bitmaps are often much better than ConciseSet (e.g., see experimental section 
> in http://arxiv.org/pdf/1402.6407.pdf ). The compression is often better and 
> the speed difference can be substantial. This is even more so with version 
> 0.5 of Roaring.
> There is a high quality Java implementation that is used by Apache Spark and 
> Druid.io. The Druid people found that switching to Roaring bitmaps could 
> improve real-world performance by 30% or more. 
> Source code: 
> https://github.com/lemire/RoaringBitmap/
> Importing from Maven:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/
> <dependencies>
>     <dependency>
>       <groupId>org.roaringbitmap</groupId>
>       <artifactId>RoaringBitmap</artifactId>
>       <version>[0.5,)</version>
>     </dependency>
>  </dependencies>
> Online API :
> http://lemire.me/docs/RoaringBitmap/
> JavaDoc Jar: 
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/RoaringBitmap-0.5.3-javadoc.jar
> When desired, the library supports memory file mapping, so that  out-of-JVM 
> heap memory is used instead. This can greatly improve IO issues. The library 
> is available under the Apache license and is patent-free.
> There is an extensive real-data benchmark framework which you can run for 
> yourself to compare Roaring with competitive alternatives such as ConciseSet :
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
> Running such a benchmark can be as simple as launching a script.
> For Druid, the bitmap format was made "pluggable" so you can switch from one 
> format to the other using a configuration flag. This is implemented through 
> simple wrappers, e.g., see
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> So it can be really easy to make it possible to switch the format while 
> preserving backward compatibility if needed... 
> It is probably not difficult work to integrate Roaring in Kylin (maybe a day 
> or so of programming) and it could potentially improve performance while 
> reducing memory storage.
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have 
> their own implementation, see 
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to