Fantastic! I will look into that, and I hope to contribute.

Paolo
Sent from my Windows Phone

________________________________
From: Harish Butani<mailto:rhbutani.sp...@gmail.com>
Sent: 02/09/2015 06:04
To: user<mailto:user@spark.apache.org>
Subject: Spark + Druid

Hi,

I am working on the Spark Druid package: https://github.com/SparklineData/spark-druid-olap. For scenarios where a 'raw event' dataset is indexed in Druid, it enables you to write your logical plans (queries/dataflows) against the 'raw event' dataset, and it rewrites parts of the plan to execute as a Druid query. In Spark, configuring a Druid DataSource is somewhat like configuring an OLAP index in a traditional DB. Early results show a significant speedup from pushing slice-and-dice queries to Druid.

It comprises a Druid DataSource that wraps the 'raw event' dataset and has knowledge of the Druid index, and a DruidPlanner, a set of plan rewrite strategies that convert aggregation queries into a plan containing a DruidRDD.

Here<https://github.com/SparklineData/spark-druid-olap/blob/master/docs/SparkDruid.pdf> is a detailed design document, which also describes a benchmark of representative queries on the TPCH dataset.

Looking for folks who would be willing to try this out and/or contribute.

regards,
Harish Butani.
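To make the "configuring a Druid DataSource is like configuring an OLAP index" idea concrete, a registration might look roughly like the sketch below. This is only an illustrative guess at the shape of such a setup, using Spark SQL's standard `CREATE TEMPORARY TABLE ... USING ... OPTIONS` DataSource mechanism; the provider name, option keys, and table names are hypothetical and not taken from the package's actual API (see the design doc for the real configuration).

```scala
// Hypothetical sketch only -- provider class and option names are assumptions,
// not the spark-druid-olap API. Shown to illustrate the general DataSource
// registration pattern in Spark SQL.
sqlContext.sql("""
  CREATE TEMPORARY TABLE lineItemDruid
  USING org.sparklinedata.druid
  OPTIONS (
    sourceDataframe "rawLineItems",   -- the 'raw event' dataset in Spark
    druidDataSource "tpch",           -- the Druid index built over it
    druidHost "localhost",
    druidPort "8082"
  )
""")

// Aggregation queries against the wrapped dataset could then be planned
// so that eligible subplans run as Druid queries via a DruidRDD:
sqlContext.sql(
  "SELECT l_returnflag, count(*) FROM lineItemDruid GROUP BY l_returnflag")
```

The design intent described in the mail is that users keep writing queries against the raw event schema, and the DruidPlanner's rewrite strategies decide which aggregations to push down to Druid.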