Hi,

I am working on the Spark Druid Package:
https://github.com/SparklineData/spark-druid-olap.
For scenarios where a 'raw event' dataset is being indexed in Druid, it
lets you write your Logical Plans (queries/dataflows) against the 'raw
event' dataset, and it rewrites parts of the plan to execute as a Druid
query. In Spark, configuring a Druid DataSource is somewhat like
configuring an OLAP index in a traditional DB. Early results show
significant speedups from pushing slice-and-dice queries down to Druid.
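
To give a feel for the setup, here is a minimal sketch of registering a
Druid-backed table through the Spark DataSource API. The provider path and
the option names (sourceDataframe, timeDimensionColumn, druidDatasource,
druidHost, druidPort) are illustrative assumptions, not the package's exact
contract; see the project README for specifics.

import org.apache.spark.sql.SQLContext

// Assumes an existing SQLContext and an already-registered 'raw event'
// DataFrame; all option names below are illustrative.
val sqlContext: SQLContext = ???

sqlContext.sql(
  """CREATE TEMPORARY TABLE lineItemEvents
    |USING org.sparklinedata.druid
    |OPTIONS (
    |  sourceDataframe "rawLineItemEvents",
    |  timeDimensionColumn "l_shipdate",
    |  druidDatasource "tpch",
    |  druidHost "localhost",
    |  druidPort "8082"
    |)""".stripMargin)

// Aggregations over lineItemEvents are now candidates for being
// rewritten into Druid queries.
val df = sqlContext.sql(
  "SELECT l_returnflag, count(*) FROM lineItemEvents GROUP BY l_returnflag")
df.show()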

It comprises a Druid DataSource that wraps the 'raw event' dataset and
has knowledge of the Druid index, and a DruidPlanner, which is a set of
plan rewrite strategies that convert aggregation queries into a plan
containing a DruidRDD.
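
Mechanically, a planner extension like this plugs into Spark through its
experimental strategy hook. Below is a skeletal sketch of that wiring;
DruidAggregateStrategy and the names in its comments are placeholders I
made up, not the package's actual classes.

import org.apache.spark.sql.{SQLContext, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A skeletal rewrite strategy: match an aggregation over the Druid
// DataSource and plan it as a physical node backed by a DruidRDD.
object DruidAggregateStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    // case agg @ Aggregate(...) if coversDruidIndex(agg) =>
    //   DruidScan(buildDruidQuery(agg)) :: Nil   // hypothetical nodes
    case _ => Nil  // fall through to Spark's built-in strategies
  }
}

// Custom strategies are registered via Spark's experimental hook:
val sqlContext: SQLContext = ???
sqlContext.experimental.extraStrategies = DruidAggregateStrategy :: Nil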

A detailed design document, which also describes a benchmark of
representative queries on the TPCH dataset, is here:
https://github.com/SparklineData/spark-druid-olap/blob/master/docs/SparkDruid.pdf

Looking for folks willing to try this out and/or contribute.

regards,
Harish Butani.
