Fantastic! I will look into that, and I hope to contribute.

Paolo
Sent from my Windows Phone

________________________________
From: Harish Butani<mailto:rhbutani.sp...@gmail.com>
Sent: 02/09/2015 06:04
To: user<mailto:user@spark.apache.org>
Subject: Spark + Druid

Hi,

I am working on the Spark Druid package: https://github.com/SparklineData/spark-druid-olap. For scenarios where a 'raw event' dataset is indexed in Druid, it enables you to write your logical plans (queries/dataflows) against the 'raw event' dataset, and it rewrites parts of the plan to execute as a Druid query. In Spark, configuring a Druid DataSource is somewhat like configuring an OLAP index in a traditional DB. Early results show a significant speedup from pushing slice-and-dice queries to Druid.

It comprises a Druid DataSource that wraps the 'raw event' dataset and has knowledge of the Druid index, and a DruidPlanner, a set of plan rewrite strategies that convert aggregation queries into a plan containing a DruidRDD.

Here<https://github.com/SparklineData/spark-druid-olap/blob/master/docs/SparkDruid.pdf> is a detailed design document, which also describes a benchmark of representative queries on the TPCH dataset.

Looking for folks who would be willing to try this out and/or contribute.

regards,
Harish Butani.
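To make the "configuring a Druid DataSource is like configuring an OLAP index" idea concrete, a registration might look roughly like the sketch below. This is only an illustrative guess at the shape of such a setup, using Spark SQL's standard `CREATE TEMPORARY TABLE ... USING ... OPTIONS` DataSource mechanism; the provider name, option keys, and table names are hypothetical and not taken from the package's actual API (see the design doc for the real configuration).

```scala
// Hypothetical sketch only -- provider class and option names are assumptions,
// not the spark-druid-olap API. Shown to illustrate the general DataSource
// registration pattern in Spark SQL.
sqlContext.sql("""
  CREATE TEMPORARY TABLE lineItemDruid
  USING org.sparklinedata.druid
  OPTIONS (
    sourceDataframe "rawLineItems",   -- the 'raw event' dataset in Spark
    druidDataSource "tpch",           -- the Druid index built over it
    druidHost "localhost",
    druidPort "8082"
  )
""")

// Aggregation queries against the wrapped dataset could then be planned
// so that eligible subplans run as Druid queries via a DruidRDD:
sqlContext.sql(
  "SELECT l_returnflag, count(*) FROM lineItemDruid GROUP BY l_returnflag")
```

The design intent described in the mail is that users keep writing queries against the raw event schema, and the DruidPlanner's rewrite strategies decide which aggregations to push down to Druid.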