I'm curious how people generally process data from Druid. We want to be able
to run Spark processing in a distributed fashion using DataFrames.

- Rajiv

On 2/11/19, 1:04 PM, "Julian Jaffe" <[email protected]> wrote:

    Spark can convert an RDD of JSON strings into an RDD/DataFrame/DataSet of
    objects parsed from the JSON (something like
    `sparkSession.read.json(jsonStringRDD)`). You could hook this up to a Druid
    response, but I would definitely recommend looking through the code that
    Gian posted instead - it reads data from deep storage instead of sending an
    HTTP request to the Druid cluster and waiting for the response.
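
As a rough illustration of the JSON route described above, here is a minimal Python sketch. It assumes the Druid response body is already in hand as a string and is shaped like a groupBy result (rows nested under "event") or a timeseries result (rows nested under "result"). The helper name `druid_rows_to_json_lines` and the sample payload are hypothetical, and the Spark call appears only as a comment because it needs a live SparkSession.

```python
import json


def druid_rows_to_json_lines(response_body):
    """Flatten a Druid groupBy/timeseries response into one JSON string per row."""
    rows = []
    for entry in json.loads(response_body):
        # groupBy results nest the row under "event"; timeseries under "result".
        payload = entry.get("event", entry.get("result", {}))
        row = {"timestamp": entry.get("timestamp")}
        row.update(payload)
        rows.append(json.dumps(row, sort_keys=True))
    return rows


# Hypothetical sample response, for illustration only.
sample = (
    '[{"version": "v1", "timestamp": "2019-02-01T00:00:00.000Z", '
    '"event": {"page": "Druid", "edits": 3}}]'
)
lines = druid_rows_to_json_lines(sample)

# With an existing SparkSession named `spark` (assumed, not created here):
#   df = spark.read.json(spark.sparkContext.parallelize(lines))
```

Flattening first keeps the resulting DataFrame schema free of the nested "event" or "result" wrapper. Every row still travels through the Druid broker and the driver, which is why the deep-storage approach is likely the better fit for large batch jobs.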
    
    On Sat, Feb 9, 2019 at 5:02 PM Rajiv Mordani <[email protected]>
    wrote:
    
    > Thanks Julian,
    >         See some questions in-line:
    >
    > On 2/6/19, 3:01 PM, "Julian Jaffe" <[email protected]> wrote:
    >
    >     I think this question is going the other way (i.e., how to read data
    >     into Spark, as opposed to into Druid). For that, the quickest and
    >     dirtiest approach is probably to use Spark's JSON support to parse a
    >     Druid response.
    >
    > [Rajiv] Can you please expand more here?
    >
    >     You may also be able to repurpose some code from
    >     https://github.com/SparklineData/spark-druid-olap, but I don't think
    >     there's any official guidance on this.
    >
    >
    >
    >     On Wed, Feb 6, 2019 at 2:21 PM Gian Merlino <[email protected]> wrote:
    >
    >     > Hey Rajiv,
    >     >
    >     > There's an unofficial Druid/Spark adapter at:
    >     > https://github.com/metamx/druid-spark-batch. If you want to stick to
    >     > official things, then the best approach would be to use Spark to
    >     > write data to HDFS or S3 and then ingest it into Druid using Druid's
    >     > Hadoop-based or native batch ingestion. (Or even write it to Kafka
    >     > using Spark Streaming and ingest from Kafka into Druid using Druid's
    >     > Kafka indexing service.)
    >     >
    >     > On Wed, Feb 6, 2019 at 12:04 PM Rajiv Mordani <[email protected]>
    >     > wrote:
    >     >
    >     > > Is there a best practice for how to load data from Druid to use in a
    >     > > Spark batch job? I asked this question on the user alias but got no
    >     > > response, hence reposting here.
    >     > >
    >     > >
    >     > >   *   Rajiv
    >     > >
    >     >
    >
    >
    >
    
