Re: Spark batch with Druid

Julian Jaffe Mon, 11 Feb 2019 13:05:08 -0800

Spark can convert an RDD of JSON strings into an RDD/DataFrame/Dataset of
objects parsed from the JSON (something like
`sparkSession.read.json(jsonStringRDD)`). You could hook this up to a Druid
response, but I would definitely recommend looking through the code that
Gian posted instead - it reads data from deep storage instead of sending an
HTTP request to the Druid cluster and waiting for the response.
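
For illustration, here is a minimal PySpark sketch of that quick-and-dirty
route. It assumes the response body has already been fetched from the cluster
and has the shape of a native timeseries query result (a JSON array of objects
with `timestamp` and `result` fields); other query types return differently
shaped JSON and would need their own flattening. The names
`druid_rows_to_json_lines`, `to_dataframe` and `SAMPLE_RESPONSE` are made up
for this example, and the sample values are invented.

```python
import json

# Invented sample, shaped like a Druid timeseries query response.
SAMPLE_RESPONSE = """
[
  {"timestamp": "2019-02-01T00:00:00.000Z", "result": {"count": 10}},
  {"timestamp": "2019-02-02T00:00:00.000Z", "result": {"count": 7}}
]
"""


def druid_rows_to_json_lines(response_body):
    """Flatten each {"timestamp", "result"} row into one JSON string."""
    lines = []
    for row in json.loads(response_body):
        flat = {"timestamp": row["timestamp"]}
        flat.update(row["result"])
        lines.append(json.dumps(flat))
    return lines


def to_dataframe(spark, json_lines):
    """Parse JSON strings into a DataFrame; `spark` is an existing SparkSession."""
    rdd = spark.sparkContext.parallelize(json_lines)
    # DataFrameReader.json accepts an RDD of JSON strings and infers the schema.
    return spark.read.json(rdd)
```

With a running session,
`to_dataframe(spark, druid_rows_to_json_lines(SAMPLE_RESPONSE))` should give a
DataFrame with `timestamp` and `count` columns. The whole response is
materialized on the driver first, so this is only reasonable for small result
sets.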


On Sat, Feb 9, 2019 at 5:02 PM Rajiv Mordani <[email protected]>
wrote:

> Thanks Julian,
>         See some questions in-line:
>
> On 2/6/19, 3:01 PM, "Julian Jaffe" <[email protected]> wrote:
>
>     I think this question is going the other way (i.e. how to read data
>     into Spark, as opposed to into Druid). For that, the quickest and
>     dirtiest approach is probably to use Spark's JSON support to parse a
>     Druid response.
>
> [Rajiv] Can you please expand more here?
>
>     You may also be able to repurpose some code from
>     https://github.com/SparklineData/spark-druid-olap, but I don't think
>     there's any official guidance on this.
>
>
>
>     On Wed, Feb 6, 2019 at 2:21 PM Gian Merlino <[email protected]> wrote:
>
>     > Hey Rajiv,
>     >
>     > There's an unofficial Druid/Spark adapter at:
>     > https://github.com/metamx/druid-spark-batch. If you want to stick to
>     > official things, then the best approach would be to use Spark to
>     > write data to HDFS or S3 and then ingest it into Druid using Druid's
>     > Hadoop-based or native batch ingestion. (Or even write it to Kafka
>     > using Spark Streaming and ingest from Kafka into Druid using Druid's
>     > Kafka indexing service.)
>     >
>     > On Wed, Feb 6, 2019 at 12:04 PM Rajiv Mordani <[email protected]>
>     > wrote:
>     >
>     > > Is there a best practice for how to load data from Druid to use in
>     > > a Spark batch job? I asked this question on the user alias but got
>     > > no response, hence reposting here.
>     > >
>     > >
>     > >   *   Rajiv
>     > >
>     >
>
>
>
