Spark can convert an RDD of JSON strings into an RDD/DataFrame/Dataset of objects parsed from the JSON (something like `sparkSession.read.json(jsonStringRDD)`). You could hook this up to a Druid response, but I would definitely recommend looking through the code that Gian posted instead: it reads data from deep storage rather than sending an HTTP request to the Druid cluster and waiting for the response.
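To make the "quick and dirty" route a bit more concrete, here is a rough sketch in PySpark terms. It assumes the Druid response is a groupBy-style JSON array whose rows carry a `timestamp` and an `event` object; other query types return differently shaped results and would need their own flattening. The function names and the sample payload are mine, purely for illustration, and `spark` is assumed to be a `SparkSession` you already have.

```python
import json
from typing import List


def druid_groupby_to_json_lines(response_body: str) -> List[str]:
    """Flatten a groupBy-style Druid response into one JSON string per row.

    Assumes each row looks like {"timestamp": ..., "event": {...}}.
    A column inside "event" that is itself named "timestamp" would
    overwrite the row timestamp in this sketch.
    """
    rows = json.loads(response_body)
    lines = []
    for row in rows:
        flat = {"timestamp": row["timestamp"]}
        flat.update(row["event"])
        lines.append(json.dumps(flat, sort_keys=True))
    return lines


def json_lines_to_dataframe(spark, json_lines):
    """Let Spark infer a schema from the JSON strings.

    `spark` is assumed to be an existing pyspark.sql.SparkSession;
    this function is not exercised without a Spark installation.
    """
    rdd = spark.sparkContext.parallelize(json_lines)
    return spark.read.json(rdd)


# Hypothetical response body, only to show the expected shape.
sample_body = json.dumps([
    {"version": "v1",
     "timestamp": "2019-02-06T00:00:00.000Z",
     "event": {"page": "home", "count": 3}},
])
sample_lines = druid_groupby_to_json_lines(sample_body)
```

With a live session, `json_lines_to_dataframe(spark, sample_lines)` should give a DataFrame with `count`, `page` and `timestamp` columns. Keep in mind that the whole response passes through the driver here, which is one more reason to prefer reading from deep storage for anything large.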

On Sat, Feb 9, 2019 at 5:02 PM Rajiv Mordani <rmord...@vmware.com.invalid> wrote:

> Thanks Julian,
> See some questions in-line:
>
> On 2/6/19, 3:01 PM, "Julian Jaffe" <jja...@pinterest.com.INVALID> wrote:
>
>     I think this question is going the other way (e.g. how to read data into
>     Spark, as opposed to into Druid). For that, the quickest and dirtiest
>     approach is probably to use Spark's json support to parse a Druid
>     response.
>
> [Rajiv] Can you please expand more here?
>
>     You may also be able to repurpose some code from
>     https://github.com/SparklineData/spark-druid-olap, but I don't think
>     there's any official guidance on this.
>
>     On Wed, Feb 6, 2019 at 2:21 PM Gian Merlino <g...@apache.org> wrote:
>
>     > Hey Rajiv,
>     >
>     > There's an unofficial Druid/Spark adapter at:
>     > https://github.com/metamx/druid-spark-batch. If you want to stick to
>     > official things, then the best approach would be to use Spark to
>     > write data to HDFS or S3 and then ingest it into Druid using Druid's
>     > Hadoop-based or native batch ingestion. (Or even write it to Kafka
>     > using Spark Streaming and ingest from Kafka into Druid using Druid's
>     > Kafka indexing service.)
>     >
>     > On Wed, Feb 6, 2019 at 12:04 PM Rajiv Mordani
>     > <rmord...@vmware.com.invalid> wrote:
>     >
>     > > Is there a best practice for how to load data from druid to use in
>     > > a spark batch job? I asked this question on the user alias but got
>     > > no response hence reposting here.
>     > >
>     > > * Rajiv