Martin Eden <martineden...@gmail.com>
> Date:16/07/2016 14:01 (GMT+05:30)
> To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Spark streaming takes longer time to read json into dataframes
>
> Hi,
>
> I would just do a repartition on the initial direct D
at 5:26 AM, Diwakar Dhanuskodi <
diwakar.dhanusk...@gmail.com> wrote:
-- Forwarded message --
From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>
Date: Sat, Jul 16, 2016 at 9:30 AM
Subject: Re: Spark streaming takes longer time to read json into dataframes
To: Jean Georges Perrin <j...@jgp.net>
Hello,
I need it in memory. Increa
Do you need it on disk, or do you just push it to memory? Can you try
increasing memory or the number of cores? (I know it sounds basic.)
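
For reference, the repartition suggested elsewhere in this thread could look roughly like the sketch below. With the direct approach, each Kafka partition maps to one RDD partition, so a single-partition topic gives a batch that is parsed by a single task unless it is repartitioned first. This is only a minimal sketch under assumptions: it uses the SparkSession API (Spark 2.2 or later) rather than whatever version the original poster runs, a locally built RDD stands in for the Kafka batch, and the names `RepartitionBeforeJson`, `toDataFrame`, and `numPartitions` are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object RepartitionBeforeJson {

  // Hypothetical helper: spread the values of one micro-batch over
  // `numPartitions` partitions before parsing them as JSON.
  def toDataFrame(spark: SparkSession,
                  batch: RDD[(String, String)],
                  numPartitions: Int): DataFrame = {
    import spark.implicits._ // provides the Encoder[String] used by createDataset
    // Keep only the message value (the `_._2` from the thread), then shuffle
    // the records across partitions so several tasks can parse in parallel.
    val values = batch.map(_._2).repartition(numPartitions)
    spark.read.json(spark.createDataset(values))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("repartition-sketch")
      .getOrCreate()

    // Stand-in for a batch read from a single-partition Kafka topic:
    // all records sit in one RDD partition.
    val batch = spark.sparkContext.parallelize(
      (1 to 1000).map(i => (s"k$i", s"""{"id": $i, "name": "msg$i"}""")),
      numSlices = 1)

    val df = toDataFrame(spark, batch, numPartitions = 4)
    println(s"rows parsed: ${df.count()}")
    spark.stop()
  }
}
```

In a streaming job the same call would sit inside foreachRDD. Note that repartition adds a shuffle, so whether it helps depends on how parse time compares with shuffle cost. Separately, read.json without an explicit schema makes an extra pass over the data to infer one, so supplying a schema may also reduce the conversion time.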
> On Jul 15, 2016, at 11:43 PM, Diwakar Dhanuskodi
> wrote:
Hello,
I have 400K JSON messages pulled from Kafka into Spark Streaming using the
DirectStream approach. The 400K messages total around 5 GB. The Kafka topic
has a single partition. I am using rdd.read.json(_._2) inside foreachRDD to
convert the RDD into a DataFrame. It takes almost 2.3 minutes to convert into