Re: Spark streaming takes longer time to read json into dataframes

2016-07-19 Thread Diwakar Dhanuskodi
Martin Eden <martineden...@gmail.com> > Date:16/07/2016 14:01 (GMT+05:30) > To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> > Cc: user <user@spark.apache.org> > Subject: Re: Spark streaming takes longer time to read json into dataframes > > Hi, > > I woul

Re: Spark streaming takes longer time to read json into dataframes

2016-07-19 Thread Cody Koeninger
ate:16/07/2016 14:01 (GMT+05:30) > To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> > Cc: user <user@spark.apache.org> > Subject: Re: Spark streaming takes longer time to read json into dataframes > > Hi, > > I would just do a repartition on the initial direct D

Re: Spark streaming takes longer time to read json into dataframes

2016-07-17 Thread Diwakar Dhanuskodi
. Original message From: Martin Eden <martineden...@gmail.com> Date:16/07/2016 14:01 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: user <user@spark.apache.org> Subject: Re: Spark streaming takes longer time to read json into dataframes Hi

Re: Spark streaming takes longer time to read json into dataframes

2016-07-16 Thread Martin Eden
at 5:26 AM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > > -- Forwarded message -- > From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> > Date: Sat, Jul 16, 2016 at 9:30 AM > Subject: Re: Spark streaming takes longer time to read json

Fwd: Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Diwakar Dhanuskodi
-- Forwarded message -- From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Date: Sat, Jul 16, 2016 at 9:30 AM Subject: Re: Spark streaming takes longer time to read json into dataframes To: Jean Georges Perrin <j...@jgp.net> Hello, I need it on memory. Increa

Re: Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Jean Georges Perrin
Do you need it on disk or just push it to memory? Can you try to increase memory or # of cores (I know it sounds basic) > On Jul 15, 2016, at 11:43 PM, Diwakar Dhanuskodi > wrote: > > Hello, > > I have 400K json messages pulled from Kafka into spark streaming

Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Diwakar Dhanuskodi
Hello, I have 400K JSON messages pulled from Kafka into Spark Streaming using the DirectStream approach. The 400K messages total around 5G. The Kafka topic has a single partition. I am using rdd.read.json(_._2) inside foreachRDD to convert the RDD into a DataFrame. It takes almost 2.3 minutes to convert into
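
For readers of this thread: with a direct stream, each Kafka partition maps to one RDD partition, so a single-partition topic leaves the JSON parsing to one task. Below is a rough sketch of the repartition suggestion made in the replies. It assumes the Spark 1.6-era API with the spark-streaming-kafka (Kafka 0.8) artifact on the classpath. The broker address, topic name, batch interval, and partition count are placeholders and do not come from the original posts.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object JsonStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("json-stream-sketch")
    // Batch interval is a placeholder, not a value from the thread.
    val ssc = new StreamingContext(conf, Seconds(60))

    // Hypothetical broker and topic names, for illustration only.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("events")

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    // Keep only the message value, then spread the records over more
    // partitions so parsing is no longer limited to a single task.
    // 16 is an arbitrary placeholder; this step adds a shuffle.
    val values = stream.map(_._2).repartition(16)

    values.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        // read.json on an RDD[String] infers the schema by scanning the data.
        val df = sqlContext.read.json(rdd)
        df.printSchema()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Whether this helps depends on the cluster: the repartition itself costs a shuffle of the full batch. If the message structure is known in advance, passing an explicit schema with sqlContext.read.schema(...) before .json(...) avoids the schema-inference pass over the data, which may matter as much as the parallelism.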