can I use ExecutorService in my driver? was: is dataframe.write() async? Streaming performance problem

2016-07-08 Thread Andy Davidson
> -Original Message-
> From: Cody Koeninger [mailto:c...@koeninger.org]
> Sent: 08 July 2016 15:31
> To: Andy Davidson <a...@santacruzintegration.com>
> Cc: user @spark <user@spark.apache.org>
> Subject: Re: is dataframe.write() async? Streaming performance prob

RE: is dataframe.write() async? Streaming performance problem

2016-07-08 Thread Ewan Leith
Cc: user @spark <user@spark.apache.org>
Subject: Re: is dataframe.write() async? Streaming performance problem

Maybe obvious, but what happens when you change the s3 write to a println of all the data? That should identify whether it's the issue. count() and read.json() will involve addition

Re: is dataframe.write() async? Streaming performance problem

2016-07-08 Thread Cody Koeninger
Maybe obvious, but what happens when you change the s3 write to a println of all the data? That should identify whether it's the issue. count() and read.json() will involve additional tasks (run through the items in the rdd to count them, likewise to infer the schema) but for 300 records that
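The diagnostic suggested above (swap the real sink for a println and compare timings) could be sketched roughly as follows. This is a plain-Java illustration under stated assumptions, not Spark code: `SinkTiming`, `timeSink`, and `slowWrite` are hypothetical names, and `slowWrite` merely simulates a slow remote write with a sleep. The batch size of 300 mirrors the record count mentioned in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SinkTiming {

    /** Feeds every record to the sink and returns the elapsed wall-clock time in milliseconds. */
    static long timeSink(List<String> records, Consumer<String> sink) {
        long start = System.nanoTime();
        for (String record : records) {
            sink.accept(record);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Hypothetical stand-in for a slow remote write; it only sleeps for 2 ms. */
    static void slowWrite(String record) {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Build a small batch of JSON-like strings (assumed shape, for illustration only).
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            batch.add("{\"id\":" + i + "}");
        }

        // Time the cheap sink (println) against the simulated slow sink.
        long printMs = timeSink(batch, r -> System.out.println(r));
        long slowMs = timeSink(batch, SinkTiming::slowWrite);

        System.out.println("println sink: " + printMs + " ms");
        System.out.println("simulated slow sink: " + slowMs + " ms");
    }
}
```

If the println variant finishes well within the batch interval while the real write does not, the sink is the likely bottleneck; if both are slow, the time is probably being spent upstream of the write.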

is dataframe.write() async? Streaming performance problem

2016-07-07 Thread Andy Davidson
I am running Spark 1.6.1 built for Hadoop 2.0.0-mr1-cdh4.2.0 and using the Kafka direct stream approach. I am running into performance problems. My processing time is greater than my window size. Changing window sizes and adding cores and executor memory do not change performance. I am having a lot of