> -Original Message-
> From: Cody Koeninger [mailto:c...@koeninger.org]
> Sent: 08 July 2016 15:31
> To: Andy Davidson <a...@santacruzintegration.com>
> Cc: user @spark <user@spark.apache.org>
> Subject: Re: is dataframe.write() async? Streaming performance prob
Maybe obvious, but what happens when you change the s3 write to a println of
all the data? That should identify whether it's the issue.
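The swap suggested above can be sketched like this (a sketch only; `df` stands in for whatever DataFrame the streaming batch produces, and the S3 path is hypothetical — Spark 1.6-era Scala API):

```scala
// Original (synchronous) write -- commented out for the test:
// df.write.json("s3n://bucket/prefix")

// Debug swap: dump the batch to the driver log instead of S3.
// collect() is fine here because the batch is only ~300 records.
df.toJSON.collect().foreach(println)
```

If batch processing time drops back under the window size with the `println`, the S3 write is the bottleneck; if it doesn't, the problem is upstream of the write.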
count() and read.json() will involve additional tasks (run through the
items in the rdd to count them, likewise to infer the schema), but for
300 records that overhead should be negligible.
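Those extra passes can at least be paid for once rather than re-reading Kafka each time, by caching the batch RDD before counting and inferring the schema. A minimal sketch, assuming a Kafka direct stream of (key, value) pairs and a Spark 1.6 `sqlContext` (the variable names and output path are assumptions, not from the thread):

```scala
// Direct stream batches arrive as (key, value); keep the JSON payload.
val cached = rdd.map(_._2).cache()

if (cached.count() > 0) {               // count() runs as a separate job
  // Schema inference makes another pass over the cached data,
  // not another trip back to Kafka.
  val df = sqlContext.read.json(cached)
  df.write.json(outputPath)
}
cached.unpersist()
```

Without the `cache()`, `count()` and `read.json()` would each recompute the batch from the Kafka source.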
I am running Spark 1.6.1 built for Hadoop 2.0.0-mr1-cdh4.2.0 and using the kafka
direct stream approach. I am running into performance problems: my
processing time is greater than my window size. Changing window sizes and adding
cores and executor memory does not change performance. I am having a lot of