I am running into serious performance problems with my Spark 1.6 Streaming
app. As it runs, it gets slower and slower.

My app is simple. 

* It receives fairly large and complex JSON files (Twitter data)
* Converts each RDD to a DataFrame
* Splits the DataFrame into maybe 20 different data sets
* Writes each data set as JSON to S3
* Writing to S3 is really slow, so I use an ExecutorService to run the
writes in parallel (a rough sketch follows the list)
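For reference, here is roughly what the batch loop looks like. This is a
minimal sketch, not the actual code: the DStream source, the split on a
"lang" column, the pool size, and the S3 paths are all placeholders I made
up to illustrate the structure.

```scala
import java.util.concurrent.{Executors, TimeUnit}

import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

def processBatches(sqlContext: SQLContext, jsonLines: DStream[String]): Unit = {
  jsonLines.foreachRDD { (rdd, batchTime) =>
    if (!rdd.isEmpty()) {
      // Infer a schema and build a DataFrame from the raw JSON strings
      val df = sqlContext.read.json(rdd)
      df.cache() // reused for each of the ~20 splits below

      // Split into ~20 data sets; "lang" and its values are hypothetical
      val splits = Seq("en", "es", "fr" /* ... ~20 in the real app */)
        .map(lang => lang -> df.filter(df("lang") === lang))

      // Run the slow S3 writes in parallel on a fixed-size pool
      val pool = Executors.newFixedThreadPool(splits.size)
      splits.foreach { case (lang, subset) =>
        pool.submit(new Runnable {
          override def run(): Unit =
            subset.write.json(
              s"s3n://com.xxx/json/$lang/${batchTime.milliseconds}")
        })
      }
      pool.shutdown()
      // Block until this batch's writes finish so batches don't pile up
      pool.awaitTermination(10, TimeUnit.MINUTES)
      df.unpersist()
    }
  }
}
```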

I see a lot of warnings like the following in my Spark Streaming executor
log files:

16/07/11 14:53:49 WARN FileOutputCommitter: Failed to delete the temporary
output directory of task: attempt_201607111453_128606_m_000000_0 -
s3n://com.xxx/json/yyy/2016-07-11/1468244820000/_temporary/_attempt_201607111453_128606_m_000000_0

Any suggestions?

Thanks

Andy

