From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: Friday, January 27, 2017 3:20 AM
To: VND Tremblay, Paul
Cc: Neil Jonkers; Takeshi Yamamuro; user@spark.apache.org
Subject: Re: spark 2.02
ytics Specialist
THE BOSTON CONSULTING GROUP
Tel. + ▪ Mobile +
_
From: Neil Jonkers [mailto:neilod...@gmail.com]
Sent: Friday, January 20, 2017 11:39 AM
To: Steve Loughran; VND Tremblay, Paul
Cc: Takeshi Yam
What relational database are you using? We do this at work, and the way we
handle it is to unload the database into Spark (actually, we unload it to S3
and then load it into Spark). Redshift is very efficient at dumping tables this way.
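
A rough sketch of that unload-then-load flow, in case it helps. It assumes Redshift's UNLOAD command and PySpark's CSV reader. The bucket, table, and IAM role below are made-up placeholders, and the two helper functions are hypothetical names for illustration, not what we actually run:

```python
def build_unload_sql(query, s3_prefix, iam_role):
    """Return a Redshift UNLOAD statement that dumps `query` to `s3_prefix`."""
    # Single quotes inside UNLOAD ('...') have to be doubled.
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "DELIMITER ',' ADDQUOTES ESCAPE ALLOWOVERWRITE"
    )


def load_unloaded_table(spark, s3_prefix):
    """Read the unloaded part files back into a Spark DataFrame.

    `spark` is an existing SparkSession; nothing here imports pyspark,
    so the SQL builder above can be used without a Spark install.
    """
    return spark.read.csv(s3_prefix, sep=",", quote='"', escape="\\")


# Placeholder values: replace with your own query, bucket, and role.
sql = build_unload_sql(
    "select * from revenue where region = 'EU'",
    "s3://example-bucket/unload/revenue_",
    "arn:aws:iam::123456789012:role/ExampleUnloadRole",
)
print(sql)
```

You would run the generated statement against Redshift with whatever SQL client you already use, then call `load_unloaded_table(spark, "s3://example-bucket/unload/revenue_*")` from the Spark job. The delimiter and quoting options on both sides have to match, which is why they are fixed in one place here.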
From: Takeshi Yamamuro [mailto:linguin@gmail.com]
Sent: Thursday, January 19, 2017 9:27 PM
To: VND Tremblay, Paul
Cc: user@spark.apache.org
I have come across a problem when writing CSV files to S3 in Spark 2.0.2. The
problem does not exist in Spark 1.6.
19:09:20 Caused by: java.io.IOException: File already exists:s3://stx-apollo-pr-datascience-internal/revenue_model/part-r-00025-c48a0d52-9600-4495-913c-64ae6bf888bd.csv
My code is