RE: spark 2.02 error when writing to s3

2017-01-27 Thread VND Tremblay, Paul
From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: Friday, January 27, 2017 3:20 AM
To: VND Tremblay, Paul
Cc: Neil Jonkers; Takeshi Yamamuro; user@spark.apache.org
Subject: Re: spark 2.02

RE: spark 2.02 error when writing to s3

2017-01-26 Thread VND Tremblay, Paul
From: Neil Jonkers [mailto:neilod...@gmail.com]
Sent: Friday, January 20, 2017 11:39 AM
To: Steve Loughran; VND Tremblay, Paul
Cc: Takeshi Yamamuro

RE: Ingesting Large csv File to relational database

2017-01-26 Thread VND Tremblay, Paul
What relational db are you using? We do this at work, and the way we handle it is to unload the db into Spark (actually, we unload it to S3 and then load it into Spark). Redshift is very efficient at dumping tables this way.
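For anyone who wants to try the unload-to-S3-then-Spark route, here is a rough sketch of what it could look like in Python. It is not our actual code: the helper names, the bucket prefix, and the IAM role ARN are all made up for illustration, and it assumes a Redshift cluster that accepts `FORMAT AS CSV` in `UNLOAD` and a Spark 2.x session whose S3 connector understands the URI scheme you pass in.

```python
def build_unload_sql(query, s3_prefix, iam_role):
    """Build a Redshift UNLOAD statement that dumps `query` to S3 as CSV.

    Hypothetical helper: the prefix and role passed in are placeholders.
    """
    # The query is wrapped in single quotes, so quotes inside it are doubled.
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV HEADER ALLOWOVERWRITE"
    )


def load_unloaded_csv(spark, s3_prefix):
    """Read every file Redshift wrote under `s3_prefix` into a DataFrame.

    `spark` is an existing SparkSession. UNLOAD appends slice and part
    suffixes to the prefix, hence the trailing glob.
    """
    return spark.read.csv(s3_prefix + "*", header=True, inferSchema=True)


if __name__ == "__main__":
    sql = build_unload_sql(
        "SELECT * FROM orders WHERE region = 'EU'",       # example query
        "s3://example-bucket/unload/orders_",             # assumed prefix
        "arn:aws:iam::123456789012:role/ExampleUnload",   # assumed role
    )
    print(sql)  # run this statement against Redshift with your SQL client
```

Once the UNLOAD has finished, `load_unloaded_csv(spark, "s3://example-bucket/unload/orders_")` would pick up the dumped files. Outside EMR the scheme is usually `s3a://` rather than `s3://`.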

RE: spark 2.02 error when writing to s3

2017-01-20 Thread VND Tremblay, Paul
From: Takeshi Yamamuro [mailto:linguin@gmail.com]
Sent: Thursday, January 19, 2017 9:27 PM
To: VND Tremblay, Paul
Cc: user@spark.apache.org

spark 2.02 error when writing to s3

2017-01-19 Thread VND Tremblay, Paul
I have come across a problem when writing CSV files to S3 in Spark 2.02. The problem does not exist in Spark 1.6.
19:09:20 Caused by: java.io.IOException: File already exists:s3://stx-apollo-pr-datascience-internal/revenue_model/part-r-00025-c48a0d52-9600-4495-913c-64ae6bf888bd.csv
My code