b...@amazon.com>
Cc: user <user@spark.apache.org>
Subject: Re: CSV write to S3 failing silently with partial completion
Hi,
Can you please let me know the following:
1. Why are you using JAVA?
2. The way you are creating the SPARK cluster
3. The way you are initiating SPARK session or context
4. Are you able to query the data that is written to S3 using a SPARK
dataframe and validate that the number of rows in the output matches what
you expect?
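The check in question 4 can be sketched without a cluster: count the data rows across the written part files and compare against the expected total. This sketch uses plain local CSV files and Python's csv module as a stand-in; on the actual job the equivalent would be reading the s3:// path back into a Spark dataframe and comparing `.count()` to the source count (the file names and row counts below are made up for illustration).

```python
# Sketch: re-read written CSV part files and compare the total row count
# to what was expected. A mismatch is how a silent partial write surfaces.
import csv
import glob
import os
import tempfile

def count_rows(part_dir):
    """Sum the data rows across all part-*.csv files in a directory."""
    total = 0
    for path in sorted(glob.glob(os.path.join(part_dir, "part-*.csv"))):
        with open(path, newline="") as f:
            total += sum(1 for _ in csv.reader(f))
    return total

# Simulate a 3-partition write where the last partition was cut short.
tmp = tempfile.mkdtemp()
rows_per_part = [100, 100, 60]   # third part file is truncated
for i, n in enumerate(rows_per_part):
    with open(os.path.join(tmp, f"part-{i:05d}.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        for r in range(n):
            writer.writerow([r, "x"])

expected = 300
actual = count_rows(tmp)
print(actual, actual == expected)   # 260 False -> partial write detected
```

The same comparison against the source dataframe's count is a cheap post-write guard before downstream consumers (e.g. a Redshift COPY) pick up the files.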
On 7 Sep 2017, at 18:36, Mcclintic, Abbi wrote:
Thanks all – a couple of notes below.
Generally all our partitions are of equal size (i.e. on a normal day in this
particular case I see 10 equally sized partitions of 2.8 GB). We see the
problem with
…) file? If you are sending to redshift, why not use the
JDBC driver?
-Original Message-
From: abbim [mailto:ab...@amazon.com]
Sent: Thursday, September 07, 2017 1:02 AM
To: user@spark.apache.org
Subject: CSV write to S3 failing silently with partial completion
Hi all,
My team has been experiencing a recurring unpredictable bug where only a
partial write to CSV in S3 on one partition of our Dataset is performed. For
example, in a Dataset of 10 partitions written to CSV in S3, we might see 9
of the partitions as 2.8 GB in size, but one of them as 1.6 GB.
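The symptom described above (nine parts at 2.8 GB, one at 1.6 GB) can also be caught from object sizes alone, before any re-read. A minimal sketch, assuming the per-part sizes have already been fetched (e.g. via `aws s3 ls` or boto3's `list_objects_v2` — not something the thread itself does):

```python
# Sketch: flag part files whose size falls sharply below the median,
# the "one partition much smaller than its siblings" symptom.
from statistics import median

def undersized_parts(sizes_gb, tolerance=0.25):
    """Return indices of parts smaller than (1 - tolerance) * median size."""
    m = median(sizes_gb)
    return [i for i, s in enumerate(sizes_gb) if s < (1 - tolerance) * m]

sizes = [2.8] * 9 + [1.6]          # the partition layout described above
print(undersized_parts(sizes))     # [9] -> the short partition
```

This only works when partitions are expected to be roughly equal in size, which the thread says holds here; for skewed data the row-count comparison is the safer check.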