See this ticket: https://issues.apache.org/jira/browse/HADOOP-17201. It may
help your team.
From: Johnny Burns
Sent: Tuesday, June 22, 2021 3:41 PM
To: user@spark.apache.org
Cc: data-orchestration-team
Subject: Performance Problems Migrating to S3A Committers
Thanks, Johnny, for sharing your experience. Have you tried the S3A
committers? It looks like they were introduced in recent Hadoop releases to
solve the problems with the other committers.
https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html
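In case it saves you some digging, here is a minimal sketch of the Spark
configuration for enabling the S3A "magic" committer, based on the settings
described in that Hadoop documentation page (the bucket name `my-bucket` and
output path are placeholders; this assumes Spark was built with the
hadoop-cloud module so the PathOutputCommitProtocol classes are on the
classpath):

```shell
# Sketch: enable the S3A magic committer for a Spark job.
# Assumes the spark-hadoop-cloud module is available; adjust for your build.
spark-submit \
  --conf spark.hadoop.fs.s3a.committer.name=magic \
  --conf spark.hadoop.fs.s3a.committer.magic.enabled=true \
  --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol \
  --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter \
  your-job.jar s3a://my-bucket/output/
```

The "directory" and "partitioned" staging committers are alternatives that
avoid magic paths but stage data on the local filesystem first; the docs page
above compares the trade-offs.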
- ND
On 6/22/21 6:41 PM, Johnny Burns wrote:
Hello.
I’m Johnny, and I work at Stripe. We’re heavy Spark users, and we’ve been
exploring the S3 committers. Currently we first write the data to HDFS and
then upload it to S3. However, now that S3 offers strong consistency
guarantees, we are evaluating whether we can write data directly to S3.
We’re