See this ticket: https://issues.apache.org/jira/browse/HADOOP-17201. It may
help your team.
From: Johnny Burns
Sent: Tuesday, June 22, 2021 3:41 PM
To: user@spark.apache.org
Cc: data-orchestration-team
Subject: Performance Problems Migrating to S3A Committers
Thanks, Johnny, for sharing your experience. Have you tried the S3A
committers? It looks like they were introduced in recent Hadoop releases to
solve the problems with the other committers.
https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html
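In case it saves you some digging, here is a minimal sketch of the Spark
configuration for enabling the S3A "magic" committer, based on the settings
described in that Hadoop documentation page (the bucket name `my-bucket` and
output path are placeholders; this assumes Spark was built with the
hadoop-cloud module so the PathOutputCommitProtocol classes are on the
classpath):

```shell
# Sketch: enable the S3A magic committer for a Spark job.
# Assumes the spark-hadoop-cloud module is available; adjust for your build.
spark-submit \
  --conf spark.hadoop.fs.s3a.committer.name=magic \
  --conf spark.hadoop.fs.s3a.committer.magic.enabled=true \
  --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol \
  --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter \
  your-job.jar s3a://my-bucket/output/
```

The "directory" and "partitioned" staging committers are alternatives that
avoid magic paths but stage data on the local filesystem first; the docs page
above compares the trade-offs.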
- ND
On 6/22/21 6:41 PM, Johnny Burns wrote:
Hello.
I’m Johnny, and I work at Stripe. We’re heavy Spark users, and we’ve been
exploring the S3 committers. Currently we first write the data to HDFS and
then upload it to S3. However, now that S3 offers strong consistency
guarantees, we are evaluating whether we can write data directly to S3.
We’re