[ https://issues.apache.org/jira/browse/SPARK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-3595.
------------------------------------
          Resolution: Fixed
       Fix Version/s: 1.2.0
    Target Version/s: 1.2.0

Thanks, I've merged this into master. We can consider merging this into 1.1 as well later on. I decided not to do that yet because we've often found that changes around Hadoop configurations can produce unanticipated regressions. So let's see how this fares in master, and if there is a lot of demand we can backport the fix once it has been stable in master for a while.

> Spark should respect configured OutputCommitter when using saveAsHadoopFile
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-3595
>                 URL: https://issues.apache.org/jira/browse/SPARK-3595
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.1.0
>            Reporter: Ian Hummel
>            Assignee: Ian Hummel
>             Fix For: 1.2.0
>
> When calling {{saveAsHadoopFile}}, Spark hardcodes the OutputCommitter to be
> a {{FileOutputCommitter}}.
> When using Spark on an EMR cluster to process and write files to/from S3, the
> default Hadoop configuration uses a {{DirectFileOutputCommitter}} to avoid
> writing to a temporary directory and doing a copy.
> Will submit a patch via GitHub shortly.
> Cheers,

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
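For context, a minimal sketch of how a custom committer would be supplied once this fix is in place (Spark 1.2.0+): the committer class is set on the {{JobConf}} passed to {{saveAsHadoopFile}}, and Spark picks it up instead of hardcoding {{FileOutputCommitter}}. The committer class name below is illustrative only; this fragment assumes an existing {{SparkContext}} (`sc`) and an RDD of key/value pairs, and needs a Spark runtime to execute.

```scala
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.hadoop.io.{NullWritable, Text}

// Start from the cluster's Hadoop configuration so site defaults are preserved.
val jobConf = new JobConf(sc.hadoopConfiguration)

// Configure the OutputCommitter via the standard Hadoop property.
// "com.example.DirectFileOutputCommitter" is a placeholder class name;
// on EMR the equivalent committer ships with the platform's Hadoop build.
jobConf.set("mapred.output.committer.class",
  "com.example.DirectFileOutputCommitter")

// With SPARK-3595 applied, saveAsHadoopFile uses the committer configured
// above instead of a hardcoded FileOutputCommitter.
rdd.saveAsHadoopFile(
  "s3://my-bucket/output",          // placeholder output path
  classOf[NullWritable],
  classOf[Text],
  classOf[TextOutputFormat[NullWritable, Text]],
  jobConf)
```

Before this change, the {{mapred.output.committer.class}} setting was silently ignored by {{saveAsHadoopFile}}, which is why direct-to-S3 writes on EMR still went through a temporary directory and a copy.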