Hi,
My question is specifically about PR #29000 <https://github.com/apache/spark/pull/29000/files#r649580767> for SPARK-29302 <https://issues.apache.org/jira/browse/SPARK-29302>. As I understand it, the PR introduces a different staging directory at job commit time to avoid commit collisions.

In SQLHadoopMapReduceCommitProtocol, the new staging directory is only set when SQLConf.OUTPUT_COMMITTER_CLASS is not null: code <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala#L58>. In the current Spark repo, OUTPUT_COMMITTER_CLASS is set only for the Parquet format: code <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L96>. I didn't find similar code setting that config anywhere in the ORC-related code.

If I understand correctly, without SQLConf.OUTPUT_COMMITTER_CLASS set properly (as for the ORC format), SQLHadoopMapReduceCommitProtocol will still use the original staging directory, which would void the PR's fix and leave the commit collision possible. In other words, the fix seems effective only for Parquet, not for non-Parquet formats.

Could someone confirm whether this is a potential problem? Am I missing something here?

Thanks!

Best Regards,
Tony Zhang
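To make the concern concrete, here is a simplified, self-contained sketch of the conditional I'm describing. All names here are mine for illustration only; the actual logic lives in SQLHadoopMapReduceCommitProtocol.setupCommitter and depends on Spark internals, so this is just a model of the behavior, not the real code.

```scala
// Hypothetical model (not Spark source): the job-specific staging
// directory is only chosen when an output committer class has been
// configured, mirroring the null check on SQLConf.OUTPUT_COMMITTER_CLASS.
object StagingDirSketch {
  def stagingDir(outputCommitterClass: Option[String],
                 originalDir: String,
                 jobSpecificDir: String): String =
    outputCommitterClass match {
      // Config set (e.g. Parquet): the collision-free, job-specific dir is used.
      case Some(_) => jobSpecificDir
      // Config unset (e.g. ORC, per my reading): fall back to the original
      // shared dir, where two concurrent commits could still collide.
      case None    => originalDir
    }
}
```

Under this model, any format that never sets the committer-class config keeps writing to the shared directory, which is exactly the collision the PR was meant to prevent.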