Rahij Ramsharan created SPARK-29259:
---------------------------------------
Summary: Filesystem.exists is called even when not necessary for append save mode
Key: SPARK-29259
URL: https://issues.apache.org/jira/browse/SPARK-29259
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.4
Reporter: Rahij Ramsharan

When saving a DataFrame to a Hadoop file system ([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L93]), Spark first checks whether the output path exists, and only then inspects the SaveMode to decide whether it should actually insert data. However, the resulting pathExists value is not used at all for SaveMode.Append. On some file systems the exists call can be expensive, so this PR makes that call only when it is necessary.
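A minimal sketch of the idea, assuming nothing about how the actual patch is written: the existence check is passed as a Scala by-name parameter, so it only runs in the branches that read it. `Mode`, its four cases, and `LazyExistsSketch.shouldInsert` are hypothetical stand-ins for Spark's `SaveMode` and the decision logic in `InsertIntoHadoopFsRelationCommand`, and the Overwrite and ErrorIfExists branches are simplified.

```scala
// Hypothetical stand-in for org.apache.spark.sql.SaveMode, so the sketch
// runs without Spark or Hadoop on the classpath.
sealed trait Mode
case object Append extends Mode
case object Overwrite extends Mode
case object ErrorIfExists extends Mode
case object Ignore extends Mode

object LazyExistsSketch {
  // `pathExists` is a by-name parameter: the (possibly expensive) check,
  // standing in for FileSystem.exists, is evaluated only when a branch reads it.
  def shouldInsert(mode: Mode, pathExists: => Boolean): Boolean = mode match {
    case Append =>
      true // Append never needs to know whether the path exists
    case Overwrite =>
      if (pathExists) {
        // Spark would delete the matching partitions here before writing
      }
      true
    case ErrorIfExists =>
      // Spark throws AnalysisException; IllegalStateException is a stand-in
      if (pathExists) throw new IllegalStateException("path already exists")
      else true
    case Ignore =>
      !pathExists // skip the write if something is already there
  }
}
```

For example, `LazyExistsSketch.shouldInsert(Append, fs.exists(path))` (with `fs` and `path` assumed) returns `true` without ever invoking `fs.exists`, while the other modes still perform the check exactly once.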