[ https://issues.apache.org/jira/browse/SPARK-29259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-29259.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/25928

> Filesystem.exists is called even when not necessary for append save mode
> ------------------------------------------------------------------------
>
>                 Key: SPARK-29259
>                 URL: https://issues.apache.org/jira/browse/SPARK-29259
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Rahij Ramsharan
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> When saving a DataFrame into Hadoop
> ([https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L93]),
> Spark first checks whether the output path exists and only then inspects
> the SaveMode to determine whether it should actually insert data. However,
> the pathExists variable is not used at all in the case of SaveMode.Append.
> On some file systems the exists call can be expensive, so this PR makes
> that call only when necessary.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
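
The control flow described in the issue can be illustrated with a small, self-contained model. This is only a sketch in Python, not the Scala code from the linked PR: the names SaveMode, CountingFileSystem, and should_insert are hypothetical stand-ins, and the handling of the non-append modes is simplified (for example, the overwrite branch does not delete anything). It shows the one point the issue makes: in append mode the result does not depend on whether the path exists, so the exists probe can be skipped.

```python
from enum import Enum, auto


class SaveMode(Enum):
    """Hypothetical mirror of the four save modes named in the issue."""
    APPEND = auto()
    OVERWRITE = auto()
    ERROR_IF_EXISTS = auto()
    IGNORE = auto()


class CountingFileSystem:
    """Hypothetical file system stub that counts exists() calls."""

    def __init__(self, existing_paths):
        self.existing_paths = set(existing_paths)
        self.exists_calls = 0

    def exists(self, path):
        # On a real remote store this probe may be a costly round trip.
        self.exists_calls += 1
        return path in self.existing_paths


def should_insert(fs, path, mode):
    """Decide whether to write, probing the path only when the mode needs it."""
    if mode is SaveMode.APPEND:
        # Append writes whether or not the path exists, so skip the probe.
        return True

    path_exists = fs.exists(path)
    if mode is SaveMode.ERROR_IF_EXISTS:
        if path_exists:
            raise FileExistsError(f"path {path} already exists")
        return True
    if mode is SaveMode.IGNORE:
        return not path_exists
    # Overwrite: always write (clearing existing data is omitted in this sketch).
    return True
```

With this model, should_insert(fs, path, SaveMode.APPEND) returns True without touching fs.exists, while the other three modes each trigger exactly one probe.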