Hi Spark users and developers, I wonder whether the following observed behaviour is expected. I'm writing a DataFrame to Parquet on S3, using append mode. Since I'm using org.apache.spark.sql.parquet.DirectParquetOutputCommitter as the spark.sql.parquet.output.committer.class, I expected that no _temporary directory would be generated.
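For context, here is a minimal sketch of the setup I'm describing, against the Spark 1.5.x API (the bucket and paths below are placeholders, not my real ones):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

val conf = new SparkConf()
  .setAppName("direct-committer-append")
  // Ask Spark SQL's Parquet writer to use the direct committer, which
  // writes output files in place instead of staging them under _temporary.
  .set("spark.sql.parquet.output.committer.class",
       "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

val df = sqlContext.read.json("s3n://my-bucket/input/")  // placeholder input

// Append the DataFrame to an existing Parquet directory on S3.
df.write.mode(SaveMode.Append).parquet("s3n://my-bucket/output/parquet/")
```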
I appended the same DataFrame twice to the same directory. The first append works as expected: no _temporary directory is created, thanks to the DirectParquetOutputCommitter. The second append, however, does create a _temporary directory and then moves the files from under _temporary into the output directory. Is this behaviour expected, or is it a bug? I'm using Spark 1.5.2. Best Regards, Jerry