Hi Spark users and developers, I wonder whether the following observed behaviour is expected. I'm writing a DataFrame to Parquet on S3, using append mode. Since I'm using org.apache.spark.sql.parquet.DirectParquetOutputCommitter as the spark.sql.parquet.output.committer.class, I expected that no _temporary directory would be generated.
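For context, here is a minimal sketch of the setup I'm describing, against the Spark 1.5.x API (the bucket and paths below are placeholders, not my real ones):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

val conf = new SparkConf()
  .setAppName("direct-committer-append")
  // Ask Spark SQL's Parquet writer to use the direct committer, which
  // writes output files in place instead of staging them under _temporary.
  .set("spark.sql.parquet.output.committer.class",
       "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

val df = sqlContext.read.json("s3n://my-bucket/input/")  // placeholder input

// Append the DataFrame to an existing Parquet directory on S3.
df.write.mode(SaveMode.Append).parquet("s3n://my-bucket/output/parquet/")
```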
I appended the same DataFrame twice to the same directory. The first append works as expected: no _temporary directory is created, thanks to the DirectParquetOutputCommitter. The second append, however, does create a _temporary directory and then moves the files from under _temporary into the output directory. Is this behaviour expected, or is it a bug? I'm using Spark 1.5.2. Best Regards, Jerry