Re: [Spark SQL]: Issues with writing dataframe with Append Mode to Parquet

2016-01-12 Thread Michael Armbrust
There can be data loss when you are using the DirectOutputCommitter and speculation is turned on, so we disable it automatically.

On Tue, Jan 12, 2016 at 1:11 PM, Jerry Lam wrote:
> Hi spark users and developers,
>
> I wonder if the following observed behaviour is expected.
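[Editorial note: a minimal sketch of pinning speculation off when using the direct committer, as the reply above recommends. This is not code from the thread; the app name is hypothetical, and it assumes the Spark 1.x-era SparkConf API.]

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: spark.speculation defaults to false, but setting it
// explicitly documents that speculative (duplicate) task attempts are
// off, which is the condition under which the direct committer is safe.
val conf = new SparkConf()
  .setAppName("parquet-append-example") // hypothetical
  .set("spark.speculation", "false")
val sc = new SparkContext(conf)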

[Spark SQL]: Issues with writing dataframe with Append Mode to Parquet

2016-01-12 Thread Jerry Lam
Hi spark users and developers,

I wonder if the following observed behaviour is expected. I'm writing a dataframe to Parquet in S3, using append mode when I write to it. Since I'm using org.apache.spark.sql.parquet.DirectParquetOutputCommitter as the
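[Editorial note: a minimal sketch of the kind of job being described, assuming Spark 1.x APIs. The sample data and S3 bucket are invented for illustration; spark.sql.parquet.output.committer.class is the 1.x configuration key for swapping in the committer named above.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object AppendParquetToS3 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("append-parquet-to-s3") // hypothetical
      // Route Parquet output through the committer named in the thread;
      // it writes files in place instead of committing from _temporary.
      .set("spark.sql.parquet.output.committer.class",
        "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical data; the thread does not show the dataframe's source.
    val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "value")

    // Append mode: each run adds new part files next to the existing ones.
    df.write.mode(SaveMode.Append).parquet("s3a://my-bucket/events") // hypothetical bucket

    sc.stop()
  }
}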

Re: [Spark SQL]: Issues with writing dataframe with Append Mode to Parquet

2016-01-12 Thread Jerry Lam
Hi Michael,

Thanks for the hint! So if I turn off speculation, consecutive appends like the above will not produce temporary files, right? Which class is responsible for disabling the use of DirectOutputCommitter?

Thank you,
Jerry

On Tue, Jan 12, 2016 at 4:12 PM, Michael Armbrust
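[Editorial note: the archived excerpt ends before the class in question is named, so no answer is reproduced here. As a stopgap, a small sketch of inspecting the two settings the thread hinges on at runtime, assuming Spark 1.x SparkConf getters; the app name is hypothetical.]

import org.apache.spark.{SparkConf, SparkContext}

object CheckCommitterConfig {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("check-conf")) // hypothetical
    // spark.speculation defaults to false; per the reply above, Spark
    // ignores the direct committer whenever speculation is enabled.
    val speculationOn = sc.getConf.getBoolean("spark.speculation", defaultValue = false)
    val committer = sc.getConf.get("spark.sql.parquet.output.committer.class", "<default>")
    println(s"speculation=$speculationOn, parquet committer=$committer")
    sc.stop()
  }
}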