Data loss can occur when the DirectOutputCommitter is used with speculation
turned on, so we automatically disable the direct committer in that case.
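
If you want to rule speculation in or out for your job, a rough check along
these lines may help. It is only a sketch: it assumes Spark 1.5.x is on the
classpath and that the job's settings are visible as spark.* system
properties (as they are under spark-submit). It only reads the standard
spark.speculation key and does not inspect which committer was actually
used for a given write.

```scala
import org.apache.spark.SparkConf

// Sketch only; assumes Spark 1.5.x on the classpath.
// new SparkConf() picks up spark.* Java system properties,
// such as those passed through spark-submit.
val conf = new SparkConf()

// spark.speculation is treated as false when it is not set.
val speculationOn = conf.getBoolean("spark.speculation", false)

if (speculationOn) {
  println("Speculation is on: the direct committer will be disabled.")
} else {
  println("Speculation is off.")
}
```

If this reports that speculation is off, the _temporary files you are seeing
would have to come from something other than the speculation check.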

On Tue, Jan 12, 2016 at 1:11 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi spark users and developers,
>
> I wonder if the following observed behaviour is expected. I'm writing a
> DataFrame to Parquet on S3 in append mode. Since I'm using
> org.apache.spark.sql.parquet.DirectParquetOutputCommitter as the
> spark.sql.parquet.output.committer.class, I expected that no _temporary
> files would be generated.
>
> I appended the same dataframe twice to the same directory. The first
> "append" works as expected: no _temporary files are generated, because of
> the DirectParquetOutputCommitter. The second "append", however, does
> generate _temporary files and then moves them from _temporary to the
> output directory.
>
> Is this behavior expected? Or is it a bug?
>
> I'm using Spark 1.5.2.
>
> Best Regards,
>
> Jerry
>
