GitHub user lw-lin opened a pull request:

    https://github.com/apache/spark/pull/13507

    [SPARK-15765][SQL][Streaming] Make continuous Parquet writing consistent 
with non-continuous Parquet writing

    ## What changes were proposed in this pull request?
    
    Currently, some code is duplicated between continuous Parquet writing (as 
in Structured Streaming) and non-continuous batch writing; see 
[ParquetFileFormat#prepareWrite()](https://github.com/apache/spark/blob/431542765785304edb76a19885fbc5f9b8ae7d64/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L68)
 and 
[ParquetFileFormat#ParquetOutputWriterFactory](https://github.com/apache/spark/blob/431542765785304edb76a19885fbc5f9b8ae7d64/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L414).
    
    This may lead to inconsistent behavior when we change only one piece of 
code but not the other.
    
    By extracting the common code, this patch removes that inconsistency. As a 
result, Structured Streaming now also benefits from 
[SPARK-15719](https://github.com/apache/spark/pull/13455).
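
    As a rough illustration of the idea (not the code in this patch), the sketch below moves the shared setup into one helper that both write paths call. Every name here is hypothetical: `ParquetWriteSettings`, `BatchWrite`, `StreamingWrite`, the configuration keys, and the `snappy` default are all assumptions, and a plain mutable map stands in for the Hadoop configuration.

```scala
import scala.collection.mutable

// Hypothetical helper: the single place where write-time settings are applied.
// The keys and the default codec are illustrative, not real Spark/Parquet names.
object ParquetWriteSettings {
  val SchemaKey = "example.parquet.schema"
  val CompressionKey = "example.parquet.compression"

  def configure(
      conf: mutable.Map[String, String],
      schema: String,
      options: Map[String, String]): Unit = {
    conf(SchemaKey) = schema
    conf(CompressionKey) = options.getOrElse("compression", "snappy")
  }
}

// Hypothetical batch (non-continuous) path: delegates to the shared helper.
object BatchWrite {
  def prepareWrite(schema: String, options: Map[String, String]): Map[String, String] = {
    val conf = mutable.Map.empty[String, String]
    ParquetWriteSettings.configure(conf, schema, options)
    conf.toMap
  }
}

// Hypothetical continuous (streaming) path: delegates to the same helper,
// so a change to the shared settings reaches both paths at once.
object StreamingWrite {
  def newWriterConf(schema: String, options: Map[String, String]): Map[String, String] = {
    val conf = mutable.Map.empty[String, String]
    ParquetWriteSettings.configure(conf, schema, options)
    conf.toMap
  }
}
```

    With this shape, a change made in `ParquetWriteSettings.configure` applies to both paths, which is the property the refactoring is after.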
    
    ## How was this patch tested?
    
    This is a pure code refactoring without any logic change, so it should be 
covered by existing suites.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark parquet-conf-deduplicate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13507
    
----
commit 60a2c8ee7c610a783e65b78ac21e25661b84f49d
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-06-03T14:31:56Z

    Make continuous writing consistent with non-continuous writing

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
