GitHub user steveloughran opened a pull request:

    https://github.com/apache/spark/pull/20824

    With SPARK-20236, FileCommitProtocol.instantiate() looks for a three …

    ## What changes were proposed in this pull request?
    
    With SPARK-20236, `FileCommitProtocol.instantiate()` looks for a three 
argument constructor, passing in the `dynamicPartitionOverwrite` parameter. If 
there is no such constructor, it falls back to the classic two-arg one.
    
    When `InsertIntoHadoopFsRelationCommand` passes down that 
`dynamicPartitionOverwrite` flag `to FileCommitProtocol.instantiate(`), it 
assumes that the instantiated protocol supports the specific requirements of 
dynamic partition overwrite. It does not notice when this does not hold, and so 
the output generated may be incorrect.
    
    This patch changes  `FileCommitProtocol.instantiate()` so  when 
`dynamicPartitionOverwrite == true`, it requires the protocol implementation to 
have a 3-arg constructor. Classic two arg constructors are supported when it is 
false.
    
    Also it adds some debug level logging for anyone trying to understand 
what's going on.
    
    ## How was this patch tested?
    
    Unit tests verify that 
    
    * classes with only 2-arg constructor cannot be used with dynamic overwrite
    * classes with only 2-arg constructor can be used without dynamic overwrite
    * classes with 3 arg constructors can be used with both.
    * the fallback to any two arg ctor takes place after the attempt to load 
the 3-arg ctor,
    * passing in invalid class types fail as expected (regression tests on 
expected behavior)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark 
stevel/SPARK-23683-protocol-instantiate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20824.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20824
    
----
commit 529db0851e50f5bf15cc71d400a030db4e696350
Author: Steve Loughran <stevel@...>
Date:   2018-03-14T17:43:34Z

    With SPARK-20236, FileCommitProtocol.instantiate() looks for a three 
argument constructor, passing in the dynamicPartitionOverwrite parameter. If 
there is no such constructor, it falls back to the classic two-arg one.
    
    When InsertIntoHadoopFsRelationCommand passes down that 
dynamicPartitionOverwrite flag to FileCommitProtocol.instantiate(), it assumes 
that the instantiated protocol supports the specific requirements of dynamic 
partition overwrite. It does not notice when this does not hold, and so the 
output generated may be incorrect.
    
    This patch changes  FileCommitProtocol.instantiate() so  when 
dynamicPartitionOverwrite == true, it requires the protocol implementation to 
have a 3-arg constructor.
    
    Tests verify that
    
    * classes with only 2-arg constructor cannot be used with dynamic overwrite
    * classes with only 2-arg constructor can be used without dynamic overwrite
    * classes with 3 arg constructors can be used with both
    * the fallback to any two arg ctor takes place after the attempt to load 
the 3-arg ctor,
    * passing in invalid class types fail as expected (regression tests on 
expected behavior)
    
    Change-Id: I694868aecf865cfa552e031ea3f6dde8b600fa7b

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to