GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/20824
With SPARK-20236, FileCommitProtocol.instantiate() looks for a three ⦠## What changes were proposed in this pull request? With SPARK-20236, `FileCommitProtocol.instantiate()` looks for a three argument constructor, passing in the `dynamicPartitionOverwrite` parameter. If there is no such constructor, it falls back to the classic two-arg one. When `InsertIntoHadoopFsRelationCommand` passes down that `dynamicPartitionOverwrite` flag `to FileCommitProtocol.instantiate(`), it assumes that the instantiated protocol supports the specific requirements of dynamic partition overwrite. It does not notice when this does not hold, and so the output generated may be incorrect. This patch changes `FileCommitProtocol.instantiate()` so when `dynamicPartitionOverwrite == true`, it requires the protocol implementation to have a 3-arg constructor. Classic two arg constructors are supported when it is false. Also it adds some debug level logging for anyone trying to understand what's going on. ## How was this patch tested? Unit tests verify that * classes with only 2-arg constructor cannot be used with dynamic overwrite * classes with only 2-arg constructor can be used without dynamic overwrite * classes with 3 arg constructors can be used with both. * the fallback to any two arg ctor takes place after the attempt to load the 3-arg ctor, * passing in invalid class types fail as expected (regression tests on expected behavior) You can merge this pull request into a Git repository by running: $ git pull https://github.com/steveloughran/spark stevel/SPARK-23683-protocol-instantiate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20824.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20824 ---- commit 529db0851e50f5bf15cc71d400a030db4e696350 Author: Steve Loughran <stevel@...> Date: 2018-03-14T17:43:34Z With SPARK-20236, FileCommitProtocol.instantiate() looks for a three argument constructor, passing in the dynamicPartitionOverwrite parameter. If there is no such constructor, it falls back to the classic two-arg one. When InsertIntoHadoopFsRelationCommand passes down that dynamicPartitionOverwrite flag to FileCommitProtocol.instantiate(), it assumes that the instantiated protocol supports the specific requirements of dynamic partition overwrite. It does not notice when this does not hold, and so the output generated may be incorrect. This patch changes FileCommitProtocol.instantiate() so when dynamicPartitionOverwrite == true, it requires the protocol implementation to have a 3-arg constructor. Tests verify that * classes with only 2-arg constructor cannot be used with dynamic overwrite * classes with only 2-arg constructor can be used without dynamic overwrite * classes with 3 arg constructors can be used with both * the fallback to any two arg ctor takes place after the attempt to load the 3-arg ctor, * passing in invalid class types fail as expected (regression tests on expected behavior) Change-Id: I694868aecf865cfa552e031ea3f6dde8b600fa7b ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org