[spark] branch master updated (db74fd0d -> 11c6a23)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from db74fd0d  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
     add 11c6a23   [SPARK-32352][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Exclude partition columns from data columns

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/datasources/FileSourceStrategy.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
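The commit above excludes partition columns from the data columns a file scan requests: partition values come from directory paths, not from the files themselves. As a minimal, hypothetical sketch of that idea (the names below are illustrative, not Spark's actual `FileSourceStrategy` API):

```python
def split_columns(all_columns, partition_columns):
    """Split a relation's output into data columns and partition columns.

    Hypothetical helper: partition columns are materialized from directory
    paths, so they must not be requested as data columns read from files.
    """
    partition_set = set(partition_columns)
    # Preserve the original column order for the data columns.
    data_columns = [c for c in all_columns if c not in partition_set]
    return data_columns, list(partition_columns)

data_cols, part_cols = split_columns(["id", "name", "year", "month"],
                                     ["year", "month"])
# data_cols == ["id", "name"]; part_cols == ["year", "month"]
```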
[spark] branch branch-3.0 updated: [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
8aa644e is described below

commit 8aa644e9a991cd7f965aec082adcc3a3d19d452f
Author: Louiszr
AuthorDate: Sun Aug 23 21:10:52 2020 -0700

    [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

    ### What changes were proposed in this pull request?

    - Removed `foldCol` related code introduced in #29445, which is causing issues in the base branch.
    - Fixed `CrossValidatorModel.copy()` so that it correctly calls `.copy()` on the models instead of on lists of models.

    ### Why are the changes needed?

    - `foldCol` is from 3.1, hence causing tests to fail.
    - `CrossValidatorModel.copy()` is supposed to shallow-copy models, not lists of models.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    - Existing tests created in #29445 ran and passed.
    - Updated `test_copy` to make sure `copy()` is called on models instead of lists of models.

    Closes #29524 from Louiszr/remove-foldcol-3.0.

Authored-by: Louiszr
Signed-off-by: Huaxin Gao
---
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/python/pyspark/ml/tests/test_tuning.py b/python/pyspark/ml/tests/test_tuning.py
index b250740..b1acaf6 100644
--- a/python/pyspark/ml/tests/test_tuning.py
+++ b/python/pyspark/ml/tests/test_tuning.py
@@ -101,7 +101,6 @@ class CrossValidatorTests(SparkSessionTestCase):
             lambda x: x.getEstimator().uid,
             # SPARK-32092: CrossValidator.copy() needs to copy all existing params
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getCollectSubModels(),
             lambda x: x.getParallelism(),
             lambda x: x.getSeed()
@@ -116,7 +115,6 @@ class CrossValidatorTests(SparkSessionTestCase):
         # SPARK-32092: CrossValidatorModel.copy() needs to copy all existing params
         for param in [
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getSeed()
         ]:
             self.assertEqual(param(cvModel), param(cvModelCopied))
@@ -127,9 +125,9 @@ class CrossValidatorTests(SparkSessionTestCase):
             'foo',
             "Changing the original avgMetrics should not affect the copied model"
         )
-        cvModel.subModels[0] = 'foo'
+        cvModel.subModels[0][0].getInducedError = lambda: 'foo'
         self.assertNotEqual(
-            cvModelCopied.subModels[0],
+            cvModelCopied.subModels[0][0].getInducedError(),
             'foo',
             "Changing the original subModels should not affect the copied model"
         )
@@ -224,7 +222,6 @@ class CrossValidatorTests(SparkSessionTestCase):
         loadedCvModel = CrossValidatorModel.load(cvModelPath)
         for param in [
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getSeed(),
             lambda x: len(x.subModels)
         ]:
@@ -780,9 +777,9 @@ class TrainValidationSplitTests(SparkSessionTestCase):
             'foo',
             "Changing the original validationMetrics should not affect the copied model"
         )
-        tvsModel.subModels[0] = 'foo'
+        tvsModel.subModels[0].getInducedError = lambda: 'foo'
         self.assertNotEqual(
-            tvsModelCopied.subModels[0],
+            tvsModelCopied.subModels[0].getInducedError(),
             'foo',
             "Changing the original subModels should not affect the copied model"
         )
diff --git a/python/pyspark/ml/tuning.py b/python/pyspark/ml/tuning.py
index 91f34ef..6283c8b 100644
--- a/python/pyspark/ml/tuning.py
+++ b/python/pyspark/ml/tuning.py
@@ -480,7 +480,10 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
             extra = dict()
         bestModel = self.bestModel.copy(extra)
         avgMetrics = list(self.avgMetrics)
-        subModels = [model.copy() for model in self.subModels]
+        subModels = [
+            [sub_model.copy() for sub_model in fold_sub_models]
+            for fold_sub_models in self.subModels
+        ]
         return self._copyValues(CrossValidatorModel(bestModel, avgMetrics, subModels), extra=extra)

     @since("2.3.0")
@@ -511,7 +514,6 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
             "estimator": estimator,
             "estimatorParamMaps": epms,
             "numFolds": java_stage.getNumFolds(),
-            "foldCol": java_stage.getFoldCol(),
             "seed":
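The core of the `copy()` fix above is that `CrossValidatorModel.subModels` is a list of folds, each fold being a list of models, so copying must descend one level deeper than `[model.copy() for model in self.subModels]`. A self-contained sketch with a toy stand-in `Model` class (hypothetical, not PySpark's actual class) shows why the nested comprehension matters:

```python
class Model:
    """Minimal stand-in for a fitted ML model, for illustration only."""
    def __init__(self, metric):
        self.metric = metric

    def copy(self):
        return Model(self.metric)


def copy_sub_models(sub_models):
    # Copy each model individually: sub_models is a list of folds,
    # each fold holding one model per tried parameter map.
    return [[m.copy() for m in fold] for fold in sub_models]


# Two folds, two parameter maps each.
original = [[Model(0.8), Model(0.7)], [Model(0.9), Model(0.6)]]
copied = copy_sub_models(original)

original[0][0].metric = "foo"     # mutate a model in the original
assert copied[0][0].metric == 0.8  # the copied model is unaffected
```

Had only the outer lists been copied, mutating `original[0][0]` would have shown through in the copy, which is exactly what the updated `test_copy` guards against.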
[spark] branch branch-3.0 updated (da60de5 -> 8aa644e)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
     add 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ac520d4 -> 772c706)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ac520d4  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
     add 772c706  [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/streaming/FileStreamSinkLog.scala |  2 --
 .../streaming/FileStreamSinkLogSuite.scala          | 21 -
 2 files changed, 23 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (898211b -> da60de5)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 898211b  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
     add da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b9585cd -> db74fd0d)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b9585cd  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
     add db74fd0d  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 898211b  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
898211b is described below

commit 898211b54e2c9e212f19d8bad6b7e91b66e5659a
Author: mingjial
AuthorDate: Sun Aug 23 17:40:59 2020 -0700

    [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2

    ### What changes were proposed in this pull request?

    Copy to the master branch the unit test added for branch-2.4 (https://github.com/apache/spark/pull/29430).

    ### Why are the changes needed?

    The unit test passes on the master branch, indicating that the issue reported in https://issues.apache.org/jira/browse/SPARK-32609 is already fixed there. The test is added to catch possible future regressions.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    sbt test run

    Closes #29435 from mingjialiu/master.

Authored-by: mingjial
Signed-off-by: Dongjoon Hyun
(cherry picked from commit b9585cde31fe99aecca42146c71c552218cba591)
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/connector/DataSourceV2Suite.scala | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
index 2d8761f..a9c521e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
@@ -394,6 +394,25 @@ class DataSourceV2Suite extends QueryTest with SharedSparkSession with AdaptiveS
       checkAnswer(df, (0 until 3).map(i => Row(i)))
     }
   }
+
+  test("SPARK-32609: DataSourceV2 with different pushedfilters should be different") {
+    def getScanExec(query: DataFrame): BatchScanExec = {
+      query.queryExecution.executedPlan.collect {
+        case d: BatchScanExec => d
+      }.head
+    }
+
+    Seq(classOf[AdvancedDataSourceV2], classOf[JavaAdvancedDataSourceV2]).foreach { cls =>
+      withClue(cls.getName) {
+        val df = spark.read.format(cls.getName).load()
+        val q1 = df.select('i).filter('i > 6)
+        val q2 = df.select('i).filter('i > 5)
+        val scan1 = getScanExec(q1)
+        val scan2 = getScanExec(q2)
+        assert(!scan1.equals(scan2))
+      }
+    }
+  }
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
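The test above guards against incorrect exchange reuse: two scans of the same source that differ only in their pushed filters must not compare equal, or the planner could substitute one scan's cached exchange for the other and return wrong results. A language-neutral sketch of the invariant, using a toy `Scan` class (hypothetical, not Spark's `BatchScanExec`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """Toy stand-in for a V2 scan node. Equality must cover the pushed
    filters; if it only covered the table, a reuse rule could wrongly
    treat scans of different filtered subsets as interchangeable."""
    table: str
    pushed_filters: tuple = ()


scan1 = Scan("t", pushed_filters=("i > 6",))
scan2 = Scan("t", pushed_filters=("i > 5",))

assert scan1 != scan2                   # different filters: not reusable
assert scan1 == Scan("t", ("i > 6",))   # identical scans: safe to reuse
```

This mirrors the assertion `assert(!scan1.equals(scan2))` in the Scala test: generated equality includes every field, which is the behavior the suite verifies for the real scan node.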
[spark] branch branch-3.0 updated: [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
[spark] branch branch-3.0 updated: [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
da60de5 is described below

commit da60de563a92bb85902681fb0569b43bbc489559
Author: Huaxin Gao
AuthorDate: Mon Aug 24 09:43:41 2020 +0900

    [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

    ### What changes were proposed in this pull request?
    There are two types of TVF. We only documented one type. Adding the doc for the 2nd type.

    ### Why are the changes needed?
    Complete the Table-valued Function doc.

    ### Does this PR introduce _any_ user-facing change?
    https://user-images.githubusercontent.com/13592258/89595926-c5eae680-d80a-11ea-918b-0c3646f9930e.png
    https://user-images.githubusercontent.com/13592258/89595929-c84d4080-d80a-11ea-9803-30eb502ccd05.png
    https://user-images.githubusercontent.com/13592258/89595931-ca170400-d80a-11ea-8812-2f009746edac.png
    https://user-images.githubusercontent.com/13592258/89595934-cb483100-d80a-11ea-9e18-9357aa9f2c5c.png

    ### How was this patch tested?
    Manually build and check

    Closes #29355 from huaxingao/tvf.

    Authored-by: Huaxin Gao
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit db74fd0d3320f120540133094a9975963941b98c)
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)

diff --git a/docs/sql-ref-syntax-qry-select-tvf.md b/docs/sql-ref-syntax-qry-select-tvf.md
index cc8d7c34..b04e2f5 100644
--- a/docs/sql-ref-syntax-qry-select-tvf.md
+++ b/docs/sql-ref-syntax-qry-select-tvf.md
@@ -21,28 +21,14 @@ license: |
 ### Description
 
-A table-valued function (TVF) is a function that returns a relation or a set of rows.
-
-### Syntax
-
-```sql
-function_name ( expression [ , ... ] ) [ table_alias ]
-```
-
-### Parameters
-
-* **expression**
-
-    Specifies a combination of one or more values, operators and SQL functions that results in a value.
-
-* **table_alias**
-
-    Specifies a temporary name with an optional column name list.
-
-    **Syntax:** `[ AS ] table_name [ ( column_name [ , ... ] ) ]`
+A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL:
+1. a TVF that can be specified in a FROM clause, e.g. range;
+2. a TVF that can be specified in SELECT/LATERAL VIEW clauses, e.g. explode.
 
 ### Supported Table-valued Functions
 
+#### TVFs that can be specified in a FROM clause:
+
 |Function|Argument Type(s)|Description|
 |--------|----------------|-----------|
 |**range** ( *end* )|Long|Creates a table with a single *LongType* column named *id*, containing rows in a range from 0 to *end* (exclusive) with step value 1.|
@@ -50,6 +36,20 @@ function_name ( expression [ , ... ] ) [ table_alias ]
 |**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, containing rows in a range from *start* to *end* (exclusive) with *step* value.|
 |**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
 
+#### TVFs that can be specified in SELECT/LATERAL VIEW clauses:
+
+|Function|Argument Type(s)|Description|
+|--------|----------------|-----------|
+|**explode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
+|**explode_outer** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
+|**inline** ( *expr* )|Expression|Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.|
+|**inline_outer** ( *expr* )|Expression|Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.|
+|**posexplode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows with positions, or the elements of map *expr* into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value
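The row-expansion behavior that the table above documents for explode and posexplode can be sketched outside of Spark. This is a minimal plain-Python illustration of the semantics only, not Spark's implementation; the helper names `explode`/`posexplode` here are stand-ins operating on lists of dicts rather than DataFrames:

```python
# Plain-Python sketch of the semantics of the explode/posexplode TVFs
# on an array column: one output row per array element, with the
# documented default output column names ("col", and "pos" for positions).

def explode(rows, array_col):
    """One output row per element of rows[i][array_col]; rows whose
    array is empty produce no output (explode_outer would keep them)."""
    out = []
    for row in rows:
        for elem in row[array_col]:
            new_row = dict(row)
            del new_row[array_col]
            new_row["col"] = elem  # default column name: col
            out.append(new_row)
    return out

def posexplode(rows, array_col):
    """Like explode, but also emits each element's position."""
    out = []
    for row in rows:
        for pos, elem in enumerate(row[array_col]):
            new_row = dict(row)
            del new_row[array_col]
            new_row["pos"] = pos   # default column names: pos, col
            new_row["col"] = elem
            out.append(new_row)
    return out

rows = [{"id": 1, "xs": [10, 20]}, {"id": 2, "xs": []}]
print(explode(rows, "xs"))
# [{'id': 1, 'col': 10}, {'id': 1, 'col': 20}]  -- id 2 dropped
```

In Spark SQL itself the equivalent would be `SELECT id, explode(xs) FROM t`; the sketch only mirrors the documented row multiplication and default column naming.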
[spark] branch master updated (772c706 -> 8749f2e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 772c706  [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog
 add 8749f2e  [SPARK-32675][MESOS] --py-files option is appended without passing value for it

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala | 4 +++-
 .../spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala   | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b9585cd -> db74fd0d)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from b9585cd  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
 add db74fd0d  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 898211b  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
898211b is described below

commit 898211b54e2c9e212f19d8bad6b7e91b66e5659a
Author: mingjial
AuthorDate: Sun Aug 23 17:40:59 2020 -0700

    [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2

    ### What changes were proposed in this pull request?
    Copy to master branch the unit test added for branch-2.4 (https://github.com/apache/spark/pull/29430).

    ### Why are the changes needed?
    The unit test passes at the master branch, indicating that the issue reported in https://issues.apache.org/jira/browse/SPARK-32609 is already fixed there. The test is added to catch possible future regressions.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    sbt test run

    Closes #29435 from mingjialiu/master.

    Authored-by: mingjial
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit b9585cde31fe99aecca42146c71c552218cba591)
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/connector/DataSourceV2Suite.scala | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
index 2d8761f..a9c521e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
@@ -394,6 +394,25 @@ class DataSourceV2Suite extends QueryTest with SharedSparkSession with AdaptiveS
       checkAnswer(df, (0 until 3).map(i => Row(i)))
     }
   }
+
+  test("SPARK-32609: DataSourceV2 with different pushedfilters should be different") {
+    def getScanExec(query: DataFrame): BatchScanExec = {
+      query.queryExecution.executedPlan.collect {
+        case d: BatchScanExec => d
+      }.head
+    }
+
+    Seq(classOf[AdvancedDataSourceV2], classOf[JavaAdvancedDataSourceV2]).foreach { cls =>
+      withClue(cls.getName) {
+        val df = spark.read.format(cls.getName).load()
+        val q1 = df.select('i).filter('i > 6)
+        val q2 = df.select('i).filter('i > 5)
+        val scan1 = getScanExec(q1)
+        val scan2 = getScanExec(q2)
+        assert(!scan1.equals(scan2))
+      }
+    }
+  }
 }
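The property this test pins down is that two scans differing only in their pushed filters must not compare equal; otherwise a reuse mechanism keyed on plan equality could substitute the scan for `i > 6` where `i > 5` was needed. A minimal sketch of that invariant outside Spark (the `BatchScan` class and its fields are hypothetical stand-ins, not the real `BatchScanExec`):

```python
from dataclasses import dataclass

# Hypothetical stand-in for a DataSourceV2 scan node. The point: the
# pushed filters participate in equality/hashing, so two scans over the
# same source with different filters are distinct plan nodes.
@dataclass(frozen=True)
class BatchScan:
    source: str
    pushed_filters: tuple  # e.g. ("i > 5",)

scan1 = BatchScan("AdvancedDataSourceV2", ("i > 6",))
scan2 = BatchScan("AdvancedDataSourceV2", ("i > 5",))

# Mirrors assert(!scan1.equals(scan2)) in the Scala test above.
assert scan1 != scan2

# A reuse cache keyed on the node therefore cannot confuse the two.
reuse_cache = {scan1: "reused-exchange-for-q1"}
assert scan2 not in reuse_cache
```

If `pushed_filters` were excluded from equality (the bug class SPARK-32609 guards against), `scan2 in reuse_cache` would be true and the wrong scan result could be reused.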
[spark] branch master updated (8749f2e -> b9585cd)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8749f2e  [SPARK-32675][MESOS] --py-files option is appended without passing value for it
 add b9585cd  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/connector/DataSourceV2Suite.scala | 19 +++
 1 file changed, 19 insertions(+)
[spark] branch master updated (ac520d4 -> 772c706)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ac520d4  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
 add 772c706  [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/streaming/FileStreamSinkLog.scala | 2 --
 .../streaming/FileStreamSinkLogSuite.scala          | 21 -
 2 files changed, 23 deletions(-)
[spark] branch master updated (1c798f9 -> ac520d4)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1c798f9 [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable` add ac520d4 [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans No new revisions were added by this update. Summary of changes: .../spark/ml/clustering/BisectingKMeans.scala | 33 ++- .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++- .../spark/mllib/clustering/BisectingKMeans.scala | 47 ++ .../org/apache/spark/mllib/clustering/KMeans.scala | 29 +++-- 4 files changed, 59 insertions(+), 83 deletions(-)
[spark] branch master updated: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ac520d4  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
ac520d4 is described below

commit ac520d4a7c40a1d67358ee64af26e7f73face448
Author: zhengruifeng
AuthorDate: Sun Aug 23 17:14:40 2020 -0500

    [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

    ### What changes were proposed in this pull request?
    Fix double caching in KMeans/BiKMeans:
    1. let the callers of `runWithWeight` pass whether `handlePersistence` is needed;
    2. persist and unpersist inside `runWithWeight`;
    3. persist the `norms` if needed, according to the comments.

    ### Why are the changes needed?
    Avoid double caching.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existing test suites.

    Closes #29501 from zhengruifeng/kmeans_handlePersistence.
Authored-by: zhengruifeng
Signed-off-by: Sean Owen
---
 .../spark/ml/clustering/BisectingKMeans.scala      | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala    | 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 29 +++--
 4 files changed, 59 insertions(+), 83 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
index 5a60bed..061091c 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
@@ -29,9 +29,8 @@ import org.apache.spark.ml.util._
 import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{BisectingKMeans => MLlibBisectingKMeans, BisectingKMeansModel => MLlibBisectingKMeansModel}
-import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DoubleType, IntegerType, StructType}
@@ -276,21 +275,6 @@ class BisectingKMeans @Since("2.0.0") (
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = instrumented { instr =>
     transformSchema(dataset.schema, logging = true)
-    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
-      checkNonNegativeWeight(col($(weightCol)).cast(DoubleType))
-    } else {
-      lit(1.0)
-    }
-
-    val instances: RDD[(OldVector, Double)] = dataset
-      .select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w).rdd.map {
-        case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight)
-    }
-    if (handlePersistence) {
-      instances.persist(StorageLevel.MEMORY_AND_DISK)
-    }
-
     instr.logPipelineStage(this)
     instr.logDataset(dataset)
     instr.logParams(this, featuresCol, predictionCol, k, maxIter, seed,
@@ -302,11 +286,18 @@ class BisectingKMeans @Since("2.0.0") (
       .setMinDivisibleClusterSize($(minDivisibleClusterSize))
       .setSeed($(seed))
       .setDistanceMeasure($(distanceMeasure))
-    val parentModel = bkm.runWithWeight(instances, Some(instr))
-    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))
-    if (handlePersistence) {
-      instances.unpersist()
+
+    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
+      checkNonNegativeWeight(col($(weightCol)).cast(DoubleType))
+    } else {
+      lit(1.0)
     }
+    val instances = dataset.select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w)
+      .rdd.map { case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight) }
+
+    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
+    val parentModel = bkm.runWithWeight(instances, handlePersistence, Some(instr))
+    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))

     val summary = new BisectingKMeansSummary(
       model.transform(dataset),
diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
index 5c06973..f6f6eb7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
@@ -32,7 +32,6 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import
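The handlePersistence handoff in the diff above can be sketched without Spark. Everything below (`FakeRdd`, the simplified `runWithWeight`) is a hypothetical stand-in, not Spark API: the caller reports whether its input is already cached, and the method persists and unpersists internally only when asked to, so the data is cached at most once.

```scala
// Hypothetical sketch of the fix: the persist/unpersist lifecycle moves
// inside runWithWeight, guarded by a flag the caller computes from the
// input's storage level, so the caller never caches a second copy.
object HandlePersistenceSketch {
  final case class FakeRdd(var cached: Boolean = false) {
    def persist(): Unit = cached = true
    def unpersist(): Unit = cached = false
  }

  // Stand-in for MLlib's runWithWeight: returns whether the data was
  // cached during "training", and releases only the cache it created.
  def runWithWeight(data: FakeRdd, handlePersistence: Boolean): Boolean = {
    if (handlePersistence) data.persist()
    try data.cached // "train" while (possibly) cached
    finally if (handlePersistence) data.unpersist()
  }
}
```

A caller would pass something like `handlePersistence = dataset.storageLevel == StorageLevel.NONE`, mirroring the diff: an uncached input gets a temporary cache that is released on exit, while an already-cached input stays under the caller's control.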
[spark] branch branch-3.0 updated: [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable`
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f088c28  [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable`
f088c28 is described below

commit f088c28a53571afe5146100fd2e76c2b5ec92862
Author: Max Gekk
AuthorDate: Sun Aug 23 12:43:30 2020 -0700

    [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable`

    ### What changes were proposed in this pull request?
    Override `def get: Date` in `DaysWritable` to use `daysToMillis(int d)` from the parent class `DateWritable` instead of `long daysToMillis(int d, boolean doesTimeMatter)`.

    ### Why are the changes needed?
    It fixes failures of `HiveSerDeReadWriteSuite` with the profile `hive-1.2`. In that case, the parent class `DateWritable` has a different implementation, predating the Hive commit https://github.com/apache/hive/commit/da3ed68eda10533f3c50aae19731ac6d059cda87. In particular, `get()` calls `new Date(daysToMillis(daysSinceEpoch))` instead of the overridden `def get(doesTimeMatter: Boolean): Date` in the child class. The `get()` method returns the wrong result `1970-01-01` because it uses the not-yet-updated [...]

    ### Does this PR introduce _any_ user-facing change?
    Yes.

    ### How was this patch tested?
    By running the test suite `HiveSerDeReadWriteSuite`:
    ```
    $ build/sbt -Phive-1.2 -Phadoop-2.7 "test:testOnly org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite"
    ```
    and
    ```
    $ build/sbt -Phive-2.3 -Phadoop-2.7 "test:testOnly org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite"
    ```

    Closes #29523 from MaxGekk/insert-date-into-hive-table-1.2.
Authored-by: Max Gekk
Signed-off-by: Liang-Chi Hsieh
(cherry picked from commit 1c798f973fa8307cc1f15eec067886e8e9aecb59)
Signed-off-by: Liang-Chi Hsieh
---
 .../org/apache/spark/sql/execution/datasources/DaysWritable.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala
index 56c176e..a04c2fc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala
@@ -54,6 +54,9 @@ class DaysWritable(
   }

   override def getDays: Int = julianDays
+  override def get: Date = {
+    new Date(DateWritable.daysToMillis(julianDays))
+  }
   override def get(doesTimeMatter: Boolean): Date = {
     new Date(DateWritable.daysToMillis(julianDays, doesTimeMatter))
   }
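The failure mode described above (the parent's `get()` reading stale parent state instead of the subclass's Julian-day field) can be reproduced in miniature. The class names below are simplified stand-ins for `DateWritable`/`DaysWritable`, and the day-to-millis conversion deliberately omits the time-zone and calendar rebasing the real classes perform:

```scala
// Hypothetical miniature of the DaysWritable fix: unless the subclass
// overrides the zero-arg `get` itself, callers resolve it against the
// parent's implementation, which reads the parent's (stale) field.
class ParentDateWritable(protected var daysSinceEpoch: Int) {
  // Simplified: the real DateWritable.daysToMillis also handles time zones.
  def daysToMillis(d: Int): Long = d.toLong * 24L * 60L * 60L * 1000L
  def get: Long = daysToMillis(daysSinceEpoch)
  def get(doesTimeMatter: Boolean): Long = daysToMillis(daysSinceEpoch)
}

// Mirrors the patch: override BOTH `get` overloads to use julianDays.
class JulianDaysWritable(val julianDays: Int) extends ParentDateWritable(0) {
  override def get: Long = daysToMillis(julianDays)
  override def get(doesTimeMatter: Boolean): Long = daysToMillis(julianDays)
}
```

Without the zero-arg override, `new JulianDaysWritable(n).get` would resolve to the parent's body and return 0 (the epoch), which is exactly the `1970-01-01` symptom the commit message describes.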
[spark] branch master updated (9808c15 -> 1c798f9)
This is an automated email from the ASF dual-hosted git repository. viirya pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9808c15 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value add 1c798f9 [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable` No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/datasources/DaysWritable.scala | 3 +++ 1 file changed, 3 insertions(+)
[spark] branch branch-3.0 updated: [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f5d5422  [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
f5d5422 is described below

commit f5d5422a4f87f69514d95f80f5f3db8246d61256
Author: angerszhu
AuthorDate: Sun Aug 23 08:20:05 2020 -0700

    [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value

    ### What changes were proposed in this pull request?
    As mentioned in https://github.com/apache/spark/pull/29428#issuecomment-678735163 by viirya, fix a bug in the UT: in script-transformation no-serde mode, the decimal output is the same in both hive-1.2 and hive-2.3.

    ### Why are the changes needed?
    Fix the UT.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existing UT.

    Closes #29521 from AngersZh/SPARK-32608-3.0-FOLLOW-UP.
Authored-by: angerszhu
Signed-off-by: Liang-Chi Hsieh
---
 .../apache/spark/sql/hive/execution/ScriptTransformationSuite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
index 15a932f..0d1fe20 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
@@ -299,7 +299,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'a.cast("string"),
         'b.cast("string"),
         'c.cast("string"),
-        decimalToString('d),
+        'd.cast("string"),
         'e.cast("string")).collect())

     // input/output with different delimit and show result
@@ -322,7 +322,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'a.cast("string"),
         'b.cast("string"),
         'c.cast("string"),
-        decimalToString('d),
+        'd.cast("string"),
         'e.cast("string"))).collect())
     }
   }
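The test change above compares a decimal column's plain string cast against the script-transformation output, relying on that rendering being stable across Hive profiles. A minimal sketch of the underlying string behavior, assuming plain `java.math.BigDecimal` semantics rather than the Hive writers themselves:

```scala
// java.math.BigDecimal keeps its scale in toString (no exponent for
// small magnitudes), so a decimal-to-string cast yields a stable,
// scale-preserving representation to compare against.
import java.math.BigDecimal

object DecimalToStringSketch {
  def asString(d: BigDecimal): String = d.toString
}
```

For example, `asString(new BigDecimal("1.0000"))` keeps the trailing zeros rather than collapsing to `"1"`, which is the kind of formatting agreement the simplified test depends on.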
[spark] branch branch-3.0 updated: [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f5d5422 [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value f5d5422 is described below commit f5d5422a4f87f69514d95f80f5f3db8246d61256 Author: angerszhu AuthorDate: Sun Aug 23 08:20:05 2020 -0700 [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value ### What changes were proposed in this pull request? As mentioned in https://github.com/apache/spark/pull/29428#issuecomment-678735163 by viirya , fix bug in UT, since in script transformation no-serde mode, output of decimal is same in both hive-1.2/hive-2.3 ### Why are the changes needed? FIX UT ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? EXISTED UT Closes #29521 from AngersZh/SPARK-32608-3.0-FOLLOW-UP. 
Authored-by: angerszhu Signed-off-by: Liang-Chi Hsieh --- .../apache/spark/sql/hive/execution/ScriptTransformationSuite.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala index 15a932f..0d1fe20 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala @@ -299,7 +299,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), - decimalToString('d), + 'd.cast("string"), 'e.cast("string")).collect()) // input/output with different delimit and show result @@ -322,7 +322,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), -decimalToString('d), +'d.cast("string"), 'e.cast("string"))).collect()) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (aa0b0b8 -> 9808c15)
This is an automated email from the ASF dual-hosted git repository. viirya pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from aa0b0b8 Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis" add 9808c15 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value No new revisions were added by this update. Summary of changes: .../apache/spark/sql/execution/BaseScriptTransformationSuite.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9808c15 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value 9808c15 is described below commit 9808c15eecff6f9947e062ae507cfd87837fff0d Author: angerszhu AuthorDate: Sun Aug 23 08:08:55 2020 -0700 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value ### What changes were proposed in this pull request? As mentioned in https://github.com/apache/spark/pull/29428#issuecomment-678735163 by viirya , fix bug in UT, since in script transformation no-serde mode, output of decimal is same in both hive-1.2/hive-2.3 ### Why are the changes needed? FIX UT ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? EXISTED UT Closes #29520 from AngersZh/SPARK-32608-FOLLOW. 
Authored-by: angerszhu Signed-off-by: Liang-Chi Hsieh --- .../apache/spark/sql/execution/BaseScriptTransformationSuite.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala index a82d87c..b36c06b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala @@ -345,7 +345,7 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), - decimalToString('d), + 'd.cast("string"), 'e.cast("string")).collect()) // input/output with different delimit and show result @@ -368,7 +368,7 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), -decimalToString('d), +'d.cast("string"), 'e.cast("string"))).collect()) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] maropu commented on pull request #286: Add descriptions about GitHub Actions in the "Useful Developer Tools" page
maropu commented on pull request #286: URL: https://github.com/apache/spark-website/pull/286#issuecomment-678762858 Thanks, all! Merged to asf-site. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] asfgit closed pull request #286: Add descriptions about GitHub Actions in the "Useful Developer Tools" page
asfgit closed pull request #286: URL: https://github.com/apache/spark-website/pull/286 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Add descriptions about GitHub Actions in the "Useful Developer Tools" page
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 0b3a4e6 Add descriptions about GitHub Actions in the "Useful Developer Tools" page 0b3a4e6 is described below commit 0b3a4e606efbd97d6c53407fb60c62a0518c157f Author: Takeshi Yamamuro AuthorDate: Sun Aug 23 20:31:13 2020 +0900 Add descriptions about GitHub Actions in the "Useful Developer Tools" page This PR adds descriptions about how to run tests in a forked repository using GitHub Actions. This comes from https://github.com/apache/spark/pull/29504. https://user-images.githubusercontent.com/692303/90958036-6c520f80-e4cc-11ea-8bc4-f1602bd45bf4.png Author: Takeshi Yamamuro Closes #286 from maropu/github-actions. --- developer-tools.md | 19 +++ images/running-tests-using-github-actions.png | Bin 0 -> 312696 bytes site/developer-tools.html | 21 + site/images/running-tests-using-github-actions.png | Bin 0 -> 312696 bytes 4 files changed, 40 insertions(+) diff --git a/developer-tools.md b/developer-tools.md index c664dfc..0078538 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -228,6 +228,25 @@ Getting logs from the pods and containers directly is an exercise left to the reader. Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality. If you are having trouble getting tests to pass on Jenkins, but locally things work, don't hesitate to file a Jira issue. +Running tests in your forked repository using GitHub Actions + +GitHub Actions is a functionality within GitHub that enables continuous integration and a wide range of automation. +We have already started using some action scripts and one of them is to run tests for [pull requests](https://spark.apache.org/contributing.html). 
+If you are planning to create a new pull request, it is important to check if tests can pass on your branch before creating a pull request. +This is because our GitHub Actions script automatically runs tests for your pull request/following commits and +this can burden our limited resources of GitHub Actions. + +Our script enables you to run tests for a branch in your forked repository. +Let's say that you have a branch named "your_branch" for a pull request. +To run tests on "your_branch" and check test results: + +- Click the "Actions" tab in your forked repository. +- Select the "Build and test" workflow in the "All workflows" list. +- Push the "Run workflow" button and enter "your_branch" in the "Target branch to run" field. +- When the "Build and test" workflow has finished, click the "Report test results" workflow to check test results. + + + ScalaTest Issues If the following error occurs when running ScalaTest diff --git a/images/running-tests-using-github-actions.png b/images/running-tests-using-github-actions.png new file mode 100644 index 000..819203e Binary files /dev/null and b/images/running-tests-using-github-actions.png differ diff --git a/site/developer-tools.html b/site/developer-tools.html index ff34db0..f064c0a 100644 --- a/site/developer-tools.html +++ b/site/developer-tools.html @@ -406,6 +406,27 @@ minikube stop Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality. If you are having trouble getting tests to pass on Jenkins, but locally things work, don't hesitate to file a Jira issue. +Running tests in your forked repository using GitHub Actions + +GitHub Actions is a functionality within GitHub that enables continuous integration and a wide range of automation. +We have already started using some action scripts and one of them is to run tests for pull requests (https://spark.apache.org/contributing.html). 
+If you are planning to create a new pull request, it is important to check if tests can pass on your branch before creating a pull request. +This is because our GitHub Actions script automatically runs tests for your pull request/following commits and +this can burden our limited resources of GitHub Actions. + +Our script enables you to run tests for a branch in your forked repository. +Let's say that you have a branch named your_branch for a pull request. +To run tests on your_branch and check test results: + + + Click the Actions tab in your forked repository. + Select the Build and test workflow in the All workflows list. + Push the Run workflow button and enter your_branch in the Target branch to run field. + When the Build and test workflow has finished, click the Report test results workflow to check test results. + + + + ScalaTest Issues If the
[spark] branch master updated (f258718 -> aa0b0b8)
This is an automated email from the ASF dual-hosted git repository. viirya pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f258718 [SPARK-32678][SQL] Rename EmptyHashedRelationWithAllNullKeys and simplify NAAJ generated code add aa0b0b8 Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis" No new revisions were added by this update. Summary of changes: .../execution/datasources/orc/OrcFileFormat.scala | 16 ++--- .../execution/datasources/orc/OrcFiltersBase.scala | 10 +--- .../sql/execution/datasources/orc/OrcUtils.scala | 14 - .../v2/orc/OrcPartitionReaderFactory.scala | 22 +-- .../sql/execution/datasources/v2/orc/OrcScan.scala | 2 +- .../datasources/v2/orc/OrcScanBuilder.scala| 5 ++ .../sql/execution/datasources/orc/OrcFilters.scala | 44 ++ .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- .../sql/execution/datasources/orc/OrcFilters.scala | 50 +++- .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- 10 files changed, 60 insertions(+), 243 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
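The reverted SPARK-32646 patch concerned pushing ORC predicates down when the analyzer is case-insensitive: a filter written against column "A" must be matched to the physical file schema, which may contain "a", or ambiguously both "a" and "A". A hypothetical Python sketch of that resolution problem (the function name and logic are illustrative, not Spark's actual implementation):

```python
def resolve_column(name, file_schema, case_sensitive=False):
    """Map a query-side column name onto the physical file schema.

    With case-insensitive analysis the lookup ignores case, but must
    reject ambiguous matches (e.g. a file containing both 'a' and 'A').
    """
    if case_sensitive:
        return name if name in file_schema else None
    matches = [f for f in file_schema if f.lower() == name.lower()]
    if len(matches) == 1:
        return matches[0]
    # Missing or ambiguous: resolution fails.
    return None

print(resolve_column("A", ["a", "b"]))  # a
print(resolve_column("A", ["a", "A"]))  # None (ambiguous)
```

When resolution fails or is ambiguous, the safe behavior for a reader is to skip pushing the filter down and evaluate it after the scan, rather than risk returning wrong results.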