[spark] branch master updated (db74fd0d -> 11c6a23)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from db74fd0d  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
     add 11c6a23   [SPARK-32352][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Exclude partition columns from data columns

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/datasources/FileSourceStrategy.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
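The commit above excludes partition columns from the data columns a file scan requests: partition values come from directory paths, not from the files themselves. As a minimal, hypothetical sketch of that idea (the names below are illustrative, not Spark's actual `FileSourceStrategy` API):

```python
def split_columns(all_columns, partition_columns):
    """Split a relation's output into data columns and partition columns.

    Hypothetical helper: partition columns are materialized from directory
    paths, so they must not be requested as data columns read from files.
    """
    partition_set = set(partition_columns)
    # Preserve the original column order for the data columns.
    data_columns = [c for c in all_columns if c not in partition_set]
    return data_columns, list(partition_columns)

data_cols, part_cols = split_columns(["id", "name", "year", "month"],
                                     ["year", "month"])
# data_cols == ["id", "name"]; part_cols == ["year", "month"]
```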
[spark] branch branch-3.0 updated: [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
8aa644e is described below

commit 8aa644e9a991cd7f965aec082adcc3a3d19d452f
Author: Louiszr
AuthorDate: Sun Aug 23 21:10:52 2020 -0700

    [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

    ### What changes were proposed in this pull request?

    - Removed `foldCol` related code introduced in #29445, which is causing issues in the base branch.
    - Fixed `CrossValidatorModel.copy()` so that it correctly calls `.copy()` on the models instead of on lists of models.

    ### Why are the changes needed?

    - `foldCol` is from 3.1, hence causing tests to fail.
    - `CrossValidatorModel.copy()` is supposed to shallow-copy models, not lists of models.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    - Existing tests created in #29445 ran and passed.
    - Updated `test_copy` to make sure `copy()` is called on models instead of lists of models.

    Closes #29524 from Louiszr/remove-foldcol-3.0.

Authored-by: Louiszr
Signed-off-by: Huaxin Gao
---
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/python/pyspark/ml/tests/test_tuning.py b/python/pyspark/ml/tests/test_tuning.py
index b250740..b1acaf6 100644
--- a/python/pyspark/ml/tests/test_tuning.py
+++ b/python/pyspark/ml/tests/test_tuning.py
@@ -101,7 +101,6 @@ class CrossValidatorTests(SparkSessionTestCase):
             lambda x: x.getEstimator().uid,
             # SPARK-32092: CrossValidator.copy() needs to copy all existing params
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getCollectSubModels(),
             lambda x: x.getParallelism(),
             lambda x: x.getSeed()
@@ -116,7 +115,6 @@ class CrossValidatorTests(SparkSessionTestCase):
         # SPARK-32092: CrossValidatorModel.copy() needs to copy all existing params
         for param in [
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getSeed()
         ]:
             self.assertEqual(param(cvModel), param(cvModelCopied))
@@ -127,9 +125,9 @@ class CrossValidatorTests(SparkSessionTestCase):
             'foo',
             "Changing the original avgMetrics should not affect the copied model"
         )
-        cvModel.subModels[0] = 'foo'
+        cvModel.subModels[0][0].getInducedError = lambda: 'foo'
         self.assertNotEqual(
-            cvModelCopied.subModels[0],
+            cvModelCopied.subModels[0][0].getInducedError(),
             'foo',
             "Changing the original subModels should not affect the copied model"
         )
@@ -224,7 +222,6 @@ class CrossValidatorTests(SparkSessionTestCase):
         loadedCvModel = CrossValidatorModel.load(cvModelPath)
         for param in [
             lambda x: x.getNumFolds(),
-            lambda x: x.getFoldCol(),
             lambda x: x.getSeed(),
             lambda x: len(x.subModels)
         ]:
@@ -780,9 +777,9 @@ class TrainValidationSplitTests(SparkSessionTestCase):
             'foo',
             "Changing the original validationMetrics should not affect the copied model"
         )
-        tvsModel.subModels[0] = 'foo'
+        tvsModel.subModels[0].getInducedError = lambda: 'foo'
         self.assertNotEqual(
-            tvsModelCopied.subModels[0],
+            tvsModelCopied.subModels[0].getInducedError(),
             'foo',
             "Changing the original subModels should not affect the copied model"
         )
diff --git a/python/pyspark/ml/tuning.py b/python/pyspark/ml/tuning.py
index 91f34ef..6283c8b 100644
--- a/python/pyspark/ml/tuning.py
+++ b/python/pyspark/ml/tuning.py
@@ -480,7 +480,10 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
             extra = dict()
         bestModel = self.bestModel.copy(extra)
         avgMetrics = list(self.avgMetrics)
-        subModels = [model.copy() for model in self.subModels]
+        subModels = [
+            [sub_model.copy() for sub_model in fold_sub_models]
+            for fold_sub_models in self.subModels
+        ]
         return self._copyValues(CrossValidatorModel(bestModel, avgMetrics, subModels), extra=extra)

     @since("2.3.0")
@@ -511,7 +514,6 @@ class CrossValidatorModel(Model, _CrossValidatorParams, MLReadable, MLWritable):
             "estimator": estimator,
             "estimatorParamMaps": epms,
             "numFolds": java_stage.getNumFolds(),
-            "foldCol": java_stage.getFoldCol(),
             "seed":
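The core of the `copy()` fix above is that `CrossValidatorModel.subModels` is a list of folds, each fold being a list of models, so copying must descend one level deeper than `[model.copy() for model in self.subModels]`. A self-contained sketch with a toy stand-in `Model` class (hypothetical, not PySpark's actual class) shows why the nested comprehension matters:

```python
class Model:
    """Minimal stand-in for a fitted ML model, for illustration only."""
    def __init__(self, metric):
        self.metric = metric

    def copy(self):
        return Model(self.metric)


def copy_sub_models(sub_models):
    # Copy each model individually: sub_models is a list of folds,
    # each fold holding one model per tried parameter map.
    return [[m.copy() for m in fold] for fold in sub_models]


# Two folds, two parameter maps each.
original = [[Model(0.8), Model(0.7)], [Model(0.9), Model(0.6)]]
copied = copy_sub_models(original)

original[0][0].metric = "foo"     # mutate a model in the original
assert copied[0][0].metric == 0.8  # the copied model is unaffected
```

Had only the outer lists been copied, mutating `original[0][0]` would have shown through in the copy, which is exactly what the updated `test_copy` guards against.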
[spark] branch branch-3.0 updated (da60de5 -> 8aa644e)
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
     add 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_tuning.py | 11 ---
 python/pyspark/ml/tuning.py            |  7 ---
 2 files changed, 8 insertions(+), 10 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ac520d4 -> 772c706)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ac520d4  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
     add 772c706  [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/streaming/FileStreamSinkLog.scala |  2 --
 .../streaming/FileStreamSinkLogSuite.scala          | 21 -
 2 files changed, 23 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (898211b -> da60de5)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 898211b  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
     add da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b9585cd -> db74fd0d)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b9585cd  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
     add db74fd0d  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 898211b  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
898211b is described below

commit 898211b54e2c9e212f19d8bad6b7e91b66e5659a
Author: mingjial
AuthorDate: Sun Aug 23 17:40:59 2020 -0700

    [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2

    ### What changes were proposed in this pull request?

    Copy to the master branch the unit test added for branch-2.4 (https://github.com/apache/spark/pull/29430).

    ### Why are the changes needed?

    The unit test passes on the master branch, indicating that the issue reported in https://issues.apache.org/jira/browse/SPARK-32609 is already fixed there. The test is added to catch possible future regressions.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    sbt test run

    Closes #29435 from mingjialiu/master.

Authored-by: mingjial
Signed-off-by: Dongjoon Hyun
(cherry picked from commit b9585cde31fe99aecca42146c71c552218cba591)
Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/connector/DataSourceV2Suite.scala | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
index 2d8761f..a9c521e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
@@ -394,6 +394,25 @@ class DataSourceV2Suite extends QueryTest with SharedSparkSession with AdaptiveS
       checkAnswer(df, (0 until 3).map(i => Row(i)))
     }
   }
+
+  test("SPARK-32609: DataSourceV2 with different pushedfilters should be different") {
+    def getScanExec(query: DataFrame): BatchScanExec = {
+      query.queryExecution.executedPlan.collect {
+        case d: BatchScanExec => d
+      }.head
+    }
+
+    Seq(classOf[AdvancedDataSourceV2], classOf[JavaAdvancedDataSourceV2]).foreach { cls =>
+      withClue(cls.getName) {
+        val df = spark.read.format(cls.getName).load()
+        val q1 = df.select('i).filter('i > 6)
+        val q2 = df.select('i).filter('i > 5)
+        val scan1 = getScanExec(q1)
+        val scan2 = getScanExec(q2)
+        assert(!scan1.equals(scan2))
+      }
+    }
+  }
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
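The test above guards against incorrect exchange reuse: two scans of the same source that differ only in their pushed filters must not compare equal, or the planner could substitute one scan's cached exchange for the other and return wrong results. A language-neutral sketch of the invariant, using a toy `Scan` class (hypothetical, not Spark's `BatchScanExec`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """Toy stand-in for a V2 scan node. Equality must cover the pushed
    filters; if it only covered the table, a reuse rule could wrongly
    treat scans of different filtered subsets as interchangeable."""
    table: str
    pushed_filters: tuple = ()


scan1 = Scan("t", pushed_filters=("i > 6",))
scan2 = Scan("t", pushed_filters=("i > 5",))

assert scan1 != scan2                   # different filters: not reusable
assert scan1 == Scan("t", ("i > 6",))   # identical scans: safe to reuse
```

This mirrors the assertion `assert(!scan1.equals(scan2))` in the Scala test: generated equality includes every field, which is the behavior the suite verifies for the real scan node.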
[spark] branch branch-3.0 updated: [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
[spark] branch branch-3.0 updated: [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new da60de5  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function
da60de5 is described below

commit da60de563a92bb85902681fb0569b43bbc489559
Author: Huaxin Gao
AuthorDate: Mon Aug 24 09:43:41 2020 +0900

    [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

    ### What changes were proposed in this pull request?
    There are two types of TVF. We only documented one type. Adding the doc for the 2nd type.

    ### Why are the changes needed?
    Complete the Table-valued Function doc.

    ### Does this PR introduce _any_ user-facing change?
    https://user-images.githubusercontent.com/13592258/89595926-c5eae680-d80a-11ea-918b-0c3646f9930e.png
    https://user-images.githubusercontent.com/13592258/89595929-c84d4080-d80a-11ea-9803-30eb502ccd05.png
    https://user-images.githubusercontent.com/13592258/89595931-ca170400-d80a-11ea-8812-2f009746edac.png
    https://user-images.githubusercontent.com/13592258/89595934-cb483100-d80a-11ea-9e18-9357aa9f2c5c.png

    ### How was this patch tested?
    Manually build and check

    Closes #29355 from huaxingao/tvf.

    Authored-by: Huaxin Gao
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit db74fd0d3320f120540133094a9975963941b98c)
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)

diff --git a/docs/sql-ref-syntax-qry-select-tvf.md b/docs/sql-ref-syntax-qry-select-tvf.md
index cc8d7c34..b04e2f5 100644
--- a/docs/sql-ref-syntax-qry-select-tvf.md
+++ b/docs/sql-ref-syntax-qry-select-tvf.md
@@ -21,28 +21,14 @@ license: |
 ### Description
 
-A table-valued function (TVF) is a function that returns a relation or a set of rows.
-
-### Syntax
-
-```sql
-function_name ( expression [ , ... ] ) [ table_alias ]
-```
-
-### Parameters
-
-* **expression**
-
-    Specifies a combination of one or more values, operators and SQL functions that results in a value.
-
-* **table_alias**
-
-    Specifies a temporary name with an optional column name list.
-
-    **Syntax:** `[ AS ] table_name [ ( column_name [ , ... ] ) ]`
+A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL:
+1. a TVF that can be specified in a FROM clause, e.g. range;
+2. a TVF that can be specified in SELECT/LATERAL VIEW clauses, e.g. explode.
 
 ### Supported Table-valued Functions
 
+#### TVFs that can be specified in a FROM clause:
+
 |Function|Argument Type(s)|Description|
 |--------|----------------|-----------|
 |**range** ( *end* )|Long|Creates a table with a single *LongType* column named *id*, containing rows in a range from 0 to *end* (exclusive) with step value 1.|
@@ -50,6 +36,20 @@ function_name ( expression [ , ... ] ) [ table_alias ]
 |**range** ( *start, end, step* )|Long, Long, Long|Creates a table with a single *LongType* column named *id*, containing rows in a range from *start* to *end* (exclusive) with *step* value.|
 |**range** ( *start, end, step, numPartitions* )|Long, Long, Long, Int|Creates a table with a single *LongType* column named *id*, containing rows in a range from *start* to *end* (exclusive) with *step* value, with partition number *numPartitions* specified.|
 
+#### TVFs that can be specified in SELECT/LATERAL VIEW clauses:
+
+|Function|Argument Type(s)|Description|
+|--------|----------------|-----------|
+|**explode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
+|**explode_outer** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows, or the elements of map *expr* into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.|
+|**inline** ( *expr* )|Expression|Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.|
+|**inline_outer** ( *expr* )|Expression|Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.|
+|**posexplode** ( *expr* )|Array/Map|Separates the elements of array *expr* into multiple rows with positions, or the elements of map *expr* into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value
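The row-expansion behavior that the table above documents for explode and posexplode can be sketched outside of Spark. This is a minimal plain-Python illustration of the semantics only, not Spark's implementation; the helper names `explode`/`posexplode` here are stand-ins operating on lists of dicts rather than DataFrames:

```python
# Plain-Python sketch of the semantics of the explode/posexplode TVFs
# on an array column: one output row per array element, with the
# documented default output column names ("col", and "pos" for positions).

def explode(rows, array_col):
    """One output row per element of rows[i][array_col]; rows whose
    array is empty produce no output (explode_outer would keep them)."""
    out = []
    for row in rows:
        for elem in row[array_col]:
            new_row = dict(row)
            del new_row[array_col]
            new_row["col"] = elem  # default column name: col
            out.append(new_row)
    return out

def posexplode(rows, array_col):
    """Like explode, but also emits each element's position."""
    out = []
    for row in rows:
        for pos, elem in enumerate(row[array_col]):
            new_row = dict(row)
            del new_row[array_col]
            new_row["pos"] = pos   # default column names: pos, col
            new_row["col"] = elem
            out.append(new_row)
    return out

rows = [{"id": 1, "xs": [10, 20]}, {"id": 2, "xs": []}]
print(explode(rows, "xs"))
# [{'id': 1, 'col': 10}, {'id': 1, 'col': 20}]  -- id 2 dropped
```

In Spark SQL itself the equivalent would be `SELECT id, explode(xs) FROM t`; the sketch only mirrors the documented row multiplication and default column naming.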
[spark] branch master updated (772c706 -> 8749f2e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 772c706  [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog
 add 8749f2e  [SPARK-32675][MESOS] --py-files option is appended without passing value for it

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala | 4 +++-
 .../spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala   | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b9585cd -> db74fd0d)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from b9585cd  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
 add db74fd0d  [SPARK-32552][SQL][DOCS] Complete the documentation for Table-valued Function

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-qry-select-tvf.md | 99 ---
 1 file changed, 80 insertions(+), 19 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 898211b  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2
898211b is described below

commit 898211b54e2c9e212f19d8bad6b7e91b66e5659a
Author: mingjial
AuthorDate: Sun Aug 23 17:40:59 2020 -0700

    [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2

    ### What changes were proposed in this pull request?
    Copy to master branch the unit test added for branch-2.4 (https://github.com/apache/spark/pull/29430).

    ### Why are the changes needed?
    The unit test passes at the master branch, indicating that the issue reported in https://issues.apache.org/jira/browse/SPARK-32609 is already fixed there. The test is added to catch possible future regressions.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    sbt test run

    Closes #29435 from mingjialiu/master.

    Authored-by: mingjial
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit b9585cde31fe99aecca42146c71c552218cba591)
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/connector/DataSourceV2Suite.scala | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
index 2d8761f..a9c521e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
@@ -394,6 +394,25 @@ class DataSourceV2Suite extends QueryTest with SharedSparkSession with AdaptiveS
       checkAnswer(df, (0 until 3).map(i => Row(i)))
     }
   }
+
+  test("SPARK-32609: DataSourceV2 with different pushedfilters should be different") {
+    def getScanExec(query: DataFrame): BatchScanExec = {
+      query.queryExecution.executedPlan.collect {
+        case d: BatchScanExec => d
+      }.head
+    }
+
+    Seq(classOf[AdvancedDataSourceV2], classOf[JavaAdvancedDataSourceV2]).foreach { cls =>
+      withClue(cls.getName) {
+        val df = spark.read.format(cls.getName).load()
+        val q1 = df.select('i).filter('i > 6)
+        val q2 = df.select('i).filter('i > 5)
+        val scan1 = getScanExec(q1)
+        val scan2 = getScanExec(q2)
+        assert(!scan1.equals(scan2))
+      }
+    }
+  }
 }
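The property this test pins down is that two scans differing only in their pushed filters must not compare equal; otherwise a reuse mechanism keyed on plan equality could substitute the scan for `i > 6` where `i > 5` was needed. A minimal sketch of that invariant outside Spark (the `BatchScan` class and its fields are hypothetical stand-ins, not the real `BatchScanExec`):

```python
from dataclasses import dataclass

# Hypothetical stand-in for a DataSourceV2 scan node. The point: the
# pushed filters participate in equality/hashing, so two scans over the
# same source with different filters are distinct plan nodes.
@dataclass(frozen=True)
class BatchScan:
    source: str
    pushed_filters: tuple  # e.g. ("i > 5",)

scan1 = BatchScan("AdvancedDataSourceV2", ("i > 6",))
scan2 = BatchScan("AdvancedDataSourceV2", ("i > 5",))

# Mirrors assert(!scan1.equals(scan2)) in the Scala test above.
assert scan1 != scan2

# A reuse cache keyed on the node therefore cannot confuse the two.
reuse_cache = {scan1: "reused-exchange-for-q1"}
assert scan2 not in reuse_cache
```

If `pushed_filters` were excluded from equality (the bug class SPARK-32609 guards against), `scan2 in reuse_cache` would be true and the wrong scan result could be reused.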
[spark] branch master updated (8749f2e -> b9585cd)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8749f2e  [SPARK-32675][MESOS] --py-files option is appended without passing value for it
 add b9585cd  [SPARK-32609][TEST] Add Tests for Incorrect exchange reuse with DataSourceV2

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/connector/DataSourceV2Suite.scala | 19 +++
 1 file changed, 19 insertions(+)
[spark] branch master updated (ac520d4 -> 772c706)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ac520d4  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
 add 772c706  [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/streaming/FileStreamSinkLog.scala | 2 --
 .../streaming/FileStreamSinkLogSuite.scala          | 21 -
 2 files changed, 23 deletions(-)
[spark] branch master updated (1c798f9 -> ac520d4)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1c798f9 [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable` add ac520d4 [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans No new revisions were added by this update. Summary of changes: .../spark/ml/clustering/BisectingKMeans.scala | 33 ++- .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++- .../spark/mllib/clustering/BisectingKMeans.scala | 47 ++ .../org/apache/spark/mllib/clustering/KMeans.scala | 29 +++-- 4 files changed, 59 insertions(+), 83 deletions(-)
[spark] branch master updated: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ac520d4  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
ac520d4 is described below

commit ac520d4a7c40a1d67358ee64af26e7f73face448
Author: zhengruifeng
AuthorDate: Sun Aug 23 17:14:40 2020 -0500

    [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

    ### What changes were proposed in this pull request?
    Fix double caching in KMeans/BiKMeans:
    1. let the callers of `runWithWeight` pass whether `handlePersistence` is needed;
    2. persist and unpersist inside `runWithWeight`;
    3. persist the `norms` if needed, according to the comments.

    ### Why are the changes needed?
    Avoid double caching.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existing test suites.

    Closes #29501 from zhengruifeng/kmeans_handlePersistence.
Authored-by: zhengruifeng
Signed-off-by: Sean Owen
---
 .../spark/ml/clustering/BisectingKMeans.scala      | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala    | 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 29 +++--
 4 files changed, 59 insertions(+), 83 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
index 5a60bed..061091c 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
@@ -29,9 +29,8 @@ import org.apache.spark.ml.util._
 import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{BisectingKMeans => MLlibBisectingKMeans, BisectingKMeansModel => MLlibBisectingKMeansModel}
-import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DoubleType, IntegerType, StructType}
@@ -276,21 +275,6 @@ class BisectingKMeans @Since("2.0.0") (
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = instrumented { instr =>
     transformSchema(dataset.schema, logging = true)
-    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
-      checkNonNegativeWeight(col($(weightCol)).cast(DoubleType))
-    } else {
-      lit(1.0)
-    }
-
-    val instances: RDD[(OldVector, Double)] = dataset
-      .select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w).rdd.map {
-        case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight)
-    }
-    if (handlePersistence) {
-      instances.persist(StorageLevel.MEMORY_AND_DISK)
-    }
-
     instr.logPipelineStage(this)
     instr.logDataset(dataset)
     instr.logParams(this, featuresCol, predictionCol, k, maxIter, seed,
@@ -302,11 +286,18 @@ class BisectingKMeans @Since("2.0.0") (
       .setMinDivisibleClusterSize($(minDivisibleClusterSize))
       .setSeed($(seed))
       .setDistanceMeasure($(distanceMeasure))
-    val parentModel = bkm.runWithWeight(instances, Some(instr))
-    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))
-    if (handlePersistence) {
-      instances.unpersist()
+
+    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
+      checkNonNegativeWeight(col($(weightCol)).cast(DoubleType))
+    } else {
+      lit(1.0)
     }
+    val instances = dataset.select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w)
+      .rdd.map { case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight) }
+
+    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
+    val parentModel = bkm.runWithWeight(instances, handlePersistence, Some(instr))
+    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))

     val summary = new BisectingKMeansSummary(
       model.transform(dataset),
diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
index 5c06973..f6f6eb7 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
@@ -32,7 +32,6 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import
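The handlePersistence handoff in the diff above can be sketched without Spark. Everything below (`FakeRdd`, the simplified `runWithWeight`) is a hypothetical stand-in, not Spark API: the caller reports whether its input is already cached, and the method persists and unpersists internally only when asked to, so the data is cached at most once.

```scala
// Hypothetical sketch of the fix: the persist/unpersist lifecycle moves
// inside runWithWeight, guarded by a flag the caller computes from the
// input's storage level, so the caller never caches a second copy.
object HandlePersistenceSketch {
  final case class FakeRdd(var cached: Boolean = false) {
    def persist(): Unit = cached = true
    def unpersist(): Unit = cached = false
  }

  // Stand-in for MLlib's runWithWeight: returns whether the data was
  // cached during "training", and releases only the cache it created.
  def runWithWeight(data: FakeRdd, handlePersistence: Boolean): Boolean = {
    if (handlePersistence) data.persist()
    try data.cached // "train" while (possibly) cached
    finally if (handlePersistence) data.unpersist()
  }
}
```

A caller would pass something like `handlePersistence = dataset.storageLevel == StorageLevel.NONE`, mirroring the diff: an uncached input gets a temporary cache that is released on exit, while an already-cached input stays under the caller's control.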
[spark] branch branch-3.0 updated: [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable`
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f088c28  [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable`
f088c28 is described below

commit f088c28a53571afe5146100fd2e76c2b5ec92862
Author: Max Gekk
AuthorDate: Sun Aug 23 12:43:30 2020 -0700

    [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable`

    ### What changes were proposed in this pull request?
    Override `def get: Date` in `DaysWritable` to use `daysToMillis(int d)` from the parent class `DateWritable` instead of `long daysToMillis(int d, boolean doesTimeMatter)`.

    ### Why are the changes needed?
    It fixes failures of `HiveSerDeReadWriteSuite` with the profile `hive-1.2`. In that case, the parent class `DateWritable` has a different implementation, predating the Hive commit https://github.com/apache/hive/commit/da3ed68eda10533f3c50aae19731ac6d059cda87. In particular, `get()` calls `new Date(daysToMillis(daysSinceEpoch))` instead of the overridden `def get(doesTimeMatter: Boolean): Date` in the child class. The `get()` method returns the wrong result `1970-01-01` because it uses the not-yet-updated [...]

    ### Does this PR introduce _any_ user-facing change?
    Yes.

    ### How was this patch tested?
    By running the test suite `HiveSerDeReadWriteSuite`:
    ```
    $ build/sbt -Phive-1.2 -Phadoop-2.7 "test:testOnly org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite"
    ```
    and
    ```
    $ build/sbt -Phive-2.3 -Phadoop-2.7 "test:testOnly org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite"
    ```

    Closes #29523 from MaxGekk/insert-date-into-hive-table-1.2.
Authored-by: Max Gekk
Signed-off-by: Liang-Chi Hsieh
(cherry picked from commit 1c798f973fa8307cc1f15eec067886e8e9aecb59)
Signed-off-by: Liang-Chi Hsieh
---
 .../org/apache/spark/sql/execution/datasources/DaysWritable.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala
index 56c176e..a04c2fc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DaysWritable.scala
@@ -54,6 +54,9 @@ class DaysWritable(
   }

   override def getDays: Int = julianDays
+  override def get: Date = {
+    new Date(DateWritable.daysToMillis(julianDays))
+  }
   override def get(doesTimeMatter: Boolean): Date = {
     new Date(DateWritable.daysToMillis(julianDays, doesTimeMatter))
   }
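The failure mode described above (the parent's `get()` reading stale parent state instead of the subclass's Julian-day field) can be reproduced in miniature. The class names below are simplified stand-ins for `DateWritable`/`DaysWritable`, and the day-to-millis conversion deliberately omits the time-zone and calendar rebasing the real classes perform:

```scala
// Hypothetical miniature of the DaysWritable fix: unless the subclass
// overrides the zero-arg `get` itself, callers resolve it against the
// parent's implementation, which reads the parent's (stale) field.
class ParentDateWritable(protected var daysSinceEpoch: Int) {
  // Simplified: the real DateWritable.daysToMillis also handles time zones.
  def daysToMillis(d: Int): Long = d.toLong * 24L * 60L * 60L * 1000L
  def get: Long = daysToMillis(daysSinceEpoch)
  def get(doesTimeMatter: Boolean): Long = daysToMillis(daysSinceEpoch)
}

// Mirrors the patch: override BOTH `get` overloads to use julianDays.
class JulianDaysWritable(val julianDays: Int) extends ParentDateWritable(0) {
  override def get: Long = daysToMillis(julianDays)
  override def get(doesTimeMatter: Boolean): Long = daysToMillis(julianDays)
}
```

Without the zero-arg override, `new JulianDaysWritable(n).get` would resolve to the parent's body and return 0 (the epoch), which is exactly the `1970-01-01` symptom the commit message describes.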
[spark] branch master updated (9808c15 -> 1c798f9)
This is an automated email from the ASF dual-hosted git repository. viirya pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9808c15 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value add 1c798f9 [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable` No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/execution/datasources/DaysWritable.scala | 3 +++ 1 file changed, 3 insertions(+)
[spark] branch branch-3.0 updated: [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f5d5422  [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
f5d5422 is described below

commit f5d5422a4f87f69514d95f80f5f3db8246d61256
Author: angerszhu
AuthorDate: Sun Aug 23 08:20:05 2020 -0700

    [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value

    ### What changes were proposed in this pull request?
    As mentioned in https://github.com/apache/spark/pull/29428#issuecomment-678735163 by viirya, fix a bug in the UT: in script-transformation no-serde mode, the decimal output is the same in both hive-1.2 and hive-2.3.

    ### Why are the changes needed?
    Fix the UT.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Existing UT.

    Closes #29521 from AngersZh/SPARK-32608-3.0-FOLLOW-UP.
Authored-by: angerszhu
Signed-off-by: Liang-Chi Hsieh
---
 .../apache/spark/sql/hive/execution/ScriptTransformationSuite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
index 15a932f..0d1fe20 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
@@ -299,7 +299,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'a.cast("string"),
         'b.cast("string"),
         'c.cast("string"),
-        decimalToString('d),
+        'd.cast("string"),
         'e.cast("string")).collect())

     // input/output with different delimit and show result
@@ -322,7 +322,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'a.cast("string"),
         'b.cast("string"),
         'c.cast("string"),
-        decimalToString('d),
+        'd.cast("string"),
         'e.cast("string"))).collect())
     }
   }
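The test change above compares a decimal column's plain string cast against the script-transformation output, relying on that rendering being stable across Hive profiles. A minimal sketch of the underlying string behavior, assuming plain `java.math.BigDecimal` semantics rather than the Hive writers themselves:

```scala
// java.math.BigDecimal keeps its scale in toString (no exponent for
// small magnitudes), so a decimal-to-string cast yields a stable,
// scale-preserving representation to compare against.
import java.math.BigDecimal

object DecimalToStringSketch {
  def asString(d: BigDecimal): String = d.toString
}
```

For example, `asString(new BigDecimal("1.0000"))` keeps the trailing zeros rather than collapsing to `"1"`, which is the kind of formatting agreement the simplified test depends on.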
[spark] branch branch-3.0 updated: [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f5d5422 [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value f5d5422 is described below commit f5d5422a4f87f69514d95f80f5f3db8246d61256 Author: angerszhu AuthorDate: Sun Aug 23 08:20:05 2020 -0700 [SPARK-32608][SQL][3.0][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value ### What changes were proposed in this pull request? As mentioned in https://github.com/apache/spark/pull/29428#issuecomment-678735163 by viirya , fix bug in UT, since in script transformation no-serde mode, output of decimal is same in both hive-1.2/hive-2.3 ### Why are the changes needed? FIX UT ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? EXISTED UT Closes #29521 from AngersZh/SPARK-32608-3.0-FOLLOW-UP. 
Authored-by: angerszhu Signed-off-by: Liang-Chi Hsieh --- .../apache/spark/sql/hive/execution/ScriptTransformationSuite.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala index 15a932f..0d1fe20 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala @@ -299,7 +299,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), - decimalToString('d), + 'd.cast("string"), 'e.cast("string")).collect()) // input/output with different delimit and show result @@ -322,7 +322,7 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), -decimalToString('d), +'d.cast("string"), 'e.cast("string"))).collect()) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (aa0b0b8 -> 9808c15)
This is an automated email from the ASF dual-hosted git repository. viirya pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from aa0b0b8 Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis" add 9808c15 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value No new revisions were added by this update. Summary of changes: .../apache/spark/sql/execution/BaseScriptTransformationSuite.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
This is an automated email from the ASF dual-hosted git repository. viirya pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9808c15 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value 9808c15 is described below commit 9808c15eecff6f9947e062ae507cfd87837fff0d Author: angerszhu AuthorDate: Sun Aug 23 08:08:55 2020 -0700 [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value ### What changes were proposed in this pull request? As mentioned in https://github.com/apache/spark/pull/29428#issuecomment-678735163 by viirya , fix bug in UT, since in script transformation no-serde mode, output of decimal is same in both hive-1.2/hive-2.3 ### Why are the changes needed? FIX UT ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? EXISTED UT Closes #29520 from AngersZh/SPARK-32608-FOLLOW. 
Authored-by: angerszhu Signed-off-by: Liang-Chi Hsieh --- .../apache/spark/sql/execution/BaseScriptTransformationSuite.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala index a82d87c..b36c06b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala @@ -345,7 +345,7 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), - decimalToString('d), + 'd.cast("string"), 'e.cast("string")).collect()) // input/output with different delimit and show result @@ -368,7 +368,7 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU 'a.cast("string"), 'b.cast("string"), 'c.cast("string"), -decimalToString('d), +'d.cast("string"), 'e.cast("string"))).collect()) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] maropu commented on pull request #286: Add descriptions about GitHub Actions in the "Useful Developer Tools" page
maropu commented on pull request #286: URL: https://github.com/apache/spark-website/pull/286#issuecomment-678762858 Thanks, all! Merged to asf-site. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] asfgit closed pull request #286: Add descriptions about GitHub Actions in the "Useful Developer Tools" page
asfgit closed pull request #286: URL: https://github.com/apache/spark-website/pull/286 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Add descriptions about GitHub Actions in the "Useful Developer Tools" page
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 0b3a4e6 Add descriptions about GitHub Actions in the "Useful Developer Tools" page 0b3a4e6 is described below commit 0b3a4e606efbd97d6c53407fb60c62a0518c157f Author: Takeshi Yamamuro AuthorDate: Sun Aug 23 20:31:13 2020 +0900 Add descriptions about GitHub Actions in the "Useful Developer Tools" page This PR adds descriptions about how to run tests in a forked repository using GitHub Actions. This comes from https://github.com/apache/spark/pull/29504. https://user-images.githubusercontent.com/692303/90958036-6c520f80-e4cc-11ea-8bc4-f1602bd45bf4.png Author: Takeshi Yamamuro Closes #286 from maropu/github-actions. --- developer-tools.md | 19 +++ images/running-tests-using-github-actions.png | Bin 0 -> 312696 bytes site/developer-tools.html | 21 + site/images/running-tests-using-github-actions.png | Bin 0 -> 312696 bytes 4 files changed, 40 insertions(+) diff --git a/developer-tools.md b/developer-tools.md index c664dfc..0078538 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -228,6 +228,25 @@ Getting logs from the pods and containers directly is an exercise left to the reader. Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality. If you are having trouble getting tests to pass on Jenkins, but locally things work, don't hesitate to file a Jira issue. +Running tests in your forked repository using GitHub Actions + +GitHub Actions is a functionality within GitHub that enables continuous integration and a wide range of automation. +We have already started using some action scripts and one of them is to run tests for [pull requests](https://spark.apache.org/contributing.html). 
+If you are planning to create a new pull request, it is important to check if tests can pass on your branch before creating a pull request. +This is because our GitHub Actions script automatically runs tests for your pull request/following commits and +this can burden our limited resources of GitHub Actions. + +Our script enables you to run tests for a branch in your forked repository. +Let's say that you have a branch named "your_branch" for a pull request. +To run tests on "your_branch" and check test results: + +- Click the "Actions" tab in your forked repository. +- Select the "Build and test" workflow in the "All workflows" list. +- Push the "Run workflow" button and enter "your_branch" in the "Target branch to run" field. +- When the "Build and test" workflow has finished, click the "Report test results" workflow to check test results. + + + ScalaTest Issues If the following error occurs when running ScalaTest diff --git a/images/running-tests-using-github-actions.png b/images/running-tests-using-github-actions.png new file mode 100644 index 000..819203e Binary files /dev/null and b/images/running-tests-using-github-actions.png differ diff --git a/site/developer-tools.html b/site/developer-tools.html index ff34db0..f064c0a 100644 --- a/site/developer-tools.html +++ b/site/developer-tools.html @@ -406,6 +406,27 @@ minikube stop Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality. If you are having trouble getting tests to pass on Jenkins, but locally things work, don't hesitate to file a Jira issue. +Running tests in your forked repository using GitHub Actions + +GitHub Actions is a functionality within GitHub that enables continuous integration and a wide range of automation. +We have already started using some action scripts and one of them is to run tests for pull requests (https://spark.apache.org/contributing.html). 
+If you are planning to create a new pull request, it is important to check if tests can pass on your branch before creating a pull request. +This is because our GitHub Actions script automatically runs tests for your pull request/following commits and +this can burden our limited resources of GitHub Actions. + +Our script enables you to run tests for a branch in your forked repository. +Let's say that you have a branch named your_branch for a pull request. +To run tests on your_branch and check test results: + + + Click the Actions tab in your forked repository. + Select the Build and test workflow in the All workflows list. + Push the Run workflow button and enter your_branch in the Target branch to run field. + When the Build and test workflow has finished, click the Report test results workflow to check test results. + + + + ScalaTest Issues If the
[spark] branch master updated (f258718 -> aa0b0b8)
This is an automated email from the ASF dual-hosted git repository. viirya pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f258718 [SPARK-32678][SQL] Rename EmptyHashedRelationWithAllNullKeys and simplify NAAJ generated code add aa0b0b8 Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis" No new revisions were added by this update. Summary of changes: .../execution/datasources/orc/OrcFileFormat.scala | 16 ++--- .../execution/datasources/orc/OrcFiltersBase.scala | 10 +--- .../sql/execution/datasources/orc/OrcUtils.scala | 14 - .../v2/orc/OrcPartitionReaderFactory.scala | 22 +-- .../sql/execution/datasources/v2/orc/OrcScan.scala | 2 +- .../datasources/v2/orc/OrcScanBuilder.scala| 5 ++ .../sql/execution/datasources/orc/OrcFilters.scala | 44 ++ .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- .../sql/execution/datasources/orc/OrcFilters.scala | 50 +++- .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- 10 files changed, 60 insertions(+), 243 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
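The reverted SPARK-32646 patch concerned pushing ORC predicates down when the analyzer is case-insensitive: a filter written against column "A" must be matched to the physical file schema, which may contain "a", or ambiguously both "a" and "A". A hypothetical Python sketch of that resolution problem (the function name and logic are illustrative, not Spark's actual implementation):

```python
def resolve_column(name, file_schema, case_sensitive=False):
    """Map a query-side column name onto the physical file schema.

    With case-insensitive analysis the lookup ignores case, but must
    reject ambiguous matches (e.g. a file containing both 'a' and 'A').
    """
    if case_sensitive:
        return name if name in file_schema else None
    matches = [f for f in file_schema if f.lower() == name.lower()]
    if len(matches) == 1:
        return matches[0]
    # Missing or ambiguous: resolution fails.
    return None

print(resolve_column("A", ["a", "b"]))  # a
print(resolve_column("A", ["a", "A"]))  # None (ambiguous)
```

When resolution fails or is ambiguous, the safe behavior for a reader is to skip pushing the filter down and evaluate it after the scan, rather than risk returning wrong results.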