[spark] branch master updated (3f1e56d -> c26a976)

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3f1e56d  [SPARK-32641][SQL] withField + getField should return null if 
original struct was null
 add c26a976  Revert "[SPARK-32412][SQL] Unify error handling for spark 
thrift serv…

No new revisions were added by this update.

Summary of changes:
 .../SparkExecuteStatementOperation.scala   | 56 +---
 .../sql/hive/thriftserver/SparkOperation.scala | 35 +++--
 .../ThriftServerWithSparkContextSuite.scala| 61 +++---
 3 files changed, 75 insertions(+), 77 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (cee48a9 -> 3f1e56d)

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cee48a9  [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis
 add 3f1e56d  [SPARK-32641][SQL] withField + getField should return null if 
original struct was null

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/ComplexTypes.scala  |  3 +-
 .../sql/catalyst/optimizer/complexTypesSuite.scala | 96 --
 .../apache/spark/sql/ColumnExpressionSuite.scala   | 85 +++
 3 files changed, 156 insertions(+), 28 deletions(-)
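The SPARK-32641 subject above describes a null-propagation rule: if the original struct is null, both withField and getField yield null rather than a partially populated struct. A toy Python model of that semantics (helper names are hypothetical, not Spark's Column API):

```python
def with_field(struct, name, value):
    """Return a copy of `struct` with field `name` set to `value`.

    Null-propagating: if the original struct is null (None),
    the result is null, mirroring the fixed Spark behavior.
    """
    if struct is None:
        return None
    out = dict(struct)
    out[name] = value
    return out


def get_field(struct, name):
    """Read a field; a null struct yields a null field value."""
    return None if struct is None else struct.get(name)
```

With this model, `get_field(with_field(None, "b", 2), "b")` is `None`, while `get_field(with_field({"a": 1}, "b", 2), "b")` is `2`.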


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f540031 -> cee48a9)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table 
description via Catalog.createTable()
 add cee48a9  [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 10 +++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala|  5 --
 .../sql/execution/datasources/orc/OrcFilters.scala | 48 ---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 .../sql/execution/datasources/orc/OrcFilters.scala | 50 +---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 10 files changed, 245 insertions(+), 62 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6c88d7c  [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis
6c88d7c is described below

commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e
Author: Liang-Chi Hsieh 
AuthorDate: Tue Aug 25 04:42:39 2020 +

[SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate 
pushdown should work with case-insensitive analysis

### What changes were proposed in this pull request?

This PR proposes to fix ORC predicate pushdown under case-insensitive 
analysis. With case-insensitive analysis enabled, the field names in pushed-down 
predicates no longer need to match the physical field names in ORC files in 
exact letter case.

### Why are the changes needed?

Currently ORC predicate pushdown doesn't work with case-insensitive 
analysis: a predicate "a < 0" cannot be pushed down to an ORC file whose field 
is named "A".

Parquet predicate pushdown already handles this case; ORC predicate 
pushdown should too.
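The gist of the fix is resolving predicate field names against the ORC file's physical schema case-insensitively. A minimal sketch of that resolution rule (function and parameter names are hypothetical, not Spark's actual code):

```python
def resolve_field(predicate_field, file_fields, case_sensitive=False):
    """Map a predicate's field name to a physical ORC field name.

    Toy sketch: under case-insensitive analysis, "a" matches the
    physical field "A"; an ambiguous match (both "a" and "A"
    present in the file) cannot be pushed down safely, so it
    resolves to None and the filter is simply not pushed.
    """
    if case_sensitive:
        # Exact-case match only.
        return predicate_field if predicate_field in file_fields else None
    matches = [f for f in file_fields if f.lower() == predicate_field.lower()]
    return matches[0] if len(matches) == 1 else None
```

For example, `resolve_field("a", ["A", "b"])` returns `"A"`, while the same lookup with `case_sensitive=True` returns `None`.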

### Does this PR introduce _any_ user-facing change?

Yes. After this PR, ORC predicate pushdown works under case-insensitive 
analysis.

### How was this patch tested?

Unit tests.

Closes #29513 from viirya/fix-orc-pushdown-3.0.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Wenchen Fan 
---
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala|  9 +--
 .../sql/execution/datasources/orc/OrcFilters.scala | 72 --
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++--
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 10 files changed, 253 insertions(+), 85 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 4dff1ec..69badb4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -153,11 +153,6 @@ class OrcFileFormat
   filters: Seq[Filter],
   options: Map[String, String],
   hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = 
{
-if (sparkSession.sessionState.conf.orcFilterPushDown) {
-  OrcFilters.createFilter(dataSchema, filters).foreach { f =>
-OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames)
-  }
-}
 
 val resultSchema = StructType(requiredSchema.fields ++ 
partitionSchema.fields)
 val sqlConf = sparkSession.sessionState.conf
@@ -169,6 +164,8 @@ class OrcFileFormat
 val broadcastedConf =
   sparkSession.sparkContext.broadcast(new 
SerializableConfiguration(hadoopConf))
 val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown
+val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
 
 (file: PartitionedFile) => {
   val conf = broadcastedConf.value.value
@@ -186,6 +183,15 @@ class OrcFileFormat
   if (resultedColPruneInfo.isEmpty) {
 Iterator.empty
   } else {
+// ORC predicate pushdown
+if (orcFilterPushDown) {
+  OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map 
{ fileSchema =>
+OrcFilters.createFilter(fileSchema, filters).foreach { f =>
+  OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
+}
+  }
+}
+
 val (requestedColIds, canPruneCols) = resultedColPruneInfo.get
 val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols,
   dataSchema, resultSchema, partitionSchema, conf)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
index e673309..4554899 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
+++ 

[spark] branch master updated (9151a58 -> f540031)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for 
HybridStore and HistoryServerMemoryManager
 add f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table 
description via Catalog.createTable()

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/catalog.py  | 13 +++-
 python/pyspark/sql/tests/test_catalog.py   | 17 -
 .../org/apache/spark/sql/catalog/Catalog.scala | 79 ++
 .../apache/spark/sql/internal/CatalogImpl.scala| 42 +++-
 .../apache/spark/sql/internal/CatalogSuite.scala   |  4 ++
 5 files changed, 149 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f540031 -> cee48a9)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table 
description via Catalog.createTable()
 add cee48a9  [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 10 +++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala|  5 --
 .../sql/execution/datasources/orc/OrcFilters.scala | 48 ---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 .../sql/execution/datasources/orc/OrcFilters.scala | 50 +---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 10 files changed, 245 insertions(+), 62 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6c88d7c  [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis
6c88d7c is described below

commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e
Author: Liang-Chi Hsieh 
AuthorDate: Tue Aug 25 04:42:39 2020 +

[SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate 
pushdown should work with case-insensitive analysis

### What changes were proposed in this pull request?

This PR proposes to fix ORC predicate pushdown under case-insensitive 
analysis case. The field names in pushed down predicates don't need to match in 
exact letter case with physical field names in ORC files, if we enable 
case-insensitive analysis.

### Why are the changes needed?

Currently ORC predicate pushdown doesn't work with case-insensitive 
analysis. A predicate "a < 0" cannot pushdown to ORC file with field name "A" 
under case-insensitive analysis.

But Parquet predicate pushdown works with this case. We should make ORC 
predicate pushdown work with case-insensitive analysis too.

### Does this PR introduce _any_ user-facing change?

Yes, after this PR, under case-insensitive analysis, ORC predicate pushdown 
will work.

### How was this patch tested?

Unit tests.

Closes #29513 from viirya/fix-orc-pushdown-3.0.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Wenchen Fan 
---
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala|  9 +--
 .../sql/execution/datasources/orc/OrcFilters.scala | 72 --
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++--
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 10 files changed, 253 insertions(+), 85 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 4dff1ec..69badb4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -153,11 +153,6 @@ class OrcFileFormat
   filters: Seq[Filter],
   options: Map[String, String],
   hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = 
{
-if (sparkSession.sessionState.conf.orcFilterPushDown) {
-  OrcFilters.createFilter(dataSchema, filters).foreach { f =>
-OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames)
-  }
-}
 
 val resultSchema = StructType(requiredSchema.fields ++ 
partitionSchema.fields)
 val sqlConf = sparkSession.sessionState.conf
@@ -169,6 +164,8 @@ class OrcFileFormat
 val broadcastedConf =
   sparkSession.sparkContext.broadcast(new 
SerializableConfiguration(hadoopConf))
 val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown
+val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
 
 (file: PartitionedFile) => {
   val conf = broadcastedConf.value.value
@@ -186,6 +183,15 @@ class OrcFileFormat
   if (resultedColPruneInfo.isEmpty) {
 Iterator.empty
   } else {
+// ORC predicate pushdown
+if (orcFilterPushDown) {
+  OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map 
{ fileSchema =>
+OrcFilters.createFilter(fileSchema, filters).foreach { f =>
+  OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
+}
+  }
+}
+
 val (requestedColIds, canPruneCols) = resultedColPruneInfo.get
 val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols,
   dataSchema, resultSchema, partitionSchema, conf)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
index e673309..4554899 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
+++ 

[spark] branch master updated (9151a58 -> f540031)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for 
HybridStore and HistoryServerMemoryManager
 add f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table 
description via Catalog.createTable()

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/catalog.py  | 13 +++-
 python/pyspark/sql/tests/test_catalog.py   | 17 -
 .../org/apache/spark/sql/catalog/Catalog.scala | 79 ++
 .../apache/spark/sql/internal/CatalogImpl.scala| 42 +++-
 .../apache/spark/sql/internal/CatalogSuite.scala   |  4 ++
 5 files changed, 149 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f540031 -> cee48a9)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table 
description via Catalog.createTable()
 add cee48a9  [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 10 +++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala|  5 --
 .../sql/execution/datasources/orc/OrcFilters.scala | 48 ---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 .../sql/execution/datasources/orc/OrcFilters.scala | 50 +---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 10 files changed, 245 insertions(+), 62 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6c88d7c  [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis
6c88d7c is described below

commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e
Author: Liang-Chi Hsieh 
AuthorDate: Tue Aug 25 04:42:39 2020 +

[SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate 
pushdown should work with case-insensitive analysis

### What changes were proposed in this pull request?

This PR proposes to fix ORC predicate pushdown under case-insensitive 
analysis case. The field names in pushed down predicates don't need to match in 
exact letter case with physical field names in ORC files, if we enable 
case-insensitive analysis.

### Why are the changes needed?

Currently ORC predicate pushdown doesn't work with case-insensitive 
analysis. A predicate "a < 0" cannot pushdown to ORC file with field name "A" 
under case-insensitive analysis.

But Parquet predicate pushdown works with this case. We should make ORC 
predicate pushdown work with case-insensitive analysis too.

### Does this PR introduce _any_ user-facing change?

Yes, after this PR, under case-insensitive analysis, ORC predicate pushdown 
will work.

### How was this patch tested?

Unit tests.

Closes #29513 from viirya/fix-orc-pushdown-3.0.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Wenchen Fan 
---
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala|  9 +--
 .../sql/execution/datasources/orc/OrcFilters.scala | 72 --
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++--
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 10 files changed, 253 insertions(+), 85 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 4dff1ec..69badb4 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -153,11 +153,6 @@ class OrcFileFormat
   filters: Seq[Filter],
   options: Map[String, String],
   hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = 
{
-if (sparkSession.sessionState.conf.orcFilterPushDown) {
-  OrcFilters.createFilter(dataSchema, filters).foreach { f =>
-OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames)
-  }
-}
 
 val resultSchema = StructType(requiredSchema.fields ++ 
partitionSchema.fields)
 val sqlConf = sparkSession.sessionState.conf
@@ -169,6 +164,8 @@ class OrcFileFormat
 val broadcastedConf =
   sparkSession.sparkContext.broadcast(new 
SerializableConfiguration(hadoopConf))
 val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown
+val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
 
 (file: PartitionedFile) => {
   val conf = broadcastedConf.value.value
@@ -186,6 +183,15 @@ class OrcFileFormat
   if (resultedColPruneInfo.isEmpty) {
 Iterator.empty
   } else {
+// ORC predicate pushdown
+if (orcFilterPushDown) {
+  OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map 
{ fileSchema =>
+OrcFilters.createFilter(fileSchema, filters).foreach { f =>
+  OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
+}
+  }
+}
+
 val (requestedColIds, canPruneCols) = resultedColPruneInfo.get
 val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols,
   dataSchema, resultSchema, partitionSchema, conf)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
index e673309..4554899 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
+++ 

[spark] branch master updated (9151a58 -> f540031)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for 
HybridStore and HistoryServerMemoryManager
 add f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table 
description via Catalog.createTable()

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/catalog.py  | 13 +++-
 python/pyspark/sql/tests/test_catalog.py   | 17 -
 .../org/apache/spark/sql/catalog/Catalog.scala | 79 ++
 .../apache/spark/sql/internal/CatalogImpl.scala| 42 +++-
 .../apache/spark/sql/internal/CatalogSuite.scala   |  4 ++
 5 files changed, 149 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6c88d7c  [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis
6c88d7c is described below

commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e
Author: Liang-Chi Hsieh 
AuthorDate: Tue Aug 25 04:42:39 2020 +

[SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate 
pushdown should work with case-insensitive analysis

### What changes were proposed in this pull request?

This PR proposes to fix ORC predicate pushdown under case-insensitive 
analysis case. The field names in pushed down predicates don't need to match in 
exact letter case with physical field names in ORC files, if we enable 
case-insensitive analysis.

### Why are the changes needed?

Currently ORC predicate pushdown doesn't work with case-insensitive 
analysis. A predicate "a < 0" cannot pushdown to ORC file with field name "A" 
under case-insensitive analysis.

But Parquet predicate pushdown works with this case. We should make ORC 
predicate pushdown work with case-insensitive analysis too.

### Does this PR introduce _any_ user-facing change?

Yes, after this PR, under case-insensitive analysis, ORC predicate pushdown 
will work.

### How was this patch tested?

Unit tests.

Closes #29513 from viirya/fix-orc-pushdown-3.0.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Wenchen Fan 
---
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala|  9 +--
 .../sql/execution/datasources/orc/OrcFilters.scala | 72 --
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++--
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 10 files changed, 253 insertions(+), 85 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 4dff1ec..69badb4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -153,11 +153,6 @@ class OrcFileFormat
       filters: Seq[Filter],
       options: Map[String, String],
       hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = {
-    if (sparkSession.sessionState.conf.orcFilterPushDown) {
-      OrcFilters.createFilter(dataSchema, filters).foreach { f =>
-        OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames)
-      }
-    }
 
     val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
     val sqlConf = sparkSession.sessionState.conf
@@ -169,6 +164,8 @@ class OrcFileFormat
     val broadcastedConf =
       sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))
     val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown
+    val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
 
     (file: PartitionedFile) => {
       val conf = broadcastedConf.value.value
@@ -186,6 +183,15 @@ class OrcFileFormat
       if (resultedColPruneInfo.isEmpty) {
         Iterator.empty
       } else {
+        // ORC predicate pushdown
+        if (orcFilterPushDown) {
+          OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema =>
+            OrcFilters.createFilter(fileSchema, filters).foreach { f =>
+              OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
+            }
+          }
+        }
+
         val (requestedColIds, canPruneCols) = resultedColPruneInfo.get
         val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols,
           dataSchema, resultSchema, partitionSchema, conf)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
index e673309..4554899 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
+++ 
[spark] branch master updated (9151a58 -> f540031)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for 
HybridStore and HistoryServerMemoryManager
 add f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table 
description via Catalog.createTable()

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/catalog.py  | 13 +++-
 python/pyspark/sql/tests/test_catalog.py   | 17 -
 .../org/apache/spark/sql/catalog/Catalog.scala | 79 ++
 .../apache/spark/sql/internal/CatalogImpl.scala| 42 +++-
 .../apache/spark/sql/internal/CatalogSuite.scala   |  4 ++
 5 files changed, 149 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6c88d7c  [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC 
predicate pushdown should work with case-insensitive analysis
6c88d7c is described below

commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e
Author: Liang-Chi Hsieh 
AuthorDate: Tue Aug 25 04:42:39 2020 +

[SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate 
pushdown should work with case-insensitive analysis

[spark] branch master updated (3eee915 -> 9151a58)

2020-08-24 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3eee915  [MINOR][SQL] Add missing documentation for LongType mapping
 add 9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for 
HybridStore and HistoryServerMemoryManager

No new revisions were added by this update.

Summary of changes:
 .../history/HistoryServerMemoryManager.scala   |   5 +-
 .../apache/spark/deploy/history/HybridStore.scala  |   6 +-
 .../deploy/history/FsHistoryProviderSuite.scala|  13 +-
 .../history/HistoryServerMemoryManagerSuite.scala  |  55 +
 .../spark/deploy/history/HybridStoreSuite.scala| 232 +
 5 files changed, 304 insertions(+), 7 deletions(-)
 create mode 100644 
core/src/test/scala/org/apache/spark/deploy/history/HistoryServerMemoryManagerSuite.scala
 create mode 100644 
core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated (b3f7989 -> d7e1746)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b3f7989  [SPARK-32672][SQL] Fix data corruption in boolean bit set 
compression
 add d7e1746  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (007acba -> 82aef3e)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
 add 82aef3e  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a30bb0c -> 3eee915)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a30bb0c  [SPARK-32550][SQL][FOLLOWUP] Eliminate negative impact on 
HyperLogLogSuite
 add 3eee915  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [MINOR][SQL] Add missing documentation for LongType mapping

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new d7e1746  [MINOR][SQL] Add missing documentation for LongType mapping
d7e1746 is described below

commit d7e1746092d643728c81475f60d543cae0e0192c
Author: Yesheng Ma 
AuthorDate: Tue Aug 25 11:20:01 2020 +0900

[MINOR][SQL] Add missing documentation for LongType mapping

### What changes were proposed in this pull request?

Added Java docs for Long data types in the Row class.

### Why are the changes needed?

The `LongType` mapping is missing from the scaladoc of Row.scala's `apply` 
and `get` methods.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UTs.

Closes #29534 from yeshengm/docs-fix.

Authored-by: Yesheng Ma 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 3eee915b474c58cff9ea108f67073ed9c0c86224)
Signed-off-by: HyukjinKwon 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
index 180c2d1..d01b1ed 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
@@ -146,6 +146,7 @@ trait Row extends Serializable {
*   ByteType -> java.lang.Byte
*   ShortType -> java.lang.Short
*   IntegerType -> java.lang.Integer
+   *   LongType -> java.lang.Long
*   FloatType -> java.lang.Float
*   DoubleType -> java.lang.Double
*   StringType -> String
@@ -171,6 +172,7 @@ trait Row extends Serializable {
*   ByteType -> java.lang.Byte
*   ShortType -> java.lang.Short
*   IntegerType -> java.lang.Integer
+   *   LongType -> java.lang.Long
*   FloatType -> java.lang.Float
*   DoubleType -> java.lang.Double
*   StringType -> String
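
The scaladoc mapping above — now including `LongType` — amounts to a simple 
lookup table. A hypothetical illustration (not a Spark API):

```python
# Hypothetical table mirroring the Spark SQL -> Java type mapping documented
# in Row.scala's `apply`/`get` scaladoc; for illustration only.
ROW_TYPE_MAPPING = {
    "ByteType": "java.lang.Byte",
    "ShortType": "java.lang.Short",
    "IntegerType": "java.lang.Integer",
    "LongType": "java.lang.Long",  # the entry this commit adds to the docs
    "FloatType": "java.lang.Float",
    "DoubleType": "java.lang.Double",
    "StringType": "String",
}
```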


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (41cf1d0 -> a30bb0c)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 41cf1d0  [SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema 
from list of dict
 add a30bb0c  [SPARK-32550][SQL][FOLLOWUP] Eliminate negative impact on 
HyperLogLogSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/SpecificInternalRow.scala | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [MINOR][SQL] Add missing documentation for LongType mapping

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new d7e1746  [MINOR][SQL] Add missing documentation for LongType mapping
d7e1746 is described below

commit d7e1746092d643728c81475f60d543cae0e0192c
Author: Yesheng Ma 
AuthorDate: Tue Aug 25 11:20:01 2020 +0900

[MINOR][SQL] Add missing documentation for LongType mapping

### What changes were proposed in this pull request?

Added Java docs for Long data types in the Row class.

### Why are the changes needed?

The Long datatype is somehow missing in Row.scala's `apply` and `get` 
methods.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UTs.

Closes #29534 from yeshengm/docs-fix.

Authored-by: Yesheng Ma 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 3eee915b474c58cff9ea108f67073ed9c0c86224)
Signed-off-by: HyukjinKwon 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
index 180c2d1..d01b1ed 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
@@ -146,6 +146,7 @@ trait Row extends Serializable {
*   ByteType -> java.lang.Byte
*   ShortType -> java.lang.Short
*   IntegerType -> java.lang.Integer
+   *   LongType -> java.lang.Long
*   FloatType -> java.lang.Float
*   DoubleType -> java.lang.Double
*   StringType -> String
@@ -171,6 +172,7 @@ trait Row extends Serializable {
*   ByteType -> java.lang.Byte
*   ShortType -> java.lang.Short
*   IntegerType -> java.lang.Integer
+   *   LongType -> java.lang.Long
*   FloatType -> java.lang.Float
*   DoubleType -> java.lang.Double
*   StringType -> String


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (007acba -> 82aef3e)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
 add 82aef3e  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a30bb0c -> 3eee915)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a30bb0c  [SPARK-32550][SQL][FOLLOWUP] Eliminate negative impact on 
HyperLogLogSuite
 add 3eee915  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)





[spark] branch master updated (41cf1d0 -> a30bb0c)

2020-08-24 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 41cf1d0  [SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema 
from list of dict
 add a30bb0c  [SPARK-32550][SQL][FOLLOWUP] Eliminate negative impact on 
HyperLogLogSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/SpecificInternalRow.scala | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)





[spark] branch master updated (e3a88a9 -> 41cf1d0)

2020-08-24 Thread cutlerb
This is an automated email from the ASF dual-hosted git repository.

cutlerb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from e3a88a9  [SPARK-32516][SQL] 'path' option cannot coexist with load()'s 
path parameters
 add 41cf1d0  [SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema 
from list of dict

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/session.py | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)
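
SPARK-32686 un-deprecates building a DataFrame schema from a list of dicts. A hedged, Spark-free sketch of what that inference means semantically — union the keys seen across rows, typing each field from the first non-None value (function name hypothetical, not PySpark's implementation):

```python
def infer_schema(rows):
    """Sketch of schema inference from a list of dicts: collect the union
    of keys, mapping each to the type name of its first non-None value."""
    fields = {}
    for row in rows:
        for key, value in row.items():
            if key not in fields and value is not None:
                fields[key] = type(value).__name__
    return dict(sorted(fields.items()))

# e.g. infer_schema([{"a": 1}, {"a": 2, "b": "x"}]) yields both fields,
# even though the first row never mentions "b".
```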





[GitHub] [spark-website] asfgit closed pull request #284: Update committer guide

2020-08-24 Thread GitBox


asfgit closed pull request #284:
URL: https://github.com/apache/spark-website/pull/284


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[spark-website] branch asf-site updated: Update committer guide

2020-08-24 Thread holden
This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new a3f618b  Update committer guide
a3f618b is described below

commit a3f618bec99e5b4132c6586f12c1313b34cf5b13
Author: Holden Karau 
AuthorDate: Mon Aug 24 10:59:16 2020 -0700

Update committer guide

Update the committer guide with the new policy we voted on for when to 
commit.

Author: Holden Karau 

Closes #284 from holdenk/update-committer-guide.
---
 committers.md| 21 +
 site/committers.html | 42 +-
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/committers.md b/committers.md
index 2665e0a..fc6958c 100644
--- a/committers.md
+++ b/committers.md
@@ -132,6 +132,27 @@ In particular, if you are working on an area of the 
codebase you are unfamiliar
 Git history for that code to see who reviewed patches before. You can do this 
using 
 `git log --format=full `, by examining the "Commit" field to see who 
committed each patch.
 
+When to commit/merge a pull request
+
+PRs shall not be merged during active, on-topic discussion unless they address 
issues such as critical security fixes of a public vulnerability. Under 
extenuating circumstances, PRs may be merged during active, off-topic 
discussion and the discussion directed to a more appropriate venue. Time should 
be given prior to merging for those involved with the conversation to explain 
if they believe they are on-topic.
+
+Lazy consensus requires giving time for discussion to settle while 
understanding that people may not be working on Spark as their full-time job 
and may take holidays. It is believed that by doing this, we can limit how 
often people feel the need to exercise their veto.
+
+
+All -1s with justification merit discussion.  A -1 from a non-committer can be 
overridden only with input from multiple committers, and suitable time must be 
offered for any committer to raise concerns. A -1 from a committer who cannot 
be reached requires a consensus vote of the PMC under ASF voting rules to 
determine the next steps within the [ASF guidelines for code 
vetoes](https://www.apache.org/foundation/voting.html).
+
+
+These policies serve to reiterate the core principle that code must not be 
merged with a pending veto or before a consensus has been reached (lazy or 
otherwise).
+
+
+It is the PMC’s hope that vetoes continue to be infrequent, and when they 
occur, that all parties will take the time to build consensus prior to 
additional feature work.
+
+
+Being a committer means exercising your judgement while working in a community 
of people with diverse views. There is nothing wrong in getting a second (or 
third or fourth) opinion when you are uncertain. Thank you for your dedication 
to the Spark project; it is appreciated by the developers and users of Spark.
+
+
+It is hoped that these guidelines do not slow down development; rather, by 
removing some of the uncertainty, the goal is to make it easier for us to reach 
consensus. If you have ideas on how to improve these guidelines or other Spark 
project operating procedures, you should reach out on the dev@ list to start 
the discussion.
+
 How to Merge a Pull Request
 
 Changes pushed to the master branch on Apache cannot be removed; that is, we 
can't force-push to 
diff --git a/site/committers.html b/site/committers.html
index 91bc57b..ff09913 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -565,7 +565,23 @@ who have shown they understand and can help with these 
activities.
 Contributing to Spark. 
 In particular, if you are working on an area of the codebase you are 
unfamiliar with, look at the 
 Git history for that code to see who reviewed patches before. You can do this 
using 
-git log --format=full filename, 
by examining the Commit field to see who committed each patch.
+git log --format=full 
filename, by examining the Commit field to see who 
committed each patch.
+
+When to commit/merge a pull request
+
+PRs shall not be merged during active, on-topic discussion unless they 
address issues such as critical security fixes of a public vulnerability. Under 
extenuating circumstances, PRs may be merged during active, off-topic 
discussion and the discussion directed to a more appropriate venue. Time should 
be given prior to merging for those involved with the conversation to explain 
if they believe they are on-topic.
+
+Lazy consensus requires giving time for discussion to settle while 
understanding that people may not be working on Spark as their full-time job 
and may take holidays. It is believed that by doing this, we can limit how 
often people feel the need to exercise their veto.
+
+All -1s with justification merit discussion.  A -1 from a non-committer can 
be overridden only 

[GitHub] [spark-website] dongjoon-hyun commented on pull request #284: Update committer guide

2020-08-24 Thread GitBox


dongjoon-hyun commented on pull request #284:
URL: https://github.com/apache/spark-website/pull/284#issuecomment-679263615


   Please merge this AS-IS.






[GitHub] [spark-website] holdenk commented on pull request #284: Update committer guide

2020-08-24 Thread GitBox


holdenk commented on pull request #284:
URL: https://github.com/apache/spark-website/pull/284#issuecomment-679259675


   Ok I'm going to go ahead and commit, and if @dongjoon-hyun & @HyukjinKwon 
want to update with the 72 hours and can agree on the wording we can do that in 
a follow up PR.






[spark] branch master updated (bc23bb7 -> e3a88a9)

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bc23bb7  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests
 add e3a88a9  [SPARK-32516][SQL] 'path' option cannot coexist with load()'s 
path parameters

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md|  2 +
 .../org/apache/spark/sql/internal/SQLConf.scala| 12 +
 .../org/apache/spark/sql/DataFrameReader.scala | 30 ++--
 .../spark/sql/FileBasedDataSourceSuite.scala   | 21 
 .../sql/test/DataFrameReaderWriterSuite.scala  | 56 +-
 5 files changed, 93 insertions(+), 28 deletions(-)
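
SPARK-32516 makes it an error for a 'path' entry in the options map to coexist with explicit path arguments to load(). A minimal standalone sketch of that validation rule (hypothetical function, not Spark's DataFrameReader):

```python
def resolve_load_paths(options, *paths):
    """Sketch of the SPARK-32516 rule: a 'path' option and explicit
    load() path parameters may not be supplied at the same time."""
    if "path" in options and paths:
        raise ValueError(
            "Either remove the 'path' option, or call load() without "
            "path parameters.")
    if paths:
        return list(paths)
    return [options["path"]] if "path" in options else []
```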





[spark] branch branch-3.0 updated (4a67f1e -> 007acba)

2020-08-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4a67f1e  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests
 add 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/clustering/BisectingKMeans.scala  | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)
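
The double-caching fix above converges on a common Spark idiom: persist the training data only when the caller has not already cached it (handlePersistence), and unpersist when training ends, so at most one cached copy ever exists. A hedged, Spark-free Python sketch of that idiom (class and function names are hypothetical stand-ins):

```python
class CachedData:
    """Stand-in for an RDD/Dataset that tracks whether it is cached."""
    def __init__(self, cached=False):
        self.cached = cached
    def persist(self):
        self.cached = True
    def unpersist(self):
        self.cached = False

def run_with_weight(instances, handle_persistence):
    """Cache the input only when asked to, and always release that cache
    when training finishes -- avoiding a second, leaked cached copy."""
    if handle_persistence:
        instances.persist()
    try:
        model = "trained-model"  # stand-in for the real training loop
    finally:
        if handle_persistence:
            instances.unpersist()
    return model
```

The caller computes `handle_persistence` from whether the data is already cached, so pre-cached input is left untouched.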





[spark] branch branch-3.0 updated: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
007acba is described below

commit 007acba6e3b0e45e334bed5942692dd88c61b3ea
Author: Huaxin Gao 
AuthorDate: Mon Aug 24 08:47:01 2020 -0700

[SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

### What changes were proposed in this pull request?
backporting https://github.com/apache/spark/pull/29501

### Why are the changes needed?
avoid double caching

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Existing tests

Closes #29528 from huaxingao/kmeans_3.0.

Authored-by: Huaxin Gao 
Signed-off-by: Huaxin Gao 
---
 .../spark/ml/clustering/BisectingKMeans.scala  | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
index b649b1d..b3f2d22 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
@@ -28,9 +28,8 @@ import org.apache.spark.ml.util._
 import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{BisectingKMeans => 
MLlibBisectingKMeans,
   BisectingKMeansModel => MLlibBisectingKMeansModel}
-import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => 
OldVectors}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DoubleType, IntegerType, StructType}
@@ -275,21 +274,6 @@ class BisectingKMeans @Since("2.0.0") (
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = instrumented { 
instr =>
 transformSchema(dataset.schema, logging = true)
 
-val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
-  col($(weightCol)).cast(DoubleType)
-} else {
-  lit(1.0)
-}
-
-val instances: RDD[(OldVector, Double)] = dataset
-  .select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w).rdd.map 
{
-  case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), 
weight)
-}
-if (handlePersistence) {
-  instances.persist(StorageLevel.MEMORY_AND_DISK)
-}
-
 instr.logPipelineStage(this)
 instr.logDataset(dataset)
 instr.logParams(this, featuresCol, predictionCol, k, maxIter, seed,
@@ -301,11 +285,18 @@ class BisectingKMeans @Since("2.0.0") (
   .setMinDivisibleClusterSize($(minDivisibleClusterSize))
   .setSeed($(seed))
   .setDistanceMeasure($(distanceMeasure))
-val parentModel = bkm.runWithWeight(instances, Some(instr))
-val model = copyValues(new BisectingKMeansModel(uid, 
parentModel).setParent(this))
-if (handlePersistence) {
-  instances.unpersist()
+
+val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
+  col($(weightCol)).cast(DoubleType)
+} else {
+  lit(1.0)
 }
+val instances = dataset.select(DatasetUtils.columnToVector(dataset, 
getFeaturesCol), w)
+  .rdd.map { case Row(point: Vector, weight: Double) => 
(OldVectors.fromML(point), weight) }
+
+val handlePersistence = dataset.storageLevel == StorageLevel.NONE
+val parentModel = bkm.runWithWeight(instances, handlePersistence, 
Some(instr))
+val model = copyValues(new BisectingKMeansModel(uid, 
parentModel).setParent(this))
 
 val summary = new BisectingKMeansSummary(
   model.transform(dataset),
diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
index 5370318..e182f3d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
@@ -31,7 +31,6 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{DistanceMeasure, KMeans => 
MLlibKMeans, KMeansModel => MLlibKMeansModel}
 import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => 
OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import 

[spark] branch branch-3.0 updated (8aa644e -> 4a67f1e)

2020-08-24 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
 add 4a67f1e  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/BlockManagerSuite.scala   | 26 +++--
 .../apache/spark/storage/MemoryStoreSuite.scala| 29 +--
 .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++
 3 files changed, 75 insertions(+), 23 deletions(-)





[spark] branch master updated (08b951b -> bc23bb7)

2020-08-24 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 08b951b  [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with 
empty hashed relation
 add bc23bb7  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/BlockManagerSuite.scala   | 26 +++--
 .../apache/spark/storage/MemoryStoreSuite.scala| 29 +--
 .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++
 3 files changed, 75 insertions(+), 23 deletions(-)





[spark] branch branch-3.0 updated (8aa644e -> 4a67f1e)

2020-08-24 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
 add 4a67f1e  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/BlockManagerSuite.scala   | 26 +++--
 .../apache/spark/storage/MemoryStoreSuite.scala| 29 +--
 .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++
 3 files changed, 75 insertions(+), 23 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (08b951b -> bc23bb7)

2020-08-24 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 08b951b  [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with 
empty hashed relation
 add bc23bb7  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/BlockManagerSuite.scala   | 26 +++--
 .../apache/spark/storage/MemoryStoreSuite.scala| 29 +--
 .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++
 3 files changed, 75 insertions(+), 23 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (8aa644e -> 4a67f1e)

2020-08-24 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8aa644e  [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code
 add 4a67f1e  [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in 
tests

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/storage/BlockManagerSuite.scala   | 26 +++--
 .../apache/spark/storage/MemoryStoreSuite.scala| 29 +--
 .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++
 3 files changed, 75 insertions(+), 23 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (11c6a23 -> 08b951b)

2020-08-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 11c6a23  [SPARK-32352][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] 
Exclude partition columns from data columns
 add 08b951b  [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with 
empty hashed relation

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |  2 +-
 .../adaptive/EliminateJoinToEmptyRelation.scala| 57 ++
 .../adaptive/EliminateNullAwareAntiJoin.scala  | 41 
 .../spark/sql/execution/joins/HashJoin.scala   | 28 ---
 .../scala/org/apache/spark/sql/JoinSuite.scala | 54 +++-
 .../adaptive/AdaptiveQueryExecSuite.scala  | 28 ++-
 6 files changed, 157 insertions(+), 53 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
 delete mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateNullAwareAntiJoin.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


