[spark] branch master updated (3f1e56d -> c26a976)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 3f1e56d  [SPARK-32641][SQL] withField + getField should return null if original struct was null
     add c26a976  Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…

No new revisions were added by this update.

Summary of changes:
 .../SparkExecuteStatementOperation.scala           | 56 +---
 .../sql/hive/thriftserver/SparkOperation.scala     | 35 +++--
 .../ThriftServerWithSparkContextSuite.scala        | 61 +++---
 3 files changed, 75 insertions(+), 77 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (cee48a9 -> 3f1e56d)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cee48a9  [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
     add 3f1e56d  [SPARK-32641][SQL] withField + getField should return null if original struct was null

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/optimizer/ComplexTypes.scala      |  3 +-
 .../sql/catalyst/optimizer/complexTypesSuite.scala | 96 --
 .../apache/spark/sql/ColumnExpressionSuite.scala   | 85 +++
 3 files changed, 156 insertions(+), 28 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
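SPARK-32641 changes the semantics so that `withField` followed by `getField` propagates a null struct instead of producing a non-null result. A minimal, self-contained Scala sketch of that null-propagation rule, modelling a nullable struct as `Option` (the names below are illustrative, not Spark's API):

```scala
// Toy model of the SPARK-32641 semantics: operations on a null struct yield null.
case class Struct(fields: Map[String, Any])

// withField on a null struct returns null (None); otherwise it adds or replaces the field.
def withField(struct: Option[Struct], name: String, value: Any): Option[Struct] =
  struct.map(s => Struct(s.fields + (name -> value)))

// getField on a null struct returns null (None).
def getField(struct: Option[Struct], name: String): Option[Any] =
  struct.flatMap(_.fields.get(name))

// A null struct stays null through withField + getField:
assert(getField(withField(None, "b", 2), "b").isEmpty)
// A non-null struct picks up the new field:
assert(getField(withField(Some(Struct(Map("a" -> 1))), "b", 2), "b").contains(2))
```

In Spark itself this plays out on `Column` expressions over struct columns; the sketch only captures the null-in, null-out rule the fix establishes.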
[spark] branch master updated (f540031 -> cee48a9)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable()
     add cee48a9  [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 10 +++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala         | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala        |  5 --
 .../sql/execution/datasources/orc/OrcFilters.scala | 48 ---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 .../sql/execution/datasources/orc/OrcFilters.scala | 50 +---
 .../execution/datasources/orc/OrcFilterSuite.scala | 70 +-
 10 files changed, 245 insertions(+), 62 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6c88d7c  [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
6c88d7c is described below

commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e
Author: Liang-Chi Hsieh
AuthorDate: Tue Aug 25 04:42:39 2020 +

    [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

    ### What changes were proposed in this pull request?

    This PR fixes ORC predicate pushdown under case-insensitive analysis. With case-insensitive analysis enabled, field names in pushed-down predicates no longer need to match the physical field names in ORC files in exact letter case.

    ### Why are the changes needed?

    Currently, ORC predicate pushdown does not work with case-insensitive analysis: a predicate "a < 0" cannot be pushed down to an ORC file whose field is named "A". Parquet predicate pushdown already handles this case, so ORC should as well.

    ### Does this PR introduce _any_ user-facing change?

    Yes. After this PR, ORC predicate pushdown works under case-insensitive analysis.

    ### How was this patch tested?

    Unit tests.

    Closes #29513 from viirya/fix-orc-pushdown-3.0.

Authored-by: Liang-Chi Hsieh
Signed-off-by: Wenchen Fan
---
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala         | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala        |  9 +--
 .../sql/execution/datasources/orc/OrcFilters.scala | 72 --
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++--
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 10 files changed, 253 insertions(+), 85 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 4dff1ec..69badb4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -153,11 +153,6 @@ class OrcFileFormat
       filters: Seq[Filter],
       options: Map[String, String],
       hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = {
-    if (sparkSession.sessionState.conf.orcFilterPushDown) {
-      OrcFilters.createFilter(dataSchema, filters).foreach { f =>
-        OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames)
-      }
-    }
     val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
     val sqlConf = sparkSession.sessionState.conf
@@ -169,6 +164,8 @@ class OrcFileFormat
     val broadcastedConf =
       sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))
     val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown
+    val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles

     (file: PartitionedFile) => {
       val conf = broadcastedConf.value.value
@@ -186,6 +183,15 @@ class OrcFileFormat
       if (resultedColPruneInfo.isEmpty) {
         Iterator.empty
       } else {
+        // ORC predicate pushdown
+        if (orcFilterPushDown) {
+          OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema =>
+            OrcFilters.createFilter(fileSchema, filters).foreach { f =>
+              OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
+            }
+          }
+        }
+
         val (requestedColIds, canPruneCols) = resultedColPruneInfo.get
         val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols,
           dataSchema, resultSchema, partitionSchema, conf)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
index e673309..4554899 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
+++
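The fix hinges on resolving a predicate's field name against the ORC file's physical field names case-insensitively, and skipping pushdown when the match is absent or ambiguous. A minimal Scala sketch of that resolution step (my own illustration of the idea, not the code from this patch):

```scala
// Resolve a pushed-down predicate's field name to a physical ORC field name.
// Under case-insensitive analysis, "a" should match "A" -- but if the file has
// both "a" and "A", the lookup is ambiguous and the filter must not be pushed.
def resolveFieldName(
    physicalNames: Seq[String],
    predicateName: String,
    caseSensitive: Boolean): Option[String] = {
  if (caseSensitive) {
    physicalNames.find(_ == predicateName)
  } else {
    physicalNames.filter(_.equalsIgnoreCase(predicateName)) match {
      case Seq(unique) => Some(unique)  // exactly one case-insensitive match
      case _           => None          // no match, or ambiguous: skip pushdown
    }
  }
}

assert(resolveFieldName(Seq("A", "b"), "a", caseSensitive = false) == Some("A"))
assert(resolveFieldName(Seq("A", "b"), "a", caseSensitive = true).isEmpty)
assert(resolveFieldName(Seq("a", "A"), "a", caseSensitive = false).isEmpty)  // ambiguous
```

Falling back to "no pushdown" on ambiguity is safe because the filter is only an optimization; Spark still evaluates the predicate on the rows it reads.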
[spark] branch master updated (9151a58 -> f540031)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
     add f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable()

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/catalog.py                      | 13 +++-
 python/pyspark/sql/tests/test_catalog.py           | 17 -
 .../org/apache/spark/sql/catalog/Catalog.scala     | 79 ++
 .../apache/spark/sql/internal/CatalogImpl.scala    | 42 +++-
 .../apache/spark/sql/internal/CatalogSuite.scala   |  4 ++
 5 files changed, 149 insertions(+), 6 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f540031 -> cee48a9)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f540031 [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable() add cee48a9 [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis No new revisions were added by this update. Summary of changes: .../execution/datasources/orc/OrcFileFormat.scala | 16 +++-- .../execution/datasources/orc/OrcFiltersBase.scala | 10 +++- .../sql/execution/datasources/orc/OrcUtils.scala | 14 + .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++- .../sql/execution/datasources/v2/orc/OrcScan.scala | 2 +- .../datasources/v2/orc/OrcScanBuilder.scala| 5 -- .../sql/execution/datasources/orc/OrcFilters.scala | 48 --- .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- .../sql/execution/datasources/orc/OrcFilters.scala | 50 +--- .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- 10 files changed, 245 insertions(+), 62 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 6c88d7c [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis 6c88d7c is described below commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e Author: Liang-Chi Hsieh AuthorDate: Tue Aug 25 04:42:39 2020 + [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis ### What changes were proposed in this pull request? This PR proposes to fix ORC predicate pushdown under case-insensitive analysis case. The field names in pushed down predicates don't need to match in exact letter case with physical field names in ORC files, if we enable case-insensitive analysis. ### Why are the changes needed? Currently ORC predicate pushdown doesn't work with case-insensitive analysis. A predicate "a < 0" cannot pushdown to ORC file with field name "A" under case-insensitive analysis. But Parquet predicate pushdown works with this case. We should make ORC predicate pushdown work with case-insensitive analysis too. ### Does this PR introduce _any_ user-facing change? Yes, after this PR, under case-insensitive analysis, ORC predicate pushdown will work. ### How was this patch tested? Unit tests. Closes #29513 from viirya/fix-orc-pushdown-3.0. 
Authored-by: Liang-Chi Hsieh Signed-off-by: Wenchen Fan --- .../execution/datasources/orc/OrcFileFormat.scala | 16 +++-- .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++- .../sql/execution/datasources/orc/OrcUtils.scala | 14 + .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++- .../sql/execution/datasources/v2/orc/OrcScan.scala | 2 +- .../datasources/v2/orc/OrcScanBuilder.scala| 9 +-- .../sql/execution/datasources/orc/OrcFilters.scala | 72 -- .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++- .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++-- .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++- 10 files changed, 253 insertions(+), 85 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala index 4dff1ec..69badb4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala @@ -153,11 +153,6 @@ class OrcFileFormat filters: Seq[Filter], options: Map[String, String], hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = { -if (sparkSession.sessionState.conf.orcFilterPushDown) { - OrcFilters.createFilter(dataSchema, filters).foreach { f => -OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames) - } -} val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields) val sqlConf = sparkSession.sessionState.conf @@ -169,6 +164,8 @@ class OrcFileFormat val broadcastedConf = sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf)) val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis +val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown +val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles (file: PartitionedFile) => { 
val conf = broadcastedConf.value.value @@ -186,6 +183,15 @@ class OrcFileFormat if (resultedColPruneInfo.isEmpty) { Iterator.empty } else { +// ORC predicate pushdown +if (orcFilterPushDown) { + OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema => +OrcFilters.createFilter(fileSchema, filters).foreach { f => + OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames) +} + } +} + val (requestedColIds, canPruneCols) = resultedColPruneInfo.get val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols, dataSchema, resultSchema, partitionSchema, conf) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala index e673309..4554899 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala +++
[spark] branch master updated (9151a58 -> f540031)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9151a58 [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager add f540031 [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable() No new revisions were added by this update. Summary of changes: python/pyspark/sql/catalog.py | 13 +++- python/pyspark/sql/tests/test_catalog.py | 17 - .../org/apache/spark/sql/catalog/Catalog.scala | 79 ++ .../apache/spark/sql/internal/CatalogImpl.scala| 42 +++- .../apache/spark/sql/internal/CatalogSuite.scala | 4 ++ 5 files changed, 149 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f540031 -> cee48a9)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f540031 [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable() add cee48a9 [SPARK-32646][SQL][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis No new revisions were added by this update. Summary of changes: .../execution/datasources/orc/OrcFileFormat.scala | 16 +++-- .../execution/datasources/orc/OrcFiltersBase.scala | 10 +++- .../sql/execution/datasources/orc/OrcUtils.scala | 14 + .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++- .../sql/execution/datasources/v2/orc/OrcScan.scala | 2 +- .../datasources/v2/orc/OrcScanBuilder.scala| 5 -- .../sql/execution/datasources/orc/OrcFilters.scala | 48 --- .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- .../sql/execution/datasources/orc/OrcFilters.scala | 50 +--- .../execution/datasources/orc/OrcFilterSuite.scala | 70 +- 10 files changed, 245 insertions(+), 62 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 6c88d7c [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis 6c88d7c is described below commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e Author: Liang-Chi Hsieh AuthorDate: Tue Aug 25 04:42:39 2020 + [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis ### What changes were proposed in this pull request? This PR proposes to fix ORC predicate pushdown under case-insensitive analysis case. The field names in pushed down predicates don't need to match in exact letter case with physical field names in ORC files, if we enable case-insensitive analysis. ### Why are the changes needed? Currently ORC predicate pushdown doesn't work with case-insensitive analysis. A predicate "a < 0" cannot pushdown to ORC file with field name "A" under case-insensitive analysis. But Parquet predicate pushdown works with this case. We should make ORC predicate pushdown work with case-insensitive analysis too. ### Does this PR introduce _any_ user-facing change? Yes, after this PR, under case-insensitive analysis, ORC predicate pushdown will work. ### How was this patch tested? Unit tests. Closes #29513 from viirya/fix-orc-pushdown-3.0. 
Authored-by: Liang-Chi Hsieh Signed-off-by: Wenchen Fan --- .../execution/datasources/orc/OrcFileFormat.scala | 16 +++-- .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++- .../sql/execution/datasources/orc/OrcUtils.scala | 14 + .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++- .../sql/execution/datasources/v2/orc/OrcScan.scala | 2 +- .../datasources/v2/orc/OrcScanBuilder.scala| 9 +-- .../sql/execution/datasources/orc/OrcFilters.scala | 72 -- .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++- .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++-- .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++- 10 files changed, 253 insertions(+), 85 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala index 4dff1ec..69badb4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala @@ -153,11 +153,6 @@ class OrcFileFormat filters: Seq[Filter], options: Map[String, String], hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = { -if (sparkSession.sessionState.conf.orcFilterPushDown) { - OrcFilters.createFilter(dataSchema, filters).foreach { f => -OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames) - } -} val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields) val sqlConf = sparkSession.sessionState.conf @@ -169,6 +164,8 @@ class OrcFileFormat val broadcastedConf = sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf)) val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis +val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown +val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles (file: PartitionedFile) => { 
val conf = broadcastedConf.value.value @@ -186,6 +183,15 @@ class OrcFileFormat if (resultedColPruneInfo.isEmpty) { Iterator.empty } else { +// ORC predicate pushdown +if (orcFilterPushDown) { + OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema => +OrcFilters.createFilter(fileSchema, filters).foreach { f => + OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames) +} + } +} + val (requestedColIds, canPruneCols) = resultedColPruneInfo.get val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols, dataSchema, resultSchema, partitionSchema, conf) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala index e673309..4554899 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala +++
[spark] branch master updated (9151a58 -> f540031)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9151a58 [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager add f540031 [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable() No new revisions were added by this update. Summary of changes: python/pyspark/sql/catalog.py | 13 +++- python/pyspark/sql/tests/test_catalog.py | 17 - .../org/apache/spark/sql/catalog/Catalog.scala | 79 ++ .../apache/spark/sql/internal/CatalogImpl.scala| 42 +++- .../apache/spark/sql/internal/CatalogSuite.scala | 4 ++ 5 files changed, 149 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 6c88d7c [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis 6c88d7c is described below commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e Author: Liang-Chi Hsieh AuthorDate: Tue Aug 25 04:42:39 2020 + [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis ### What changes were proposed in this pull request? This PR proposes to fix ORC predicate pushdown under case-insensitive analysis case. The field names in pushed down predicates don't need to match in exact letter case with physical field names in ORC files, if we enable case-insensitive analysis. ### Why are the changes needed? Currently ORC predicate pushdown doesn't work with case-insensitive analysis. A predicate "a < 0" cannot pushdown to ORC file with field name "A" under case-insensitive analysis. But Parquet predicate pushdown works with this case. We should make ORC predicate pushdown work with case-insensitive analysis too. ### Does this PR introduce _any_ user-facing change? Yes, after this PR, under case-insensitive analysis, ORC predicate pushdown will work. ### How was this patch tested? Unit tests. Closes #29513 from viirya/fix-orc-pushdown-3.0. 
Authored-by: Liang-Chi Hsieh Signed-off-by: Wenchen Fan --- .../execution/datasources/orc/OrcFileFormat.scala | 16 +++-- .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++- .../sql/execution/datasources/orc/OrcUtils.scala | 14 + .../v2/orc/OrcPartitionReaderFactory.scala | 22 ++- .../sql/execution/datasources/v2/orc/OrcScan.scala | 2 +- .../datasources/v2/orc/OrcScanBuilder.scala| 9 +-- .../sql/execution/datasources/orc/OrcFilters.scala | 72 -- .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++- .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++-- .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++- 10 files changed, 253 insertions(+), 85 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala index 4dff1ec..69badb4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala @@ -153,11 +153,6 @@ class OrcFileFormat filters: Seq[Filter], options: Map[String, String], hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = { -if (sparkSession.sessionState.conf.orcFilterPushDown) { - OrcFilters.createFilter(dataSchema, filters).foreach { f => -OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames) - } -} val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields) val sqlConf = sparkSession.sessionState.conf @@ -169,6 +164,8 @@ class OrcFileFormat val broadcastedConf = sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf)) val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis +val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown +val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles (file: PartitionedFile) => { 
val conf = broadcastedConf.value.value @@ -186,6 +183,15 @@ class OrcFileFormat if (resultedColPruneInfo.isEmpty) { Iterator.empty } else { +// ORC predicate pushdown +if (orcFilterPushDown) { + OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema => +OrcFilters.createFilter(fileSchema, filters).foreach { f => + OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames) +} + } +} + val (requestedColIds, canPruneCols) = resultedColPruneInfo.get val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols, dataSchema, resultSchema, partitionSchema, conf) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala index e673309..4554899 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala +++
[spark] branch master updated (9151a58 -> f540031)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
add  f540031  [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable()

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/catalog.py                    | 13 +++-
 python/pyspark/sql/tests/test_catalog.py         | 17 -
 .../org/apache/spark/sql/catalog/Catalog.scala   | 79 ++
 .../apache/spark/sql/internal/CatalogImpl.scala  | 42 +++-
 .../apache/spark/sql/internal/CatalogSuite.scala |  4 ++
 5 files changed, 149 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6c88d7c  [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis
6c88d7c is described below

commit 6c88d7c1259ea9fe89f5c8190c683bba506d528e
Author: Liang-Chi Hsieh
AuthorDate: Tue Aug 25 04:42:39 2020 +

    [SPARK-32646][SQL][3.0][TEST-HADOOP2.7][TEST-HIVE1.2] ORC predicate pushdown should work with case-insensitive analysis

    ### What changes were proposed in this pull request?

    This PR fixes ORC predicate pushdown under case-insensitive analysis. With case-insensitive analysis enabled, the field names in pushed-down predicates no longer need to match the physical field names in ORC files in exact letter case.

    ### Why are the changes needed?

    Currently ORC predicate pushdown does not work with case-insensitive analysis: a predicate "a < 0" cannot be pushed down to an ORC file whose field is named "A". Parquet predicate pushdown already handles this case, so ORC predicate pushdown should work with it too.

    ### Does this PR introduce _any_ user-facing change?

    Yes. After this PR, ORC predicate pushdown works under case-insensitive analysis.

    ### How was this patch tested?

    Unit tests.

    Closes #29513 from viirya/fix-orc-pushdown-3.0.
Authored-by: Liang-Chi Hsieh
Signed-off-by: Wenchen Fan
---
 .../execution/datasources/orc/OrcFileFormat.scala  | 16 +++--
 .../execution/datasources/orc/OrcFiltersBase.scala | 35 ++-
 .../sql/execution/datasources/orc/OrcUtils.scala   | 14 +
 .../v2/orc/OrcPartitionReaderFactory.scala         | 22 ++-
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  2 +-
 .../datasources/v2/orc/OrcScanBuilder.scala        |  9 +--
 .../sql/execution/datasources/orc/OrcFilters.scala | 72 --
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 .../sql/execution/datasources/orc/OrcFilters.scala | 70 +++--
 .../execution/datasources/orc/OrcFilterSuite.scala | 49 ++-
 10 files changed, 253 insertions(+), 85 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 4dff1ec..69badb4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -153,11 +153,6 @@ class OrcFileFormat
       filters: Seq[Filter],
       options: Map[String, String],
       hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow] = {
-    if (sparkSession.sessionState.conf.orcFilterPushDown) {
-      OrcFilters.createFilter(dataSchema, filters).foreach { f =>
-        OrcInputFormat.setSearchArgument(hadoopConf, f, dataSchema.fieldNames)
-      }
-    }
     val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
     val sqlConf = sparkSession.sessionState.conf
@@ -169,6 +164,8 @@ class OrcFileFormat
     val broadcastedConf =
       sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))
     val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown
+    val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
     (file: PartitionedFile) => {
       val conf = broadcastedConf.value.value
@@ -186,6 +183,15 @@ class OrcFileFormat
       if (resultedColPruneInfo.isEmpty) {
         Iterator.empty
       } else {
+        // ORC predicate pushdown
+        if (orcFilterPushDown) {
+          OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema =>
+            OrcFilters.createFilter(fileSchema, filters).foreach { f =>
+              OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
+            }
+          }
+        }
+
         val (requestedColIds, canPruneCols) = resultedColPruneInfo.get
         val resultSchemaString = OrcUtils.orcResultSchemaString(canPruneCols,
           dataSchema, resultSchema, partitionSchema, conf)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
index e673309..4554899 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFiltersBase.scala
+++
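The change above moves SearchArgument construction into the per-file read path so the pushed filter can be built against each file's physical schema. The companion piece (in `OrcFiltersBase`, truncated above) resolves predicate field names against physical field names case-insensitively, skipping names that become ambiguous once case is folded. A minimal standalone Python sketch of that resolution logic — `build_field_map` and `resolve_field` are illustrative names, not Spark's actual API:

```python
def build_field_map(field_names, case_sensitive):
    """Map predicate field names to physical ORC field names.

    Under case-insensitive analysis, two physical fields whose names differ
    only by case (e.g. "A" and "a") are ambiguous, so they are dropped from
    the map and predicates on them are simply not pushed down.
    """
    if case_sensitive:
        return {name: name for name in field_names}
    groups = {}
    for name in field_names:
        groups.setdefault(name.lower(), []).append(name)
    # keep only unambiguous (single-match) names
    return {key: names[0] for key, names in groups.items() if len(names) == 1}


def resolve_field(predicate_name, field_map, case_sensitive):
    """Return the physical field name for a predicate, or None to skip pushdown."""
    key = predicate_name if case_sensitive else predicate_name.lower()
    return field_map.get(key)
```

With this scheme, a predicate on `a` resolves to the physical field `A` when analysis is case-insensitive, and resolution returns `None` (no pushdown, which is always safe) when both `A` and `a` exist in the file.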
[spark] branch master updated (3eee915 -> 9151a58)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3eee915  [MINOR][SQL] Add missing documentation for LongType mapping
add  9151a58  [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

No new revisions were added by this update.

Summary of changes:
 .../history/HistoryServerMemoryManager.scala       |   5 +-
 .../apache/spark/deploy/history/HybridStore.scala  |   6 +-
 .../deploy/history/FsHistoryProviderSuite.scala    |  13 +-
 .../history/HistoryServerMemoryManagerSuite.scala  |  55 +
 .../spark/deploy/history/HybridStoreSuite.scala    | 232 +
 5 files changed, 304 insertions(+), 7 deletions(-)
 create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HistoryServerMemoryManagerSuite.scala
 create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated (b3f7989 -> d7e1746)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

from b3f7989  [SPARK-32672][SQL] Fix data corruption in boolean bit set compression
add  d7e1746  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (007acba -> 82aef3e)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
add  82aef3e  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a30bb0c -> 3eee915)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a30bb0c  [SPARK-32550][SQL][FOLLOWUP] Eliminate negative impact on HyperLogLogSuite
add  3eee915  [MINOR][SQL] Add missing documentation for LongType mapping

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [MINOR][SQL] Add missing documentation for LongType mapping
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new d7e1746  [MINOR][SQL] Add missing documentation for LongType mapping
d7e1746 is described below

commit d7e1746092d643728c81475f60d543cae0e0192c
Author: Yesheng Ma
AuthorDate: Tue Aug 25 11:20:01 2020 +0900

    [MINOR][SQL] Add missing documentation for LongType mapping

    ### What changes were proposed in this pull request?

    Added Java docs for the Long data type in the Row class.

    ### Why are the changes needed?

    The Long data type is missing from the type-mapping documentation of Row.scala's `apply` and `get` methods.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Existing UTs.

    Closes #29534 from yeshengm/docs-fix.

    Authored-by: Yesheng Ma
    Signed-off-by: HyukjinKwon
    (cherry picked from commit 3eee915b474c58cff9ea108f67073ed9c0c86224)
    Signed-off-by: HyukjinKwon
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
index 180c2d1..d01b1ed 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala
@@ -146,6 +146,7 @@ trait Row extends Serializable {
  *   ByteType -> java.lang.Byte
  *   ShortType -> java.lang.Short
  *   IntegerType -> java.lang.Integer
+ *   LongType -> java.lang.Long
  *   FloatType -> java.lang.Float
  *   DoubleType -> java.lang.Double
  *   StringType -> String
@@ -171,6 +172,7 @@ trait Row extends Serializable {
  *   ByteType -> java.lang.Byte
  *   ShortType -> java.lang.Short
  *   IntegerType -> java.lang.Integer
+ *   LongType -> java.lang.Long
  *   FloatType -> java.lang.Float
  *   DoubleType -> java.lang.Double
  *   StringType -> String

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
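The doc fix above slots `LongType` into the documented table of how Spark SQL primitive types surface as JVM types from `Row.apply`/`Row.get`. For reference, that documented mapping expressed as data — a plain Python lookup table for illustration only, not a Spark API:

```python
# Spark SQL primitive type -> JVM type returned by Row.get, as documented
# in Row.scala (LongType is the entry this commit adds).
ROW_TYPE_MAPPING = {
    "ByteType": "java.lang.Byte",
    "ShortType": "java.lang.Short",
    "IntegerType": "java.lang.Integer",
    "LongType": "java.lang.Long",
    "FloatType": "java.lang.Float",
    "DoubleType": "java.lang.Double",
    "StringType": "String",
}


def jvm_type_for(sql_type):
    """Look up the documented JVM type for a Spark SQL primitive type name."""
    return ROW_TYPE_MAPPING[sql_type]
```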
[spark] branch master updated (41cf1d0 -> a30bb0c)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 41cf1d0  [SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema from list of dict
add  a30bb0c  [SPARK-32550][SQL][FOLLOWUP] Eliminate negative impact on HyperLogLogSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/SpecificInternalRow.scala | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (e3a88a9 -> 41cf1d0)
This is an automated email from the ASF dual-hosted git repository.

cutlerb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e3a88a9  [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
add  41cf1d0  [SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema from list of dict

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/session.py | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
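The behavior un-deprecated by SPARK-32686 — inferring a DataFrame schema from a list of dicts — amounts to unioning the keys across rows and inferring a type per key. A rough standalone sketch of that idea in plain Python (this is not pyspark's implementation, which samples rows and maps values to Spark SQL types; the conflict fallback here is a simplification):

```python
def infer_schema(rows):
    """Infer {column: python type name} from a list of dicts.

    Keys are unioned across all rows; a key whose non-None values disagree
    in type, or that only ever appears as None, falls back to "str".
    """
    schema = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                continue  # None carries no type information
            type_name = type(value).__name__
            if key not in schema:
                schema[key] = type_name
            elif schema[key] != type_name:
                schema[key] = "str"  # conflicting types: fall back
    # keys never seen with a non-None value still become columns
    for row in rows:
        for key in row:
            schema.setdefault(key, "str")
    return schema
```

For example, `infer_schema([{"a": 1, "b": "x"}, {"a": 2, "c": 3.0}])` yields `{"a": "int", "b": "str", "c": "float"}` — each dict contributes its keys, and missing keys in a given row simply mean a null in that row's column.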
[GitHub] [spark-website] asfgit closed pull request #284: Update committer guide
asfgit closed pull request #284: URL: https://github.com/apache/spark-website/pull/284 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Update committer guide
This is an automated email from the ASF dual-hosted git repository. holden pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new a3f618b  Update committer guide

a3f618b is described below

commit a3f618bec99e5b4132c6586f12c1313b34cf5b13
Author: Holden Karau
AuthorDate: Mon Aug 24 10:59:16 2020 -0700

    Update committer guide

    Update the committer guide with the new policy we voted on for when to commit.

    Author: Holden Karau

    Closes #284 from holdenk/update-committer-guide.
---
 committers.md        | 21 +
 site/committers.html | 42 +-
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/committers.md b/committers.md
index 2665e0a..fc6958c 100644
--- a/committers.md
+++ b/committers.md
@@ -132,6 +132,27 @@ In particular, if you are working on an area of the codebase you are unfamiliar
 Git history for that code to see who reviewed patches before. You can do this using
 `git log --format=full `, by examining the "Commit" field to see who committed each patch.

+When to commit/merge a pull request
+
+PRs shall not be merged during active, on-topic discussion unless they address issues such as critical security fixes of a public vulnerability. Under extenuating circumstances, PRs may be merged during active, off-topic discussion and the discussion directed to a more appropriate venue. Time should be given prior to merging for those involved with the conversation to explain if they believe they are on-topic.
+
+Lazy consensus requires giving time for discussion to settle while understanding that people may not be working on Spark as their full-time job and may take holidays. It is believed that by doing this, we can limit how often people feel the need to exercise their veto.
+
+All -1s with justification merit discussion. A -1 from a non-committer can be overridden only with input from multiple committers, and suitable time must be offered for any committer to raise concerns. A -1 from a committer who cannot be reached requires a consensus vote of the PMC under ASF voting rules to determine the next steps within the [ASF guidelines for code vetoes](https://www.apache.org/foundation/voting.html).
+
+These policies serve to reiterate the core principle that code must not be merged with a pending veto or before a consensus has been reached (lazy or otherwise).
+
+It is the PMC’s hope that vetoes continue to be infrequent, and when they occur, that all parties will take the time to build consensus prior to additional feature work.
+
+Being a committer means exercising your judgement while working in a community of people with diverse views. There is nothing wrong in getting a second (or third or fourth) opinion when you are uncertain. Thank you for your dedication to the Spark project; it is appreciated by the developers and users of Spark.
+
+It is hoped that these guidelines do not slow down development; rather, by removing some of the uncertainty, the goal is to make it easier for us to reach consensus. If you have ideas on how to improve these guidelines or other Spark project operating procedures, you should reach out on the dev@ list to start the discussion.
+
 How to Merge a Pull Request

 Changes pushed to the master branch on Apache cannot be removed; that is, we can't force-push to

diff --git a/site/committers.html b/site/committers.html
index 91bc57b..ff09913 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -565,7 +565,23 @@ who have shown they understand and can help with these activities.
 Contributing to Spark. In particular, if you are working on an area of the codebase you are unfamiliar with, look at the Git history for that code to see who reviewed patches before. You can do this using
-git log --format=full filename, by examining the Commit field to see who committed each patch.
+git log --format=full filename, by examining the Commit field to see who committed each patch.
+
+When to commit/merge a pull request
+
+PRs shall not be merged during active, on-topic discussion unless they address issues such as critical security fixes of a public vulnerability. Under extenuating circumstances, PRs may be merged during active, off-topic discussion and the discussion directed to a more appropriate venue. Time should be given prior to merging for those involved with the conversation to explain if they believe they are on-topic.
+
+Lazy consensus requires giving time for discussion to settle while understanding that people may not be working on Spark as their full-time job and may take holidays. It is believed that by doing this, we can limit how often people feel the need to exercise their veto.
+
+All -1s with justification merit discussion. A -1 from a non-committer can be overridden only
[GitHub] [spark-website] dongjoon-hyun commented on pull request #284: Update committer guide
dongjoon-hyun commented on pull request #284: URL: https://github.com/apache/spark-website/pull/284#issuecomment-679263615 Please merge this AS-IS. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] holdenk commented on pull request #284: Update committer guide
holdenk commented on pull request #284: URL: https://github.com/apache/spark-website/pull/284#issuecomment-679259675 Ok I'm going to go ahead and commit, and if @dongjoon-hyun & @HyukjinKwon want to update with the 72 hours and can agree on the wording we can do that in a follow up PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (bc23bb7 -> e3a88a9)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bc23bb7 [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests add e3a88a9 [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md| 2 + .../org/apache/spark/sql/internal/SQLConf.scala| 12 + .../org/apache/spark/sql/DataFrameReader.scala | 30 ++-- .../spark/sql/FileBasedDataSourceSuite.scala | 21 .../sql/test/DataFrameReaderWriterSuite.scala | 56 +- 5 files changed, 93 insertions(+), 28 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
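SPARK-32516 makes `DataFrameReader.load()` reject the ambiguous case where a `"path"` option is set *and* explicit path arguments are passed, instead of silently preferring one. A minimal plain-Python sketch of that check — the class name, function name, and error message are illustrative, not Spark's — could be:

```python
# Illustrative sketch (not Spark's code) of the conflict check
# SPARK-32516 adds to DataFrameReader.load().

class PathConflictError(ValueError):
    pass

def resolve_paths(options, *load_paths):
    """Return the effective input paths, rejecting the ambiguous combination."""
    option_path = options.get("path")
    if option_path is not None and load_paths:
        raise PathConflictError(
            "There is a 'path' option set and load() is called with path "
            "parameters; only one may be specified")
    # Fall back to whichever source of paths was actually given.
    return list(load_paths) if load_paths else ([option_path] if option_path else [])

print(resolve_paths({"path": "/data/a"}))       # ['/data/a']
print(resolve_paths({}, "/data/b", "/data/c"))  # ['/data/b', '/data/c']
```

Per the summary of changes, the real fix is gated by a new `SQLConf` flag and documented in the SQL migration guide, since previously-working (if ambiguous) calls now fail.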
[spark] branch branch-3.0 updated (4a67f1e -> 007acba)
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 4a67f1e [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests add 007acba [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans No new revisions were added by this update. Summary of changes: .../spark/ml/clustering/BisectingKMeans.scala | 33 ++- .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++- .../spark/mllib/clustering/BisectingKMeans.scala | 47 ++ .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++ 4 files changed, 60 insertions(+), 86 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
This is an automated email from the ASF dual-hosted git repository. huaxingao pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 007acba  [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

007acba is described below

commit 007acba6e3b0e45e334bed5942692dd88c61b3ea
Author: Huaxin Gao
AuthorDate: Mon Aug 24 08:47:01 2020 -0700

    [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

    ### What changes were proposed in this pull request?
    backporting https://github.com/apache/spark/pull/29501

    ### Why are the changes needed?
    avoid double caching

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    Existing tests

    Closes #29528 from huaxingao/kmeans_3.0.

    Authored-by: Huaxin Gao
    Signed-off-by: Huaxin Gao
---
 .../spark/ml/clustering/BisectingKMeans.scala      | 33 ++-
 .../org/apache/spark/ml/clustering/KMeans.scala    | 33 ++-
 .../spark/mllib/clustering/BisectingKMeans.scala   | 47 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 33 +++
 4 files changed, 60 insertions(+), 86 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
index b649b1d..b3f2d22 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
@@ -28,9 +28,8 @@ import org.apache.spark.ml.util._
 import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{BisectingKMeans => MLlibBisectingKMeans, BisectingKMeansModel => MLlibBisectingKMeansModel}
-import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
+import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.types.{DoubleType, IntegerType, StructType}
@@ -275,21 +274,6 @@ class BisectingKMeans @Since("2.0.0") (
   override def fit(dataset: Dataset[_]): BisectingKMeansModel = instrumented { instr =>
     transformSchema(dataset.schema, logging = true)
-    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
-    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
-      col($(weightCol)).cast(DoubleType)
-    } else {
-      lit(1.0)
-    }
-
-    val instances: RDD[(OldVector, Double)] = dataset
-      .select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w).rdd.map {
-        case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight)
-    }
-    if (handlePersistence) {
-      instances.persist(StorageLevel.MEMORY_AND_DISK)
-    }
-
     instr.logPipelineStage(this)
     instr.logDataset(dataset)
     instr.logParams(this, featuresCol, predictionCol, k, maxIter, seed,
@@ -301,11 +285,18 @@ class BisectingKMeans @Since("2.0.0") (
       .setMinDivisibleClusterSize($(minDivisibleClusterSize))
       .setSeed($(seed))
       .setDistanceMeasure($(distanceMeasure))
-    val parentModel = bkm.runWithWeight(instances, Some(instr))
-    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))
-    if (handlePersistence) {
-      instances.unpersist()
+
+    val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) {
+      col($(weightCol)).cast(DoubleType)
+    } else {
+      lit(1.0)
     }
+    val instances = dataset.select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w)
+      .rdd.map { case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight) }
+
+    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
+    val parentModel = bkm.runWithWeight(instances, handlePersistence, Some(instr))
+    val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this))

     val summary = new BisectingKMeansSummary(
       model.transform(dataset),

diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
index 5370318..e182f3d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
@@ -31,7 +31,6 @@ import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.clustering.{DistanceMeasure, KMeans => MLlibKMeans, KMeansModel => MLlibKMeansModel}
 import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
 import org.apache.spark.mllib.linalg.VectorImplicits._
-import org.apache.spark.rdd.RDD
 import
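The pattern behind this fix: the `ml` wrapper and the `mllib` implementation were each persisting the same input, caching it twice. The diff moves the decision to one place by passing a `handlePersistence` flag down to `runWithWeight`. A plain-Python sketch of the before/after ownership — `Dataset`, `persist`, and the function names here are stand-ins, not Spark APIs — might look like:

```python
# Sketch of the double-caching bug SPARK-32676 fixes: the wrapper
# now delegates caching to the implementation via a flag, so exactly
# one layer persists the data. All names here are illustrative.

class Dataset:
    def __init__(self):
        self.persist_count = 0

    @property
    def is_cached(self):
        return self.persist_count > 0

    def persist(self):
        self.persist_count += 1

def run_with_weight(data, handle_persistence):
    """mllib-style entry point: caches only when asked to by the caller."""
    if handle_persistence:
        data.persist()
    # ... iterative k-means work would happen here ...

def fit(dataset):
    """ml-style wrapper: no longer persists itself; it passes the flag down."""
    handle_persistence = not dataset.is_cached
    run_with_weight(dataset, handle_persistence)
    return dataset

ds = fit(Dataset())
print(ds.persist_count)  # 1, not 2
```

If the caller already cached the input, the flag is false and nothing is persisted again; otherwise the implementation persists once and owns the lifecycle.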
[spark] branch branch-3.0 updated (8aa644e -> 4a67f1e)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 8aa644e [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code add 4a67f1e [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests No new revisions were added by this update. Summary of changes: .../apache/spark/storage/BlockManagerSuite.scala | 26 +++-- .../apache/spark/storage/MemoryStoreSuite.scala| 29 +-- .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++ 3 files changed, 75 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (08b951b -> bc23bb7)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 08b951b [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with empty hashed relation add bc23bb7 [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests No new revisions were added by this update. Summary of changes: .../apache/spark/storage/BlockManagerSuite.scala | 26 +++-- .../apache/spark/storage/MemoryStoreSuite.scala| 29 +-- .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++ 3 files changed, 75 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (11c6a23 -> 08b951b)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 11c6a23 [SPARK-32352][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Exclude partition columns from data columns add 08b951b [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with empty hashed relation No new revisions were added by this update. Summary of changes: .../execution/adaptive/AdaptiveSparkPlanExec.scala | 2 +- .../adaptive/EliminateJoinToEmptyRelation.scala| 57 ++ .../adaptive/EliminateNullAwareAntiJoin.scala | 41 .../spark/sql/execution/joins/HashJoin.scala | 28 --- .../scala/org/apache/spark/sql/JoinSuite.scala | 54 +++- .../adaptive/AdaptiveQueryExecSuite.scala | 28 ++- 6 files changed, 157 insertions(+), 53 deletions(-) create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateNullAwareAntiJoin.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
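The idea behind SPARK-32649: at adaptive execution time, if the build side of a broadcast or shuffled hash join turns out to be an empty hashed relation, an inner or left-semi join can short-circuit to an empty result without probing at all. A toy plain-Python hash join — function and parameter names are illustrative, not Spark's — shows the shortcut:

```python
# Hedged sketch of the SPARK-32649 optimization: an empty build side
# makes an inner/semi hash join trivially empty, so skip the probe.

def hash_join(probe_rows, build_rows, key, join_type="inner"):
    if not build_rows and join_type in ("inner", "semi"):
        return []  # the short-circuit: nothing can match an empty build side
    build_keys = {row[key] for row in build_rows}
    if join_type == "semi":
        return [r for r in probe_rows if r[key] in build_keys]
    # inner join: pair up matching rows
    return [(r, b) for r in probe_rows for b in build_rows if r[key] == b[key]]

print(hash_join([{"id": 1}, {"id": 2}], [], "id"))            # []
print(hash_join([{"id": 1}], [{"id": 1}], "id", "semi"))      # [{'id': 1}]
```

Note the shortcut is only sound for inner and semi joins; outer and anti joins still produce rows when the other side is empty, which is why the summary shows the anti-join rule being generalized into `EliminateJoinToEmptyRelation` rather than extended blindly.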
[spark] branch master updated (11c6a23 -> 08b951b)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 11c6a23 [SPARK-32352][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Exclude partition columns from data columns add 08b951b [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with empty hashed relation No new revisions were added by this update. Summary of changes: .../execution/adaptive/AdaptiveSparkPlanExec.scala | 2 +- .../adaptive/EliminateJoinToEmptyRelation.scala| 57 ++ .../adaptive/EliminateNullAwareAntiJoin.scala | 41 .../spark/sql/execution/joins/HashJoin.scala | 28 --- .../scala/org/apache/spark/sql/JoinSuite.scala | 54 +++- .../adaptive/AdaptiveQueryExecSuite.scala | 28 ++- 6 files changed, 157 insertions(+), 53 deletions(-) create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala delete mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateNullAwareAntiJoin.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org