[spark] branch master updated (657e39a -> 7fdb571)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 657e39a  [SPARK-32897][PYTHON] Don't show a deprecation warning at SparkSession.builder.getOrCreate
   add 7fdb571  [SPARK-32890][SQL] Pass all `sql/hive` module UTs in Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../resources/regression-test-SPARK-8489/test-2.13.jar | Bin 0 -> 19579 bytes
 .../spark/sql/hive/HiveSchemaInferenceSuite.scala      |   2 +-
 .../apache/spark/sql/hive/HiveSparkSubmitSuite.scala   |   2 +-
 .../org/apache/spark/sql/hive/StatisticsSuite.scala    |   2 +-
 .../apache/spark/sql/hive/execution/HiveDDLSuite.scala |   2 +-
 5 files changed, 4 insertions(+), 4 deletions(-)
 create mode 100644 sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (316242b -> 6f36db1)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 316242b  [SPARK-32874][SQL][TEST] Enhance result set meta data check for execute statement operation with thrift server
   add 6f36db1  [SPARK-31448][PYTHON] Fix storage level used in persist() in dataframe.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/dataframe.py | 7 ++++---
 python/pyspark/storagelevel.py  | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)
[spark] branch master updated (bbbd907 -> 3be552c)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from bbbd907  [SPARK-32804][LAUNCHER] Fix run-example command builder bug
   add 3be552c  [SPARK-30090][SHELL] Adapt Spark REPL to Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/repl/Main.scala               |   0
 .../org/apache/spark/repl/SparkILoop.scala         |   0
 .../org/apache/spark/repl/Main.scala               |  22 ++-
 .../org/apache/spark/repl/SparkILoop.scala         | 149 ++
 .../org/apache/spark/repl/Repl2Suite.scala         |  58 +++
 .../apache/spark/repl/SingletonRepl2Suite.scala    | 171 +
 .../org/apache/spark/repl/Repl2Suite.scala         |  53 +++
 .../apache/spark/repl/SingletonRepl2Suite.scala    | 171 +
 .../scala/org/apache/spark/repl/ReplSuite.scala    |  27
 .../org/apache/spark/repl/SingletonReplSuite.scala |  61
 .../sql/catalyst/util/CaseInsensitiveMap.scala     |   2 +-
 11 files changed, 618 insertions(+), 96 deletions(-)
 copy repl/src/main/{scala => scala-2.12}/org/apache/spark/repl/Main.scala (100%)
 rename repl/src/main/{scala => scala-2.12}/org/apache/spark/repl/SparkILoop.scala (100%)
 rename repl/src/main/{scala => scala-2.13}/org/apache/spark/repl/Main.scala (89%)
 create mode 100644 repl/src/main/scala-2.13/org/apache/spark/repl/SparkILoop.scala
 create mode 100644 repl/src/test/scala-2.12/org/apache/spark/repl/Repl2Suite.scala
 create mode 100644 repl/src/test/scala-2.12/org/apache/spark/repl/SingletonRepl2Suite.scala
 create mode 100644 repl/src/test/scala-2.13/org/apache/spark/repl/Repl2Suite.scala
 create mode 100644 repl/src/test/scala-2.13/org/apache/spark/repl/SingletonRepl2Suite.scala
[spark] branch master updated (2009f95 -> bbbd907)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 2009f95  [SPARK-32779][SQL][FOLLOW-UP] Delete Unused code
   add bbbd907  [SPARK-32804][LAUNCHER] Fix run-example command builder bug

No new revisions were added by this update.

Summary of changes:
 .../spark/launcher/SparkSubmitCommandBuilder.java      | 15 +--
 .../spark/launcher/SparkSubmitCommandBuilderSuite.java | 18 ++
 2 files changed, 31 insertions(+), 2 deletions(-)
[spark] branch master updated (94cac59 -> f6322d1)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 94cac59  [SPARK-32730][SQL][FOLLOW-UP] Improve LeftAnti SortMergeJoin right side buffering
   add f6322d1  [SPARK-32180][PYTHON][DOCS] Installation page of Getting Started in PySpark documentation

No new revisions were added by this update.

Summary of changes:
 python/docs/source/getting_started/index.rst     |   3 +
 .../docs/source/getting_started/installation.rst | 114 +
 2 files changed, 117 insertions(+)
 create mode 100644 python/docs/source/getting_started/installation.rst
[spark] branch master updated (328d81a -> fe2ab25)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 328d81a  [SPARK-32677][SQL][DOCS][MINOR] Improve code comment in CreateFunctionCommand
   add fe2ab25  [MINOR][SQL] Fix a typo at 'spark.sql.sources.fileCompressionFactor' error message in SQLConf

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (794b48c -> 513d51a)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 794b48c  [SPARK-32204][SPARK-32182][DOCS][FOLLOW-UP] Use IPython instead of ipython to check if installed in dev/lint-python
   add 513d51a  [SPARK-32808][SQL] Fix some test cases of `sql/core` module in scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../storage/ShuffleBlockFetcherIterator.scala      |  2 +-
 .../sql/catalyst/expressions/objects/objects.scala |  4 +-
 .../spark/sql/catalyst/plans/QueryPlan.scala       |  4 +-
 .../spark/sql/catalyst/util/GenericArrayData.scala |  8 +++-
 .../scala/org/apache/spark/sql/types/Decimal.scala |  4 +-
 .../spark/sql/RelationalGroupedDataset.scala       |  6 ++-
 .../apache/spark/sql/execution/GenerateExec.scala  |  2 +-
 .../sql-functions/sql-expression-schema.md         | 46 +++---
 .../org/apache/spark/sql/DataFrameStatSuite.scala  |  4 +-
 .../apache/spark/sql/ExpressionsSchemaSuite.scala  |  4 +-
 .../test/scala/org/apache/spark/sql/UDFSuite.scala |  2 +-
 .../execution/datasources/orc/OrcQuerySuite.scala  |  6 +-
 12 files changed, 51 insertions(+), 41 deletions(-)
[spark] branch master updated: [SPARK-32808][SQL] Fix some test cases of `sql/core` module in scala 2.13
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
new 513d51a [SPARK-32808][SQL] Fix some test cases of `sql/core` module in Scala 2.13

513d51a is described below

commit 513d51a2c5dd2c7ff2c2fadc26ec122883372be1
Author: yangjie01
AuthorDate: Wed Sep 9 08:53:44 2020 -0500

[SPARK-32808][SQL] Fix some test cases of `sql/core` module in Scala 2.13

### What changes were proposed in this pull request?

This PR partially resolves [SPARK-32808](https://issues.apache.org/jira/browse/SPARK-32808): a total of 26 failing test cases are fixed, in the following suites:

- `StreamingAggregationSuite` (2 FAILED -> Pass)
- `GeneratorFunctionSuite` (2 FAILED -> Pass)
- `UDFSuite` (2 FAILED -> Pass)
- `SQLQueryTestSuite` (5 FAILED -> Pass)
- `WholeStageCodegenSuite` (1 FAILED -> Pass)
- `DataFrameSuite` (3 FAILED -> Pass)
- `OrcV1QuerySuite`/`OrcV2QuerySuite` (4 FAILED -> Pass)
- `ExpressionsSchemaSuite` (1 FAILED -> Pass)
- `DataFrameStatSuite` (1 FAILED -> Pass)
- `JsonV1Suite`/`JsonV2Suite`/`JsonLegacyTimeParserSuite` (6 FAILED -> Pass)

The main changes in this PR are:

- Fix Scala 2.13 compilation problems in `ShuffleBlockFetcherIterator` and `Analyzer`.
- Use `scala.collection.Seq` explicitly in `objects.scala` and `GenericArrayData`, because the sequence used internally may be a `mutable.ArraySeq` on which it is not easy to call `.toSeq`.
- Use `scala.collection.Seq` explicitly when calling `Row.getAs[Seq]` or `Row.get(i).asInstanceOf[Seq]`, because the underlying data may be a `mutable.ArraySeq` while `Seq` means `immutable.Seq` in Scala 2.13.
- Use a compatible implementation of the `+` and `-` methods of `Decimal` so that they behave the same in Scala 2.12 and Scala 2.13.
- Call `toList` in `RelationalGroupedDataset.toDF` when `groupingExprs` is a `Stream`, because `Stream` cannot be serialized in Scala 2.13.
- Add a manual sort to `classFunsMap` in `ExpressionsSchemaSuite`, because `Iterable.groupBy` in Scala 2.13 returns results in a different order than `TraversableLike.groupBy` in Scala 2.12.

### Why are the changes needed?

We need to support a Scala 2.13 build.

### Does this PR introduce _any_ user-facing change?

Yes: callers must use `scala.collection.Seq` explicitly when calling `Row.getAs[Seq]` or `Row.get(i).asInstanceOf[Seq]`, because the data may be a `mutable.ArraySeq` while `Seq` is `immutable.Seq` in Scala 2.13.

### How was this patch tested?

- Scala 2.12: Pass the Jenkins or GitHub Actions build.
- Scala 2.13: run the following:

```
dev/change-scala-version.sh 2.13
mvn clean install -DskipTests -pl sql/core -Pscala-2.13 -am
mvn test -pl sql/core -Pscala-2.13
```

**Before**

```
Tests: succeeded 8166, failed 319, canceled 1, ignored 52, pending 0
*** 319 TESTS FAILED ***
```

**After**

```
Tests: succeeded 8204, failed 286, canceled 1, ignored 52, pending 0
*** 286 TESTS FAILED ***
```

Closes #29660 from LuciferYang/SPARK-32808.
Authored-by: yangjie01
Signed-off-by: Sean Owen
---
 .../storage/ShuffleBlockFetcherIterator.scala      |  2 +-
 .../sql/catalyst/expressions/objects/objects.scala |  4 +-
 .../spark/sql/catalyst/plans/QueryPlan.scala       |  4 +-
 .../spark/sql/catalyst/util/GenericArrayData.scala |  8 +++-
 .../scala/org/apache/spark/sql/types/Decimal.scala |  4 +-
 .../spark/sql/RelationalGroupedDataset.scala       |  6 ++-
 .../apache/spark/sql/execution/GenerateExec.scala  |  2 +-
 .../sql-functions/sql-expression-schema.md         | 46 +++---
 .../org/apache/spark/sql/DataFrameStatSuite.scala  |  4 +-
 .../apache/spark/sql/ExpressionsSchemaSuite.scala  |  4 +-
 .../test/scala/org/apache/spark/sql/UDFSuite.scala |  2 +-
 .../execution/datasources/orc/OrcQuerySuite.scala  |  6 +--
 12 files changed, 51 insertions(+), 41 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala b/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
index 57b6a38..e3b3fc5 100644
--- a/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
+++ b/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
@@ -495,7 +495,7 @@ final class ShuffleBlockFetcherIterator(
       hostLocalDirManager.getHostLocalDirs(host, port, bmIds.map(_.executorId)) {
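The `scala.collection.Seq` pattern this commit message describes can be illustrated outside Spark. The sketch below uses hypothetical names (`SeqCompatSketch` is not from the PR) and stands in for `Row.get(i).asInstanceOf[Seq]`: under Scala 2.13 the bare `Seq` alias means `immutable.Seq`, so casting a `mutable.ArraySeq` to it fails at runtime, while casting to `scala.collection.Seq` works on both 2.12 and 2.13.

```scala
import scala.collection.mutable

object SeqCompatSketch {
  // A value whose static type has been lost, as with Row.get(i) in Spark.
  // The runtime class is mutable.ArraySeq, which is what Spark may hand back.
  val erased: Any = mutable.ArraySeq(1, 2, 3)

  // Under Scala 2.13, `erased.asInstanceOf[Seq[Int]]` would throw a
  // ClassCastException, because `Seq` resolves to immutable.Seq there.
  // `scala.collection.Seq` is the common supertype of mutable and
  // immutable sequences in both 2.12 and 2.13, so this cast is portable.
  val portable: scala.collection.Seq[Int] =
    erased.asInstanceOf[scala.collection.Seq[Int]]

  def sum: Int = portable.sum
}
```

Sequence equality in Scala is element-wise across implementations, so `portable` still compares equal to an immutable `Seq(1, 2, 3)` after the cast.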
[spark-website] branch asf-site updated: Update doc related to gpg key exports
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
new be6e744 Update doc related to gpg key exports

be6e744 is described below

commit be6e744d336bef26beb7c22da2e01a18f19587db
Author: zhengruifeng
AuthorDate: Fri Sep 4 08:22:13 2020 -0500

Update doc related to gpg key exports

When preparing for 3.0.1-rc, I encountered issues related to gpg keys:
1. Locally, I generated keys and used `gpg --export` to export them.
2. On an AWS EC2 instance, I imported the keys with `gpg --import` and then ran `do-release-docker.sh`. The script could not find the key.

That is because, according to [export-secret-key](https://infra.apache.org/openpgp.html#export-secret-key):

> To ensure that you do not accidentally expose private keys, the GnuPG --export operation exports only public keys.

`gpg --export` only exports **public** keys, while `do-release-docker.sh` needs a **secret/private** key. So we should use `gpg --export-secret-keys` instead of `gpg --export`.

![image](https://user-images.githubusercontent.com/7322292/92091702-afcd4780-ee03-11ea-87cf-8edcf0889215.png)

Author: zhengruifeng

Closes #288 from zhengruifeng/fix_gpg_exports.

---
 release-process.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/release-process.md b/release-process.md
index 2b38b0b..db40a50 100644
--- a/release-process.md
+++ b/release-process.md
@@ -43,8 +43,8 @@ After generating the gpg key, you need to upload your key to a public key server
 See https://www.apache.org/dev/openpgp.html#generate-key for details.
-If you want to do the release on another machine, you can transfer your gpg key to that machine
-via the `gpg --export` and `gpg --import` commands.
+If you want to do the release on another machine, you can transfer your secret key to that machine
+via the `gpg --export-secret-keys` and `gpg --import` commands.
 The last step is to update the KEYS file with your code signing key; see
 https://www.apache.org/dev/openpgp.html#export-public-key

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3cde392 -> 7511e43)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3cde392 [SPARK-31831][SQL][FOLLOWUP] Make the GetCatalogsOperationMock for HiveSessionImplSuite compile with the proper Hive version
add 7511e43 [SPARK-32756][SQL] Fix CaseInsensitiveMap usage for Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala | 2 ++
 .../src/main/scala/org/apache/spark/sql/DataFrameReader.scala   | 2 +-
 .../src/main/scala/org/apache/spark/sql/DataFrameWriter.scala   | 9 +
 .../spark/sql/execution/datasources/orc/OrcFiltersBase.scala    | 2 +-
 .../spark/sql/execution/datasources/v2/FileDataSourceV2.scala   | 2 +-
 5 files changed, 10 insertions(+), 7 deletions(-)
[spark-website] branch asf-site updated: Adds Kotlin to the list of third-party language bindings
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
new 36e19d7 Adds Kotlin to the list of third-party language bindings

36e19d7 is described below

commit 36e19d72f38a806b3086636ec1266bbffb8dfaf2
Author: MKhalusova
AuthorDate: Thu Aug 27 17:47:46 2020 -0500

Adds Kotlin to the list of third-party language bindings

This PR adds a link to the Kotlin for Apache Spark repo as a third-party language binding.

Author: MKhalusova

Closes #287 from MKhalusova/kotlin-third-party.

---
 site/third-party-projects.html | 6 ++++++
 third-party-projects.md        | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/site/third-party-projects.html b/site/third-party-projects.html
index 5bd1524..bed5d61 100644
--- a/site/third-party-projects.html
+++ b/site/third-party-projects.html
@@ -298,6 +298,12 @@ transforming, and analyzing genomic data using Apache Spark
 <a href="https://github.com/dfdx/Spark.jl">Spark.jl</a>
+Kotlin
+
+<a href="https://github.com/JetBrains/kotlin-spark-api">Kotlin for Apache Spark</a>

diff --git a/third-party-projects.md b/third-party-projects.md
index 6176ff6..8f29bbb 100644
--- a/third-party-projects.md
+++ b/third-party-projects.md
@@ -90,3 +90,7 @@ transforming, and analyzing genomic data using Apache Spark
 Julia

 - <a href="https://github.com/dfdx/Spark.jl">Spark.jl</a>
+
+Kotlin
+
+- <a href="https://github.com/JetBrains/kotlin-spark-api">Kotlin for Apache Spark</a>
[spark] branch branch-3.0 updated: [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
new 60f4856 [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value

60f4856 is described below

commit 60f485671a07a93ae8a8506ed2c0999cfe6ded7b
Author: waleedfateem
AuthorDate: Thu Aug 27 09:05:50 2020 -0500

[SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value

The current documentation states that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1. That is not entirely accurate, since this configuration isn't set anywhere in Spark; it is inherited from the Hadoop FileOutputCommitter class.

### What changes were proposed in this pull request?

This change clarifies that the default value depends entirely on the Hadoop version of the runtime environment.

### Why are the changes needed?

An application would end up using algorithm version 1 in some environments, but without any changes the same application will use version 2 in environments running Hadoop 3.0 and later. This can have bad consequences in certain scenarios; for example, two tasks can partially overwrite their output if speculation is enabled. See also https://issues.apache.org/jira/browse/MAPREDUCE-7282.

### Does this PR introduce _any_ user-facing change?

Yes. The configuration page previously stated that the default version of the FileOutputCommitter algorithm was v1; it now reads "Dependent on environment", with additional information in the description column.

### How was this patch tested?

Checked the changes locally in a browser.

Closes #29541 from waleedfateem/SPARK-32701.
Authored-by: waleedfateem
Signed-off-by: Sean Owen
(cherry picked from commit 8749b2b6fae5ee0ce7b48aae6d859ed71e98491d)
Signed-off-by: Sean Owen
---
 docs/configuration.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 2701fdb..95ff282 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1761,11 +1761,16 @@ Apart from these, the following properties are also available, and may be useful
 spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version
-1
+Dependent on environment
 The file output committer algorithm version, valid algorithm version number: 1 or 2. Version 2 may have better performance, but version 1 may handle failures better in certain situations, as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
+The default value depends on the Hadoop version used in an environment:
+1 for Hadoop versions lower than 3.0
+2 for Hadoop versions 3.0 and higher
+It's important to note that this can change back to 1 again in the future once <a href="https://issues.apache.org/jira/browse/MAPREDUCE-7282">MAPREDUCE-7282</a> is fixed and merged.
 2.2.0
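Since the default now varies with the Hadoop version, users who need deterministic committer behavior can pin the algorithm explicitly. A minimal sketch, not part of this commit, assuming a Spark build on the classpath (the app name is arbitrary):

```scala
import org.apache.spark.sql.SparkSession

object PinCommitterVersion {
  def main(args: Array[String]): Unit = {
    // Pin the committer algorithm explicitly rather than relying on the
    // Hadoop-version-dependent default (v1 before Hadoop 3.0, v2 from 3.0 on).
    // Version 1 handles failures better, e.g. with speculative execution.
    val spark = SparkSession.builder()
      .appName("pin-committer-version")
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")
      .getOrCreate()

    // ... run jobs that write file output ...

    spark.stop()
  }
}
```

The same setting can equally be passed on the command line via `--conf`, since any `spark.hadoop.*` key is forwarded to the Hadoop configuration.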
[spark] branch master updated (ed51a7f -> 8749b2b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from ed51a7f [SPARK-30654] Bootstrap4 docs upgrade
add 8749b2b [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value

No new revisions were added by this update.

Summary of changes:
 docs/configuration.md | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)
[spark] branch master updated (f14f374 -> ed51a7f)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from f14f374 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly
add ed51a7f [SPARK-30654] Bootstrap4 docs upgrade

No new revisions were added by this update.

Summary of changes:
 docs/_layouts/global.html                        |  128 +-
 docs/css/bootstrap-responsive.css                | 1040 
 docs/css/bootstrap-responsive.min.css            |    9 -
 docs/css/bootstrap.css                           | 5624 
 docs/css/bootstrap.min.css                       |   14 +-
 .../ui/static => docs/css}/bootstrap.min.css.map |    0
 docs/css/main.css                                |  150 +-
 docs/js/main.js                                  |   34 +-
 .../js/vendor}/bootstrap.bundle.min.js           |    0
 .../js/vendor}/bootstrap.bundle.min.js.map       |    0
 docs/js/vendor/bootstrap.js                      | 2027 ---
 docs/js/vendor/bootstrap.min.js                  |    6 -
 12 files changed, 222 insertions(+), 8810 deletions(-)
 delete mode 100644 docs/css/bootstrap-responsive.css
 delete mode 100644 docs/css/bootstrap-responsive.min.css
 delete mode 100644 docs/css/bootstrap.css
 copy {core/src/main/resources/org/apache/spark/ui/static => docs/css}/bootstrap.min.css.map (100%)
 copy {core/src/main/resources/org/apache/spark/ui/static => docs/js/vendor}/bootstrap.bundle.min.js (100%)
 copy {core/src/main/resources/org/apache/spark/ui/static => docs/js/vendor}/bootstrap.bundle.min.js.map (100%)
 delete mode 100755 docs/js/vendor/bootstrap.js
 delete mode 100755 docs/js/vendor/bootstrap.min.js
[spark] branch branch-3.0 updated: [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 60f4856 [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value 60f4856 is described below commit 60f485671a07a93ae8a8506ed2c0999cfe6ded7b Author: waleedfateem AuthorDate: Thu Aug 27 09:05:50 2020 -0500 [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value The current documentation states that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1 which is not entirely true since this configuration isn't set anywhere in Spark but rather inherited from the Hadoop FileOutputCommitter class. ### What changes were proposed in this pull request? I'm submitting this change, to clarify that the default value will entirely depend on the Hadoop version of the runtime environment. ### Why are the changes needed? An application would end up using algorithm version 1 on certain environments but without any changes the same exact application will use version 2 on environments running Hadoop 3.0 and later. This can have pretty bad consequences in certain scenarios, for example, two tasks can partially overwrite their output if speculation is enabled. Also, please refer to the following JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-7282 ### Does this PR introduce _any_ user-facing change? Yes. Configuration page content was modified where previously we explicitly highlighted that the default version for the FileOutputCommitter algorithm was v1, this now has changed to "Dependent on environment" with additional information in the description column to elaborate. ### How was this patch tested? Checked changes locally in browser Closes #29541 from waleedfateem/SPARK-32701. 
Authored-by: waleedfateem Signed-off-by: Sean Owen (cherry picked from commit 8749b2b6fae5ee0ce7b48aae6d859ed71e98491d) Signed-off-by: Sean Owen --- docs/configuration.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index 2701fdb..95ff282 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1761,11 +1761,16 @@ Apart from these, the following properties are also available, and may be useful spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version - 1 + Dependent on environment The file output committer algorithm version, valid algorithm version number: 1 or 2. Version 2 may have better performance, but version 1 may handle failures better in certain situations, as per https://issues.apache.org/jira/browse/MAPREDUCE-4815;>MAPREDUCE-4815. +The default value depends on the Hadoop version used in an environment: +1 for Hadoop versions lower than 3.0 +2 for Hadoop versions 3.0 and higher +It's important to note that this can change back to 1 again in the future once https://issues.apache.org/jira/browse/MAPREDUCE-7282;>MAPREDUCE-7282 +is fixed and merged. 2.2.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ed51a7f -> 8749b2b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ed51a7f [SPARK-30654] Bootstrap4 docs upgrade add 8749b2b [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value No new revisions were added by this update. Summary of changes: docs/configuration.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f14f374 -> ed51a7f)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f14f374 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly add ed51a7f [SPARK-30654] Bootstrap4 docs upgrade No new revisions were added by this update. Summary of changes: docs/_layouts/global.html | 128 +- docs/css/bootstrap-responsive.css | 1040 docs/css/bootstrap-responsive.min.css |9 - docs/css/bootstrap.css | 5624 docs/css/bootstrap.min.css | 14 +- .../ui/static => docs/css}/bootstrap.min.css.map |0 docs/css/main.css | 150 +- docs/js/main.js| 34 +- .../js/vendor}/bootstrap.bundle.min.js |0 .../js/vendor}/bootstrap.bundle.min.js.map |0 docs/js/vendor/bootstrap.js| 2027 --- docs/js/vendor/bootstrap.min.js|6 - 12 files changed, 222 insertions(+), 8810 deletions(-) delete mode 100644 docs/css/bootstrap-responsive.css delete mode 100644 docs/css/bootstrap-responsive.min.css delete mode 100644 docs/css/bootstrap.css copy {core/src/main/resources/org/apache/spark/ui/static => docs/css}/bootstrap.min.css.map (100%) copy {core/src/main/resources/org/apache/spark/ui/static => docs/js/vendor}/bootstrap.bundle.min.js (100%) copy {core/src/main/resources/org/apache/spark/ui/static => docs/js/vendor}/bootstrap.bundle.min.js.map (100%) delete mode 100755 docs/js/vendor/bootstrap.js delete mode 100755 docs/js/vendor/bootstrap.min.js - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 60f4856 [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value 60f4856 is described below commit 60f485671a07a93ae8a8506ed2c0999cfe6ded7b Author: waleedfateem AuthorDate: Thu Aug 27 09:05:50 2020 -0500 [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value The current documentation states that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1 which is not entirely true since this configuration isn't set anywhere in Spark but rather inherited from the Hadoop FileOutputCommitter class. ### What changes were proposed in this pull request? I'm submitting this change, to clarify that the default value will entirely depend on the Hadoop version of the runtime environment. ### Why are the changes needed? An application would end up using algorithm version 1 on certain environments but without any changes the same exact application will use version 2 on environments running Hadoop 3.0 and later. This can have pretty bad consequences in certain scenarios, for example, two tasks can partially overwrite their output if speculation is enabled. Also, please refer to the following JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-7282 ### Does this PR introduce _any_ user-facing change? Yes. Configuration page content was modified where previously we explicitly highlighted that the default version for the FileOutputCommitter algorithm was v1, this now has changed to "Dependent on environment" with additional information in the description column to elaborate. ### How was this patch tested? Checked changes locally in browser Closes #29541 from waleedfateem/SPARK-32701. 
Authored-by: waleedfateem Signed-off-by: Sean Owen (cherry picked from commit 8749b2b6fae5ee0ce7b48aae6d859ed71e98491d) Signed-off-by: Sean Owen --- docs/configuration.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index 2701fdb..95ff282 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1761,11 +1761,16 @@ Apart from these, the following properties are also available, and may be useful spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version - 1 + Dependent on environment The file output committer algorithm version, valid algorithm version number: 1 or 2. Version 2 may have better performance, but version 1 may handle failures better in certain situations, as per https://issues.apache.org/jira/browse/MAPREDUCE-4815;>MAPREDUCE-4815. +The default value depends on the Hadoop version used in an environment: +1 for Hadoop versions lower than 3.0 +2 for Hadoop versions 3.0 and higher +It's important to note that this can change back to 1 again in the future once https://issues.apache.org/jira/browse/MAPREDUCE-7282;>MAPREDUCE-7282 +is fixed and merged. 2.2.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ed51a7f -> 8749b2b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ed51a7f [SPARK-30654] Bootstrap4 docs upgrade add 8749b2b [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value No new revisions were added by this update. Summary of changes: docs/configuration.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f14f374 -> ed51a7f)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f14f374 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly add ed51a7f [SPARK-30654] Bootstrap4 docs upgrade No new revisions were added by this update. Summary of changes: docs/_layouts/global.html | 128 +- docs/css/bootstrap-responsive.css | 1040 docs/css/bootstrap-responsive.min.css |9 - docs/css/bootstrap.css | 5624 docs/css/bootstrap.min.css | 14 +- .../ui/static => docs/css}/bootstrap.min.css.map |0 docs/css/main.css | 150 +- docs/js/main.js| 34 +- .../js/vendor}/bootstrap.bundle.min.js |0 .../js/vendor}/bootstrap.bundle.min.js.map |0 docs/js/vendor/bootstrap.js| 2027 --- docs/js/vendor/bootstrap.min.js|6 - 12 files changed, 222 insertions(+), 8810 deletions(-) delete mode 100644 docs/css/bootstrap-responsive.css delete mode 100644 docs/css/bootstrap-responsive.min.css delete mode 100644 docs/css/bootstrap.css copy {core/src/main/resources/org/apache/spark/ui/static => docs/css}/bootstrap.min.css.map (100%) copy {core/src/main/resources/org/apache/spark/ui/static => docs/js/vendor}/bootstrap.bundle.min.js (100%) copy {core/src/main/resources/org/apache/spark/ui/static => docs/js/vendor}/bootstrap.bundle.min.js.map (100%) delete mode 100755 docs/js/vendor/bootstrap.js delete mode 100755 docs/js/vendor/bootstrap.min.js - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 60f4856 [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value 60f4856 is described below commit 60f485671a07a93ae8a8506ed2c0999cfe6ded7b Author: waleedfateem AuthorDate: Thu Aug 27 09:05:50 2020 -0500 [SPARK-32701][CORE][DOCS] mapreduce.fileoutputcommitter.algorithm.version default value The current documentation states that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1, which is not entirely true since this configuration isn't set anywhere in Spark but rather inherited from the Hadoop FileOutputCommitter class. ### What changes were proposed in this pull request? I'm submitting this change to clarify that the default value depends entirely on the Hadoop version of the runtime environment. ### Why are the changes needed? An application would end up using algorithm version 1 in certain environments, but without any changes the same exact application will use version 2 in environments running Hadoop 3.0 and later. This can have serious consequences in certain scenarios; for example, two tasks can partially overwrite their output if speculation is enabled. Also, please refer to the following JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-7282 ### Does this PR introduce _any_ user-facing change? Yes. Configuration page content was modified: previously we explicitly highlighted that the default version for the FileOutputCommitter algorithm was v1; this has changed to "Dependent on environment", with additional information in the description column to elaborate. ### How was this patch tested? Checked changes locally in a browser Closes #29541 from waleedfateem/SPARK-32701. 
Authored-by: waleedfateem Signed-off-by: Sean Owen (cherry picked from commit 8749b2b6fae5ee0ce7b48aae6d859ed71e98491d) Signed-off-by: Sean Owen --- docs/configuration.md | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index 2701fdb..95ff282 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1761,11 +1761,16 @@ Apart from these, the following properties are also available, and may be useful spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version - 1 + Dependent on environment The file output committer algorithm version, valid algorithm version number: 1 or 2. Version 2 may have better performance, but version 1 may handle failures better in certain situations, as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>. +The default value depends on the Hadoop version used in an environment: +1 for Hadoop versions lower than 3.0 +2 for Hadoop versions 3.0 and higher +It's important to note that this can change back to 1 again in the future once <a href="https://issues.apache.org/jira/browse/MAPREDUCE-7282">MAPREDUCE-7282</a> +is fixed and merged. 2.2.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
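Because the effective default now varies with the Hadoop version on the cluster, jobs that depend on a particular committer behavior may want to pin the algorithm explicitly rather than inherit it from the environment. A minimal, hypothetical `spark-defaults.conf` fragment (the property name is the one documented above; choosing version 1 here is only an example):

```properties
# Pin the file output committer algorithm instead of inheriting the
# Hadoop-version-dependent default (1 before Hadoop 3.0, 2 afterwards).
# Version 1 trades some performance for better failure handling.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  1
```

The same property can equally be set per job via `--conf` on `spark-submit` or on the session builder.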
[spark] branch branch-3.0 updated (8aa644e -> 4a67f1e)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 8aa644e [SPARK-32092][ML][PYSPARK][3.0] Removed foldCol related code add 4a67f1e [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests No new revisions were added by this update. Summary of changes: .../apache/spark/storage/BlockManagerSuite.scala | 26 +++-- .../apache/spark/storage/MemoryStoreSuite.scala| 29 +-- .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++ 3 files changed, 75 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (08b951b -> bc23bb7)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 08b951b [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with empty hashed relation add bc23bb7 [SPARK-32588][CORE][TEST] Fix SizeEstimator initialization in tests No new revisions were added by this update. Summary of changes: .../apache/spark/storage/BlockManagerSuite.scala | 26 +++-- .../apache/spark/storage/MemoryStoreSuite.scala| 29 +-- .../org/apache/spark/util/SizeEstimatorSuite.scala | 43 ++ 3 files changed, 75 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1c798f9 -> ac520d4)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1c798f9 [SPARK-32594][SQL][FOLLOWUP][TEST-HADOOP2.7][TEST-HIVE1.2] Override `get()` and use Julian days in `DaysWritable` add ac520d4 [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans No new revisions were added by this update. Summary of changes: .../spark/ml/clustering/BisectingKMeans.scala | 33 ++- .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++- .../spark/mllib/clustering/BisectingKMeans.scala | 47 ++ .../org/apache/spark/mllib/clustering/KMeans.scala | 29 +++-- 4 files changed, 59 insertions(+), 83 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ac520d4 [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans ac520d4 is described below commit ac520d4a7c40a1d67358ee64af26e7f73face448 Author: zhengruifeng AuthorDate: Sun Aug 23 17:14:40 2020 -0500 [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans ### What changes were proposed in this pull request? Fix double caching in KMeans/BiKMeans: 1, let the callers of `runWithWeight` pass whether `handlePersistence` is needed; 2, persist and unpersist inside `runWithWeight`; 3, persist the `norms` if needed, according to the comments; ### Why are the changes needed? Avoid double caching. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test suites. Closes #29501 from zhengruifeng/kmeans_handlePersistence. 
Authored-by: zhengruifeng Signed-off-by: Sean Owen --- .../spark/ml/clustering/BisectingKMeans.scala | 33 ++- .../org/apache/spark/ml/clustering/KMeans.scala| 33 ++- .../spark/mllib/clustering/BisectingKMeans.scala | 47 ++ .../org/apache/spark/mllib/clustering/KMeans.scala | 29 +++-- 4 files changed, 59 insertions(+), 83 deletions(-) diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala index 5a60bed..061091c 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala @@ -29,9 +29,8 @@ import org.apache.spark.ml.util._ import org.apache.spark.ml.util.Instrumentation.instrumented import org.apache.spark.mllib.clustering.{BisectingKMeans => MLlibBisectingKMeans, BisectingKMeansModel => MLlibBisectingKMeansModel} -import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors} +import org.apache.spark.mllib.linalg.{Vectors => OldVectors} import org.apache.spark.mllib.linalg.VectorImplicits._ -import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, Dataset, Row} import org.apache.spark.sql.functions._ import org.apache.spark.sql.types.{DoubleType, IntegerType, StructType} @@ -276,21 +275,6 @@ class BisectingKMeans @Since("2.0.0") ( override def fit(dataset: Dataset[_]): BisectingKMeansModel = instrumented { instr => transformSchema(dataset.schema, logging = true) -val handlePersistence = dataset.storageLevel == StorageLevel.NONE -val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) { - checkNonNegativeWeight(col($(weightCol)).cast(DoubleType)) -} else { - lit(1.0) -} - -val instances: RDD[(OldVector, Double)] = dataset - .select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w).rdd.map { - case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight) -} -if (handlePersistence) { - 
instances.persist(StorageLevel.MEMORY_AND_DISK) -} - instr.logPipelineStage(this) instr.logDataset(dataset) instr.logParams(this, featuresCol, predictionCol, k, maxIter, seed, @@ -302,11 +286,18 @@ class BisectingKMeans @Since("2.0.0") ( .setMinDivisibleClusterSize($(minDivisibleClusterSize)) .setSeed($(seed)) .setDistanceMeasure($(distanceMeasure)) -val parentModel = bkm.runWithWeight(instances, Some(instr)) -val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this)) -if (handlePersistence) { - instances.unpersist() + +val w = if (isDefined(weightCol) && $(weightCol).nonEmpty) { + checkNonNegativeWeight(col($(weightCol)).cast(DoubleType)) +} else { + lit(1.0) } +val instances = dataset.select(DatasetUtils.columnToVector(dataset, getFeaturesCol), w) + .rdd.map { case Row(point: Vector, weight: Double) => (OldVectors.fromML(point), weight) } + +val handlePersistence = dataset.storageLevel == StorageLevel.NONE +val parentModel = bkm.runWithWeight(instances, handlePersistence, Some(instr)) +val model = copyValues(new BisectingKMeansModel(uid, parentModel).setParent(this)) val summary = new BisectingKMeansSummary( model.transform(dataset), diff --git a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala index 5c06973..f6f6eb7 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala @@ -32,7 +32,6 @@ import org.apache.spark.ml.util.Instrument
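The essence of the fix in the diff above — the caller computes `handlePersistence` from the dataset's current storage level, and `runWithWeight` itself persists and unpersists — can be sketched with a toy stand-in. This is plain Python with hypothetical classes, not the actual Spark API:

```python
# Toy stand-in for the SPARK-32676 pattern: the fitting routine persists the
# input only if the caller's data is not already cached, and releases it
# itself, so the data is never cached twice.

class ToyDataset:
    def __init__(self):
        self.storage_level = "NONE"   # mirrors dataset.storageLevel

    def persist(self):
        self.storage_level = "MEMORY_AND_DISK"

    def unpersist(self):
        self.storage_level = "NONE"

def run_with_weight(instances, handle_persistence):
    """Persist/unpersist inside the algorithm, as the fix does."""
    if handle_persistence:
        instances.persist()
    try:
        # Placeholder for the real iterative work over `instances`.
        result = instances.storage_level
    finally:
        if handle_persistence:
            instances.unpersist()
    return result

data = ToyDataset()
# Caller decides: only ask the algorithm to cache if data isn't cached already.
handle_persistence = data.storage_level == "NONE"
level_during_fit = run_with_weight(data, handle_persistence)
print(level_during_fit)     # MEMORY_AND_DISK
print(data.storage_level)   # NONE -- released after fitting
```

If the caller had already persisted the data, `handle_persistence` would be `False` and the routine would leave caching entirely to the caller — which is exactly what prevents the double cache.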
[spark] branch master updated (d9eb06e -> a4d785d)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d9eb06e [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write() add a4d785d [MINOR] Typo in ShuffleMapStage.scala No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (a6df16b -> 85c9e8c)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from a6df16b [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations add 85c9e8c [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write() No new revisions were added by this update. Summary of changes: python/pyspark/ml/tests/test_tuning.py | 131 ++--- python/pyspark/ml/tuning.py| 67 + 2 files changed, 172 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (25c7d0f -> d9eb06e)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 25c7d0f [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13 add d9eb06e [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write() No new revisions were added by this update. Summary of changes: python/pyspark/ml/tests/test_tuning.py | 131 ++--- python/pyspark/ml/tuning.py| 67 + 2 files changed, 172 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write()
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 85c9e8c [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write() 85c9e8c is described below commit 85c9e8c54c30b69c39075e97cd3cac295be09303 Author: Louiszr AuthorDate: Sat Aug 22 09:27:31 2020 -0500 [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write() ### What changes were proposed in this pull request? Changed the definitions of `CrossValidatorModel.copy()/_to_java()/_from_java()` so that exposed parameters (i.e. parameters with `get()` methods) are copied in these methods. ### Why are the changes needed? Parameters are copied in the respective Scala interface for `CrossValidatorModel.copy()`. It fits the semantics to persist parameters when calling `CrossValidatorModel.save()` and `CrossValidatorModel.load()`, so that the user gets the same model back after saving and loading it. Not copying across `numFolds` also causes bugs like array-index-out-of-bounds errors and lost sub-models, because this parameter will always default to 3 (as described in the JIRA ticket). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tests for `CrossValidatorModel.copy()` and `save()`/`load()` are updated so that they check parameters before and after function calls. Closes #29445 from Louiszr/master. 
Authored-by: Louiszr Signed-off-by: Sean Owen (cherry picked from commit d9eb06ea37cab185f1e49c641313be9707270252) Signed-off-by: Sean Owen --- python/pyspark/ml/tests/test_tuning.py | 131 ++--- python/pyspark/ml/tuning.py| 67 + 2 files changed, 172 insertions(+), 26 deletions(-) diff --git a/python/pyspark/ml/tests/test_tuning.py b/python/pyspark/ml/tests/test_tuning.py index 6bcc3f9..b250740 100644 --- a/python/pyspark/ml/tests/test_tuning.py +++ b/python/pyspark/ml/tests/test_tuning.py @@ -89,15 +89,50 @@ class CrossValidatorTests(SparkSessionTestCase): grid = (ParamGridBuilder() .addGrid(iee.inducedError, [100.0, 0.0, 1.0]) .build()) -cv = CrossValidator(estimator=iee, estimatorParamMaps=grid, evaluator=evaluator) +cv = CrossValidator( +estimator=iee, +estimatorParamMaps=grid, +evaluator=evaluator, +collectSubModels=True, +numFolds=2 +) cvCopied = cv.copy() -self.assertEqual(cv.getEstimator().uid, cvCopied.getEstimator().uid) +for param in [ +lambda x: x.getEstimator().uid, +# SPARK-32092: CrossValidator.copy() needs to copy all existing params +lambda x: x.getNumFolds(), +lambda x: x.getFoldCol(), +lambda x: x.getCollectSubModels(), +lambda x: x.getParallelism(), +lambda x: x.getSeed() +]: +self.assertEqual(param(cv), param(cvCopied)) cvModel = cv.fit(dataset) cvModelCopied = cvModel.copy() for index in range(len(cvModel.avgMetrics)): self.assertTrue(abs(cvModel.avgMetrics[index] - cvModelCopied.avgMetrics[index]) < 0.0001) +# SPARK-32092: CrossValidatorModel.copy() needs to copy all existing params +for param in [ +lambda x: x.getNumFolds(), +lambda x: x.getFoldCol(), +lambda x: x.getSeed() +]: +self.assertEqual(param(cvModel), param(cvModelCopied)) + +cvModel.avgMetrics[0] = 'foo' +self.assertNotEqual( +cvModelCopied.avgMetrics[0], +'foo', +"Changing the original avgMetrics should not affect the copied model" +) +cvModel.subModels[0] = 'foo' +self.assertNotEqual( +cvModelCopied.subModels[0], +'foo', +"Changing the original subModels should not affect the 
copied model" +) def test_fit_minimize_metric(self): dataset = self.spark.createDataFrame([ @@ -166,16 +201,39 @@ class CrossValidatorTests(SparkSessionTestCase): lr = LogisticRegression() grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1]).build() evaluator = BinaryClassificationEvaluator() -cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=evaluator) +cv = CrossValidator( +estimator=lr, +estimatorParamMaps=grid, +evaluator=evaluator, +collectSubModels=True, +numFolds=4, +seed=42 +) c
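The copy semantics this fix enforces can be illustrated without Spark at all: a `copy()` must carry over every explicitly set parameter, and mutable members such as `avgMetrics` and `subModels` must be copied deeply enough that mutating the original does not leak into the clone. Below is a minimal, hypothetical pure-Python sketch of that contract (`MiniCVModel` is an illustrative stand-in, not PySpark's actual `CrossValidatorModel`):

```python
import copy


class MiniCVModel:
    """Toy stand-in for CrossValidatorModel, illustrating the SPARK-32092 contract."""

    def __init__(self, num_folds=3, avg_metrics=None, sub_models=None):
        self.num_folds = num_folds                  # an exposed param; the bug reset it to 3
        self.avg_metrics = list(avg_metrics or [])  # per-param-map metric values
        self.sub_models = list(sub_models or [])    # collected sub-models

    def copy(self):
        # Carry over the exposed param AND deep-copy the mutable members,
        # so the clone is fully independent of the original.
        return MiniCVModel(
            num_folds=self.num_folds,
            avg_metrics=copy.deepcopy(self.avg_metrics),
            sub_models=copy.deepcopy(self.sub_models),
        )


m = MiniCVModel(num_folds=4, avg_metrics=[0.8, 0.9], sub_models=["m1", "m2"])
c = m.copy()
assert c.num_folds == 4          # the param survives the copy
m.avg_metrics[0] = "foo"
assert c.avg_metrics[0] == 0.8   # mutating the original leaves the copy intact
```

The two assertions mirror the checks the updated `test_tuning.py` performs on the real model.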
[spark] branch master updated (25c7d0f -> d9eb06e)

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 25c7d0f  [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13
 add d9eb06e  [SPARK-32092][ML][PYSPARK] Fix parameters not being copied in CrossValidatorModel.copy(), read() and write()

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_tuning.py | 131 ++---
 python/pyspark/ml/tuning.py            |  67 +
 2 files changed, 172 insertions(+), 26 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8b26c69 -> 25c7d0f)

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8b26c69  [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations
 add 25c7d0f  [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/ExecutorAllocationManager.scala   | 2 +-
 sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala        | 2 +-
 .../org/apache/spark/sql/catalyst/CatalystTypeConverters.scala    | 2 +-
 .../scala/org/apache/spark/sql/catalyst/ScalaReflection.scala     | 7 +++
 .../scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala | 4 +++-
 .../apache/spark/sql/catalyst/expressions/objects/objects.scala   | 8
 .../org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala    | 3 ++-
 .../sql/catalyst/optimizer/StarJoinCostBasedReorderSuite.scala    | 6 --
 8 files changed, 19 insertions(+), 15 deletions(-)
[spark] branch master updated: [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 25c7d0f  [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13
25c7d0f is described below

commit 25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb
Author:     yangjie01
AuthorDate: Sat Aug 22 09:24:16 2020 -0500

    [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13

    ### What changes were proposed in this pull request?

    The purpose of this PR is to resolve [SPARK-32526](https://issues.apache.org/jira/browse/SPARK-32526); all remaining failed cases are fixed. The main changes of this PR are as follows:

    - Change `ExecutorAllocationManager.scala` so the core module compiles in Scala 2.13; it's a blocking problem
    - Change `Seq[_]` to `scala.collection.Seq[_]` in the failed cases
    - Added a different expected plan for `Test 4: Star with several branches` of StarJoinCostBasedReorderSuite for Scala 2.13, because the candidate plans:

    ```
    Join Inner, (d1_pk#5 = f1_fk1#0)
    :- Join Inner, (f1_fk2#1 = d2_pk#8)
    :  :- Join Inner, (f1_fk3#2 = d3_pk#11)
    ```

    and

    ```
    Join Inner, (f1_fk2#1 = d2_pk#8)
    :- Join Inner, (d1_pk#5 = f1_fk1#0)
    :  :- Join Inner, (f1_fk3#2 = d3_pk#11)
    ```

    have the same cost `Cost(200,9200)`, but `HashMap` was rewritten in Scala 2.13 and the iteration order leads to different results.

    This PR fixes the following test cases:

    - LiteralExpressionSuite (1 FAILED -> PASS)
    - StarJoinCostBasedReorderSuite (1 FAILED -> PASS)
    - ObjectExpressionsSuite (2 FAILED -> PASS)
    - ScalaReflectionSuite (1 FAILED -> PASS)
    - RowEncoderSuite (10 FAILED -> PASS)
    - ExpressionEncoderSuite (ABORTED -> PASS)

    ### Why are the changes needed?

    We need to support a Scala 2.13 build.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?
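The StarJoinCostBasedReorderSuite issue above is a tie-breaking effect: when two candidate plans have exactly equal cost, the plan a cost-based reorderer emits depends on the iteration order of the collection holding the candidates, and Scala 2.13's rewritten `HashMap` iterates in a different order than 2.12's. The same effect is easy to demonstrate in Python with ordinary dicts (the plan names and cost tuples below are illustrative, not Spark's):

```python
def best_plan(candidates):
    # min() keeps the FIRST candidate with the minimal key, so among
    # equal-cost plans the winner is decided purely by iteration order.
    return min(candidates, key=candidates.get)

# Same two plans, same Cost(rows, size)-style tuples, different insertion order.
a = {"plan_d1_first": (200, 9200), "plan_d2_first": (200, 9200)}
b = {"plan_d2_first": (200, 9200), "plan_d1_first": (200, 9200)}

assert best_plan(a) == "plan_d1_first"
assert best_plan(b) == "plan_d2_first"  # equal costs, different winner
```

This is why the fix adds a second expected plan for Scala 2.13 rather than changing the optimizer: both answers are equally correct under the cost model.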
[spark] branch branch-3.0 updated: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c4807ce  [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
c4807ce is described below

commit c4807ced3913a4d524892dc7bab502250687a43c
Author:     Kousuke Saruta
AuthorDate: Sun Aug 16 12:07:37 2020 -0500

    [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

    ### What changes were proposed in this pull request?

    This PR fixes the links to metrics.dropwizard.io in monitoring.md to refer to the proper version of the library.

    ### Why are the changes needed?

    There are links to metrics.dropwizard.io in monitoring.md, but the link targets refer to version 3.1.0, while we use 4.1.1. Now that users can create their own metrics using the Dropwizard library, it's better to fix the links to refer to the proper version.

    ### Does this PR introduce _any_ user-facing change?

    Yes. The modified links refer to version 4.1.1.

    ### How was this patch tested?

    Build the docs and visit all the modified links.

    Closes #29426 from sarutak/fix-dropwizard-url.

    Authored-by: Kousuke Saruta
    Signed-off-by: Sean Owen
    (cherry picked from commit 9a79bbc8b6e426e7b29a9f4867beb396014d8046)
    Signed-off-by: Sean Owen
---
 docs/monitoring.md | 8 
 pom.xml            | 4 
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index 1808167..4608a4e 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -718,7 +718,7 @@ The JSON end point is exposed at: `/applications/[app-id]/executors`, and the Pr
 The Prometheus endpoint is experimental and conditional to a configuration parameter: `spark.ui.prometheus.enabled=true` (the default is `false`).
 In addition, aggregated per-stage peak values of the executor memory metrics are written to the event log if `spark.eventLog.logStageExecutorMetrics` is true.
-Executor memory metrics are also exposed via the Spark metrics system based on the Dropwizard metrics library.
+Executor memory metrics are also exposed via the Spark metrics system based on the [Dropwizard metrics library](http://metrics.dropwizard.io/4.1.1).
 A list of the available metrics, with a short description:
@@ -922,7 +922,7 @@ keep the paths consistent in both modes.
 # Metrics
 Spark has a configurable metrics system based on the
-[Dropwizard Metrics Library](http://metrics.dropwizard.io/).
+[Dropwizard Metrics Library](http://metrics.dropwizard.io/4.1.1).
 This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV files. The metrics are generated by sources embedded in the Spark code base. They provide instrumentation for specific activities and Spark components.
@@ -1016,7 +1016,7 @@ activates the JVM source:
 ## List of available metrics providers
 Metrics used by Spark are of multiple types: gauge, counter, histogram, meter and timer,
-see [Dropwizard library documentation for details](https://metrics.dropwizard.io/3.1.0/getting-started/).
+see [Dropwizard library documentation for details](https://metrics.dropwizard.io/4.1.1/getting-started.html).
 The following list of components and metrics reports the name and some details about the available metrics, grouped per component instance and source namespace. The most common time of metrics used in Spark instrumentation are gauges and counters.
@@ -1244,7 +1244,7 @@ Notes: `spark.metrics.staticSources.enabled` (default is true) - This source is available for driver and executor instances and is also available for other instances.
 - This source provides information on JVM metrics using the
-  [Dropwizard/Codahale Metric Sets for JVM instrumentation](https://metrics.dropwizard.io/3.1.0/manual/jvm/)
+  [Dropwizard/Codahale Metric Sets for JVM instrumentation](https://metrics.dropwizard.io/4.1.1/manual/jvm.html)
   and in particular the metric sets BufferPoolMetricSet, GarbageCollectorMetricSet and MemoryUsageGaugeSet.

 ### Component instance = applicationMaster

diff --git a/pom.xml b/pom.xml
index e9ae204..1bf5de0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -145,6 +145,10 @@
     0.9.5
     2.4.0
     2.0.8
+    4.1.1
     1.8.2
     hadoop2
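The root cause of this docs fix is version drift: links hard-code a library version that no longer matches the version pinned in the build. A small check over the docs can catch this before release; the sketch below is a hypothetical helper (not part of the Spark build) that flags dropwizard.io links whose embedded version differs from the expected one:

```python
import re

EXPECTED_VERSION = "4.1.1"  # would be read from the build's version property


def stale_dropwizard_links(markdown: str, expected: str = EXPECTED_VERSION):
    """Return the versions embedded in metrics.dropwizard.io links that differ
    from the expected library version."""
    versions = re.findall(r"https?://metrics\.dropwizard\.io/(\d[\d.]*)", markdown)
    return [v for v in versions if v != expected]


doc = """
see [Dropwizard docs](https://metrics.dropwizard.io/3.1.0/getting-started/)
and [JVM metric sets](https://metrics.dropwizard.io/4.1.1/manual/jvm.html).
"""
assert stale_dropwizard_links(doc) == ["3.1.0"]  # only the outdated link is flagged
```

The same pattern generalizes to any externally versioned link in a docs tree: extract the version component with a regex and compare it against the single source of truth in the build file.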
[spark] branch master updated (c280c7f -> 9a79bbc)

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from c280c7f  [SPARK-32625][SQL] Log error message when falling back to interpreter mode
 add 9a79bbc  [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

No new revisions were added by this update.

Summary of changes:
 docs/monitoring.md | 8 
 pom.xml            | 4 
 2 files changed, 8 insertions(+), 4 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new c4807ce [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version c4807ce is described below commit c4807ced3913a4d524892dc7bab502250687a43c Author: Kousuke Saruta AuthorDate: Sun Aug 16 12:07:37 2020 -0500 [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version ### What changes were proposed in this pull request? This PR fixes the link to metrics.dropwizard.io in monitoring.md to refer the proper version of the library. ### Why are the changes needed? There are links to metrics.dropwizard.io in monitoring.md but the link targets refer the version 3.1.0, while we use 4.1.1. Now that users can create their own metrics using the dropwizard library, it's better to fix the links to refer the proper version. ### Does this PR introduce _any_ user-facing change? Yes. The modified links refer the version 4.1.1. ### How was this patch tested? Build the docs and visit all the modified links. Closes #29426 from sarutak/fix-dropwizard-url. Authored-by: Kousuke Saruta Signed-off-by: Sean Owen (cherry picked from commit 9a79bbc8b6e426e7b29a9f4867beb396014d8046) Signed-off-by: Sean Owen --- docs/monitoring.md | 8 pom.xml| 4 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/docs/monitoring.md b/docs/monitoring.md index 1808167..4608a4e 100644 --- a/docs/monitoring.md +++ b/docs/monitoring.md @@ -718,7 +718,7 @@ The JSON end point is exposed at: `/applications/[app-id]/executors`, and the Pr The Prometheus endpoint is experimental and conditional to a configuration parameter: `spark.ui.prometheus.enabled=true` (the default is `false`). 
In addition, aggregated per-stage peak values of the executor memory metrics are written to the event log if `spark.eventLog.logStageExecutorMetrics` is true. -Executor memory metrics are also exposed via the Spark metrics system based on the Dropwizard metrics library. +Executor memory metrics are also exposed via the Spark metrics system based on the [Dropwizard metrics library](http://metrics.dropwizard.io/4.1.1). A list of the available metrics, with a short description: @@ -922,7 +922,7 @@ keep the paths consistent in both modes. # Metrics Spark has a configurable metrics system based on the -[Dropwizard Metrics Library](http://metrics.dropwizard.io/). +[Dropwizard Metrics Library](http://metrics.dropwizard.io/4.1.1). This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV files. The metrics are generated by sources embedded in the Spark code base. They provide instrumentation for specific activities and Spark components. @@ -1016,7 +1016,7 @@ activates the JVM source: ## List of available metrics providers Metrics used by Spark are of multiple types: gauge, counter, histogram, meter and timer, -see [Dropwizard library documentation for details](https://metrics.dropwizard.io/3.1.0/getting-started/). +see [Dropwizard library documentation for details](https://metrics.dropwizard.io/4.1.1/getting-started.html). The following list of components and metrics reports the name and some details about the available metrics, grouped per component instance and source namespace. The most common time of metrics used in Spark instrumentation are gauges and counters. @@ -1244,7 +1244,7 @@ Notes: `spark.metrics.staticSources.enabled` (default is true) - This source is available for driver and executor instances and is also available for other instances. 
- This source provides information on JVM metrics using the - [Dropwizard/Codahale Metric Sets for JVM instrumentation](https://metrics.dropwizard.io/3.1.0/manual/jvm/) + [Dropwizard/Codahale Metric Sets for JVM instrumentation](https://metrics.dropwizard.io/4.1.1/manual/jvm.html) and in particular the metric sets BufferPoolMetricSet, GarbageCollectorMetricSet and MemoryUsageGaugeSet. ### Component instance = applicationMaster diff --git a/pom.xml b/pom.xml index e9ae204..1bf5de0 100644 --- a/pom.xml +++ b/pom.xml @@ -145,6 +145,10 @@ 0.9.5 2.4.0 2.0.8 + 4.1.1 1.8.2 hadoop2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c280c7f -> 9a79bbc)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c280c7f [SPARK-32625][SQL] Log error message when falling back to interpreter mode add 9a79bbc [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version No new revisions were added by this update. Summary of changes: docs/monitoring.md | 8 pom.xml| 4 2 files changed, 8 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9a79bbc [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version 9a79bbc is described below commit 9a79bbc8b6e426e7b29a9f4867beb396014d8046 Author: Kousuke Saruta AuthorDate: Sun Aug 16 12:07:37 2020 -0500 [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version ### What changes were proposed in this pull request? This PR fixes the link to metrics.dropwizard.io in monitoring.md to refer the proper version of the library. ### Why are the changes needed? There are links to metrics.dropwizard.io in monitoring.md but the link targets refer the version 3.1.0, while we use 4.1.1. Now that users can create their own metrics using the dropwizard library, it's better to fix the links to refer the proper version. ### Does this PR introduce _any_ user-facing change? Yes. The modified links refer the version 4.1.1. ### How was this patch tested? Build the docs and visit all the modified links. Closes #29426 from sarutak/fix-dropwizard-url. Authored-by: Kousuke Saruta Signed-off-by: Sean Owen --- docs/monitoring.md | 8 pom.xml| 4 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/docs/monitoring.md b/docs/monitoring.md index 5fdf308..31fc160 100644 --- a/docs/monitoring.md +++ b/docs/monitoring.md @@ -758,7 +758,7 @@ The JSON end point is exposed at: `/applications/[app-id]/executors`, and the Pr The Prometheus endpoint is experimental and conditional to a configuration parameter: `spark.ui.prometheus.enabled=true` (the default is `false`). In addition, aggregated per-stage peak values of the executor memory metrics are written to the event log if `spark.eventLog.logStageExecutorMetrics` is true. 
-Executor memory metrics are also exposed via the Spark metrics system based on the Dropwizard metrics library. +Executor memory metrics are also exposed via the Spark metrics system based on the [Dropwizard metrics library](http://metrics.dropwizard.io/4.1.1). A list of the available metrics, with a short description: @@ -962,7 +962,7 @@ keep the paths consistent in both modes. # Metrics Spark has a configurable metrics system based on the -[Dropwizard Metrics Library](http://metrics.dropwizard.io/). +[Dropwizard Metrics Library](http://metrics.dropwizard.io/4.1.1). This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV files. The metrics are generated by sources embedded in the Spark code base. They provide instrumentation for specific activities and Spark components. @@ -1056,7 +1056,7 @@ activates the JVM source: ## List of available metrics providers Metrics used by Spark are of multiple types: gauge, counter, histogram, meter and timer, -see [Dropwizard library documentation for details](https://metrics.dropwizard.io/3.1.0/getting-started/). +see [Dropwizard library documentation for details](https://metrics.dropwizard.io/4.1.1/getting-started.html). The following list of components and metrics reports the name and some details about the available metrics, grouped per component instance and source namespace. The most common time of metrics used in Spark instrumentation are gauges and counters. @@ -1284,7 +1284,7 @@ Notes: `spark.metrics.staticSources.enabled` (default is true) - This source is available for driver and executor instances and is also available for other instances. 
- This source provides information on JVM metrics using the - [Dropwizard/Codahale Metric Sets for JVM instrumentation](https://metrics.dropwizard.io/3.1.0/manual/jvm/) + [Dropwizard/Codahale Metric Sets for JVM instrumentation](https://metrics.dropwizard.io/4.1.1/manual/jvm.html) and in particular the metric sets BufferPoolMetricSet, GarbageCollectorMetricSet and MemoryUsageGaugeSet. ### Component instance = applicationMaster diff --git a/pom.xml b/pom.xml index e414835..23de569 100644 --- a/pom.xml +++ b/pom.xml @@ -145,6 +145,10 @@ 0.9.5 2.4.0 2.0.8 + 4.1.1 1.8.2 hadoop2 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
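The monitoring.md text in the diff above names the Dropwizard/Codahale JVM metric sets (BufferPoolMetricSet, GarbageCollectorMetricSet, MemoryUsageGaugeSet). Those sets are thin wrappers over the standard `java.lang.management` MXBeans. As a rough illustration only (this is neither Spark nor Dropwizard code), the same underlying values can be read with just the JDK:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class JvmMetricsSketch {
    public static void main(String[] args) {
        // MemoryUsageGaugeSet exposes gauges backed by MemoryMXBean:
        // heap and non-heap used/committed/max sizes.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("heap.used = " + memory.getHeapMemoryUsage().getUsed());
        System.out.println("non-heap.used = " + memory.getNonHeapMemoryUsage().getUsed());

        // GarbageCollectorMetricSet exposes per-collector gauges backed by
        // GarbageCollectorMXBean: cumulative collection count and time (ms).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ".count = " + gc.getCollectionCount());
            System.out.println(gc.getName() + ".time = " + gc.getCollectionTime());
        }

        // BufferPoolMetricSet exposes gauges backed by BufferPoolMXBean:
        // direct/mapped buffer counts and memory used.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + ".memoryUsed = " + pool.getMemoryUsed());
        }
    }
}
```

Dropwizard's metric sets register these same values as named gauges in a `MetricRegistry`, which Spark's metrics system then routes to the configured sinks.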
[spark] branch master updated (0c850c7 -> 6ae2cb2)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0c850c7 [SPARK-32511][SQL] Add dropFields method to Column class add 6ae2cb2 [SPARK-32526][SQL] Fix some test cases of `sql/catalyst` module in scala 2.13 No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/ExpressionSet.scala | 8 +++- .../apache/spark/sql/catalyst/analysis/Analyzer.scala | 6 +++--- .../sql/catalyst/expressions/collectionOperations.scala | 17 + .../sql/catalyst/expressions/higherOrderFunctions.scala | 2 +- .../sql/catalyst/expressions/stringExpressions.scala| 4 ++-- .../apache/spark/sql/catalyst/json/JacksonParser.scala | 2 +- .../apache/spark/sql/catalyst/optimizer/Optimizer.scala | 2 +- .../apache/spark/sql/catalyst/parser/AstBuilder.scala | 5 +++-- .../org/apache/spark/sql/catalyst/trees/TreeNode.scala | 8 ++-- .../scala/org/apache/spark/sql/types/Metadata.scala | 4 +++- .../scala/org/apache/spark/sql/util/SchemaUtils.scala | 2 +- .../org/apache/spark/sql/RandomDataGenerator.scala | 2 +- .../org/apache/spark/sql/util/SchemaUtilsSuite.scala| 2 +- 13 files changed, 39 insertions(+), 25 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org