[spark] branch master updated (acfee3c -> 21b7479)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from acfee3c  [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled
   add 21b7479  [SPARK-32959][SQL][TEST] Fix an invalid test in DataSourceV2SQLSuite

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/connector/DataSourceV2SQLSuite.scala |  3 ++-
 .../spark/sql/connector/TestV2SessionCatalogBase.scala    | 15 +--
 2 files changed, 11 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b53da23 -> acfee3c)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from b53da23  [MINOR][SQL] Improve examples for `percentile_approx()`
   add acfee3c  [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala    |  2 +-
 .../expressions/CallMethodViaReflection.scala       |  3 +-
 .../spark/sql/catalyst/expressions/Cast.scala       |  3 +-
 .../expressions/MonotonicallyIncreasingID.scala     |  8 +-
 .../catalyst/expressions/SparkPartitionID.scala     |  8 +-
 .../expressions/aggregate/CountMinSketchAgg.scala   |  7 ++
 .../expressions/aggregate/bitwiseAggregates.scala   |  1 +
 .../sql/catalyst/expressions/arithmetic.scala       | 38 ++---
 .../catalyst/expressions/bitwiseExpressions.scala   | 12 ++-
 .../expressions/collectionOperations.scala          | 18 +++--
 .../catalyst/expressions/complexTypeCreator.scala   | 20 +++--
 .../expressions/conditionalExpressions.scala        |  6 +-
 .../catalyst/expressions/datetimeExpressions.scala  | 40 +-
 .../sql/catalyst/expressions/generators.scala       | 12 ++-
 .../spark/sql/catalyst/expressions/hash.scala       | 18 +++--
 .../sql/catalyst/expressions/inputFileBlock.scala   | 27 ++-
 .../sql/catalyst/expressions/jsonExpressions.scala  |  6 +-
 .../spark/sql/catalyst/expressions/misc.scala       | 16 +++-
 .../sql/catalyst/expressions/predicates.scala       | 61 ---
 .../catalyst/expressions/windowExpressions.scala    | 91 ++
 .../spark/sql/catalyst/expressions/xml/xpath.scala  | 24 --
 .../sql-functions/sql-expression-schema.md          | 48 ++--
 .../test/resources/sql-tests/results/cast.sql.out   |  2 +
 .../apache/spark/sql/ExpressionsSchemaSuite.scala   | 10 ++-
 .../spark/sql/execution/command/DDLSuite.scala      |  6 +-
 .../sql/expressions/ExpressionInfoSuite.scala       | 37 -
 26 files changed, 404 insertions(+), 120 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (942f577 -> b53da23)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 942f577  [SPARK-32017][PYTHON][BUILD] Make Pyspark Hadoop 3.2+ Variant available in PyPI
   add b53da23  [MINOR][SQL] Improve examples for `percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/aggregate/ApproximatePercentile.scala |  8 
 .../src/test/resources/sql-functions/sql-expression-schema.md  |  4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (779f0a8 -> 942f577)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 779f0a8  [SPARK-32933][PYTHON] Use keyword-only syntax for keyword_only methods
   add 942f577  [SPARK-32017][PYTHON][BUILD] Make Pyspark Hadoop 3.2+ Variant available in PyPI

No new revisions were added by this update.

Summary of changes:
 dev/create-release/release-build.sh            |   3 +
 dev/sparktestsupport/modules.py                |   1 +
 python/docs/source/getting_started/install.rst |  35 +
 python/pyspark/find_spark_home.py              |  11 +-
 python/pyspark/install.py                      | 173 +
 python/pyspark/tests/test_install_spark.py     | 112 
 python/setup.py                                |  47 ++-
 7 files changed, 380 insertions(+), 2 deletions(-)
 create mode 100644 python/pyspark/install.py
 create mode 100644 python/pyspark/tests/test_install_spark.py

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7c14f17 -> 779f0a8)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 7c14f17  [SPARK-32306][SQL][DOCS] Clarify the result of `percentile_approx()`
   add 779f0a8  [SPARK-32933][PYTHON] Use keyword-only syntax for keyword_only methods

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/classification.py |  70 
 python/pyspark/ml/clustering.py     |  40 ++---
 python/pyspark/ml/evaluation.py     |  48 +++---
 python/pyspark/ml/feature.py        | 332 ++--
 python/pyspark/ml/fpm.py            |  16 +-
 python/pyspark/ml/pipeline.py       |   8 +-
 python/pyspark/ml/recommendation.py |  24 +--
 python/pyspark/ml/regression.py     |  64 +++
 python/pyspark/ml/tuning.py         |  24 +--
 python/pyspark/sql/streaming.py     |   2 +-
 10 files changed, 318 insertions(+), 310 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
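[Editor's note: SPARK-32933 above moves PySpark ML constructors from the `@keyword_only` decorator pattern to Python 3's native keyword-only parameter syntax. A minimal sketch of what that syntax enforces, using a hypothetical estimator class rather than actual PySpark code:

```python
# Illustrative only: a hypothetical estimator showing Python's
# keyword-only parameter syntax (PEP 3102), the mechanism the
# SPARK-32933 change adopts for PySpark ML constructors.
class Estimator:
    def __init__(self, *, maxIter=100, regParam=0.0):
        # The bare '*' means every parameter after it must be
        # passed by keyword; positional calls raise TypeError.
        self.maxIter = maxIter
        self.regParam = regParam

est = Estimator(maxIter=10)      # OK: keyword argument
try:
    Estimator(10)                # positional call is rejected
except TypeError as exc:
    error_message = str(exc)
```

The benefit over the decorator approach is that the interpreter itself rejects positional misuse, so the constraint is visible in the signature.]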
[spark] branch master updated (fba5736 -> 7c14f17)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from fba5736  [SPARK-32757][SQL][FOLLOWUP] Preserve the attribute name as possible as we scan in SubqueryBroadcastExec
   add 7c14f17  [SPARK-32306][SQL][DOCS] Clarify the result of `percentile_approx()`

No new revisions were added by this update.

Summary of changes:
 R/pkg/R/functions.R                                           |  6 --
 python/pyspark/sql/functions.py                               |  4 +++-
 .../expressions/aggregate/ApproximatePercentile.scala         | 12 +++-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala  |  5 +++--
 4 files changed, 17 insertions(+), 10 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
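[Editor's note: the SPARK-32306 doc change above clarifies that `percentile_approx()` returns an approximation of the smallest value in the column such that the fraction of values less than or equal to it is at least the given percentage. A plain-Python sketch of that exact quantity, for intuition only (not Spark code, and without Spark's approximation machinery):

```python
def exact_percentile(values, percentage):
    """Exact analogue of what percentile_approx approximates:
    the smallest value v in `values` such that the fraction of
    values <= v is at least `percentage`."""
    ordered = sorted(values)
    n = len(ordered)
    for rank, v in enumerate(ordered, start=1):
        if rank / n >= percentage:
            return v

exact_percentile([10, 20, 30, 40], 0.5)  # -> 20
```

Note the result is always an actual element of the input, which is the point the clarified documentation emphasizes.]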
[spark] branch master updated (dd808457 -> fba5736)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from dd808457  [SPARK-32964][DSTREAMS] Pass all `streaming` module UTs in Scala 2.13
     add fba5736   [SPARK-32757][SQL][FOLLOWUP] Preserve the attribute name as possible as we scan in SubqueryBroadcastExec

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/SubqueryBroadcastExec.scala | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6145621 -> dd808457)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6145621   [SPARK-32659][SQL][FOLLOWUP] Broadcast Array instead of Set in InSubqueryExec
     add dd808457  [SPARK-32964][DSTREAMS] Pass all `streaming` module UTs in Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/streaming/DStreamGraph.scala | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-32659][SQL][FOLLOWUP][3.0] Broadcast Array instead of Set in InSubqueryExec
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8a481d8  [SPARK-32659][SQL][FOLLOWUP][3.0] Broadcast Array instead of Set in InSubqueryExec

8a481d8 is described below

commit 8a481d8c336360bc5dfa518af70590251e5b61bc
Author: Wenchen Fan
AuthorDate: Tue Sep 22 10:58:33 2020 -0700

    [SPARK-32659][SQL][FOLLOWUP][3.0] Broadcast Array instead of Set in InSubqueryExec

    ### What changes were proposed in this pull request?

    This is a followup of https://github.com/apache/spark/pull/29475. This PR updates the code to broadcast the Array instead of the Set, which was the behavior before #29475.

    ### Why are the changes needed?

    The Set can be much bigger than the Array. It's safer to keep the behavior the same as before and build the set on the executor side.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Existing tests.

    Closes #29840 from cloud-fan/backport.

    Authored-by: Wenchen Fan
    Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/sql/execution/subquery.scala | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
index 9d15c76..48d6210 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala
@@ -23,7 +23,7 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.{expressions, InternalRow}
-import org.apache.spark.sql.catalyst.expressions.{AttributeSeq, CreateNamedStruct, Expression, ExprId, InSet, ListQuery, Literal, PlanExpression}
+import org.apache.spark.sql.catalyst.expressions.{CreateNamedStruct, Expression, ExprId, InSet, ListQuery, Literal, PlanExpression}
 import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.internal.SQLConf
@@ -114,10 +114,10 @@ case class InSubqueryExec(
     child: Expression,
     plan: BaseSubqueryExec,
     exprId: ExprId,
-    private var resultBroadcast: Broadcast[Set[Any]] = null) extends ExecSubqueryExpression {
+    private var resultBroadcast: Broadcast[Array[Any]] = null) extends ExecSubqueryExpression {

-  @transient private var result: Set[Any] = _
-  @transient private lazy val inSet = InSet(child, result)
+  @transient private var result: Array[Any] = _
+  @transient private lazy val inSet = InSet(child, result.toSet)

   override def dataType: DataType = BooleanType
   override def children: Seq[Expression] = child :: Nil
@@ -132,11 +132,11 @@ case class InSubqueryExec(

   def updateResult(): Unit = {
     val rows = plan.executeCollect()
-    result = rows.map(_.get(0, child.dataType)).toSet
+    result = rows.map(_.get(0, child.dataType))
     resultBroadcast = plan.sqlContext.sparkContext.broadcast(result)
  }

-  def values(): Option[Set[Any]] = Option(resultBroadcast).map(_.value)
+  def values(): Option[Array[Any]] = Option(resultBroadcast).map(_.value)

   private def prepareResult(): Unit = {
     require(resultBroadcast != null, s"$this has not finished")

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
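The pattern the patch restores — ship the values as a compact Array and defer Set construction until the consuming side actually needs membership tests — can be sketched outside of Spark as follows. This is a standalone illustration with hypothetical names (`BroadcastSketch`, `contains`), not the Spark code itself:

```scala
object BroadcastSketch {
  // Stand-in for the driver-side subquery result: kept as an Array,
  // which serializes more compactly than a HashSet would.
  val result: Array[Any] = Array(1, 2, 3, 2, 1)

  // The Set is built lazily, once, on first use — mirroring
  // `lazy val inSet = InSet(child, result.toSet)` in the patch.
  lazy val lookup: Set[Any] = result.toSet

  def contains(v: Any): Boolean = lookup.contains(v)

  def main(args: Array[String]): Unit = {
    assert(contains(2))
    assert(!contains(9))
    println("ok")
  }
}
```

The design choice is the same as in the commit: pay the `toSet` cost once per executor rather than paying a larger serialization cost once per broadcast.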
[spark] branch master updated (790d9ef2d -> 6145621)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 790d9ef2d  [SPARK-32955][DOCS] An item in the navigation bar in the WebUI has a wrong link
     add 6145621    [SPARK-32659][SQL][FOLLOWUP] Broadcast Array instead of Set in InSubqueryExec

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/subquery.scala | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32898][2.4][CORE] Fix wrong executorRunTime when task killed before real start
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new e1e94ed  [SPARK-32898][2.4][CORE] Fix wrong executorRunTime when task killed before real start

e1e94ed is described below

commit e1e94ed4ef45ef81814f1b920bac0afa52ae06a2
Author: yi.wu
AuthorDate: Mon Sep 21 23:20:18 2020 -0700

    [SPARK-32898][2.4][CORE] Fix wrong executorRunTime when task killed before real start

    ### What changes were proposed in this pull request?

    Only calculate the executorRunTime when taskStartTime > 0. Otherwise, set executorRunTime to 0.

    ### Why are the changes needed?

    Bug fix. It's possible for a task to be killed (e.g., by another successful attempt) before it reaches "taskStartTime = System.currentTimeMillis()". In this case, taskStartTime is still 0 since it hasn't really been initialized, and calculating System.currentTimeMillis() - taskStartTime yields a wrong executorRunTime.

    ### Does this PR introduce _any_ user-facing change?

    Yes, users will see the correct executorRunTime.

    ### How was this patch tested?

    Pass existing tests.

    Closes #29832 from Ngone51/backport-spark-3289.

    Authored-by: yi.wu
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index f7ff0b8..fe57b1c 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -337,7 +337,10 @@ private[spark] class Executor(
   private def collectAccumulatorsAndResetStatusOnFailure(taskStartTime: Long) = {
     // Report executor runtime and JVM gc time
     Option(task).foreach(t => {
-      t.metrics.setExecutorRunTime(System.currentTimeMillis() - taskStartTime)
+      t.metrics.setExecutorRunTime(
+        // SPARK-32898: it's possible that a task is killed when taskStartTime has the initial
+        // value(=0) still. In this case, the executorRunTime should be considered as 0.
+        if (taskStartTime > 0) System.currentTimeMillis() - taskStartTime else 0)
       t.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime)
     })

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
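The guard this patch adds boils down to: only derive a run time when the task actually recorded a start timestamp, since the sentinel value 0 means the task was killed before initialization. A minimal sketch of that logic in isolation (hypothetical object and method names, not Spark's `Executor`):

```scala
object RunTimeSketch {
  // A taskStartTime of 0 is the uninitialized sentinel: the task was killed
  // before it ever started, so the run time must be reported as 0 rather
  // than the huge (and wrong) value `now - 0` would produce.
  def executorRunTime(taskStartTime: Long, now: Long): Long =
    if (taskStartTime > 0) now - taskStartTime else 0L

  def main(args: Array[String]): Unit = {
    assert(executorRunTime(0L, 1600000000000L) == 0L) // killed before start
    assert(executorRunTime(1000L, 1500L) == 500L)     // normal case
    println("ok")
  }
}
```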
[spark] branch branch-2.4 updated: [SPARK-32898][2.4][CORE] Fix wrong executorRunTime when task killed before real start
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new e1e94ed [SPARK-32898][2.4][CORE] Fix wrong executorRunTime when task killed before real start e1e94ed is described below commit e1e94ed4ef45ef81814f1b920bac0afa52ae06a2 Author: yi.wu AuthorDate: Mon Sep 21 23:20:18 2020 -0700 [SPARK-32898][2.4][CORE] Fix wrong executorRunTime when task killed before real start ### What changes were proposed in this pull request? Only calculate the executorRunTime when taskStartTime > 0. Otherwise, set executorRunTime to 0. ### Why are the changes needed? bug fix. It's possible that a task be killed (e.g., by another successful attempt) before it reaches "taskStartTime = System.currentTimeMillis()". In this case, taskStartTime is still 0 since it hasn't been really initialized. And we will get the wrong executorRunTime by calculating System.currentTimeMillis() - taskStartTime. ### Does this PR introduce _any_ user-facing change? Yes, users will see the correct executorRunTime. ### How was this patch tested? Pass existing tests. Closes #29832 from Ngone51/backport-spark-3289. 
[spark] branch master updated (3118c22 -> 790d9ef2d)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3118c22    [SPARK-32949][R][SQL] Add timestamp_seconds to SparkR
 add 790d9ef2d  [SPARK-32955][DOCS] An item in the navigation bar in the WebUI has a wrong link

No new revisions were added by this update.

Summary of changes:
 docs/_layouts/global.html |  2 +-
 docs/api.md               | 27 ---------------------------
 2 files changed, 1 insertion(+), 28 deletions(-)
 delete mode 100644 docs/api.md

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org