[spark] branch master updated (747fe72 -> 3b859a1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 747fe72  [SPARK-35419][PYTHON] Enable spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled by default
 add 3b859a1  [SPARK-35431][SQL][TESTS] Sort elements generated by collect_set in SQLQueryTestSuite

No new revisions were added by this update.

Summary of changes:
 .../inputs/subquery/scalar-subquery/scalar-subquery-select.sql      | 2 +-
 .../results/subquery/scalar-subquery/scalar-subquery-select.sql.out | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] viirya commented on pull request #342: Update website for 2.4.8 release
viirya commented on pull request #342:
URL: https://github.com/apache/spark-website/pull/342#issuecomment-842803660

Thanks @maropu @HyukjinKwon @dongjoon-hyun!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark-website] dongjoon-hyun merged pull request #342: Update website for 2.4.8 release
dongjoon-hyun merged pull request #342:
URL: https://github.com/apache/spark-website/pull/342
[spark] branch master updated (a60c364 -> 747fe72)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from a60c364  [SPARK-34981][SQL][TESTS][FOLLOWUP] Fix test failure under Scala 2.13
 add 747fe72  [SPARK-35419][PYTHON] Enable spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled by default

No new revisions were added by this update.

Summary of changes:
 python/docs/source/migration_guide/pyspark_3.1_to_3.2.rst               | 2 ++
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)
[GitHub] [spark-website] viirya commented on pull request #341: Update release process
viirya commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842770942

Thank you @HyukjinKwon
[GitHub] [spark-website] HyukjinKwon commented on pull request #341: Update release process
HyukjinKwon commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842769465
[GitHub] [spark-website] viirya commented on pull request #342: Update website for 2.4.8 release
viirya commented on pull request #342:
URL: https://github.com/apache/spark-website/pull/342#issuecomment-842753706

cc @dongjoon-hyun @maropu @srowen @HyukjinKwon
[GitHub] [spark-website] viirya opened a new pull request #342: Update website for 2.4.8 release
viirya opened a new pull request #342:
URL: https://github.com/apache/spark-website/pull/342

Update website for 2.4.8 release. Added:
  releases/_posts/2021-05-17-spark-release-2-4-8.md
  news/_posts/2021-05-17-spark-2-4-8-released.md

Run `bundle exec jekyll build` to update html files.
[GitHub] [spark-website] viirya commented on pull request #341: Update release process
viirya commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842706959

Thanks @dongjoon-hyun @srowen @maropu
[GitHub] [spark-website] maropu commented on pull request #341: Update release process
maropu commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842706441

lgtm, too.
[spark-website] branch asf-site updated: Update release process (#341)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 9dc8dc6  Update release process (#341)

9dc8dc6 is described below

commit 9dc8dc6670e1313d2925b30eb5e7cf6d03a70e68
Author: Liang-Chi Hsieh
AuthorDate: Mon May 17 16:18:24 2021 -0700

    Update release process (#341)
---
 release-process.md        | 13 -
 site/release-process.html | 13 -
 2 files changed, 26 deletions(-)

diff --git a/release-process.md b/release-process.md
index 0e5e5db..72dcdb4 100644
--- a/release-process.md
+++ b/release-process.md
@@ -203,10 +203,6 @@ $ svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.1.0
 You will also need to update `js/download.js` to indicate the release is not mirrored
 anymore, so that the correct links are generated on the site.
-Also take a moment to check `HiveExternalCatalogVersionsSuite.scala` starting with branch-2.2
-and see if it needs to be adjusted, since that test relies on mirrored downloads of previous
-releases.
-
 Update the Spark Apache Repository
@@ -317,15 +313,6 @@ $ git shortlog v1.1.1 --grep "$EXPR" > contrib.txt
 $ git log v1.1.1 --grep "$expr" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][1-9][1-9][1-9] insert" | grep SPARK > large-patches.txt
 ```
-Update `HiveExternalCatalogVersionsSuite`
-
-When a new release occurs, `PROCESS_TABLES.testingVersions` in `HiveExternalCatalogVersionsSuite`
-must be updated shortly thereafter. This list should contain the latest release in all active
-maintenance branches, and no more.
-For example, as of this writing, it has value `val testingVersions = Seq("2.1.3", "2.2.2", "2.3.2")`.
-"2.4.0" will be added to the list when it's released. "2.1.3" will be removed (and removed from the Spark dist mirrors)
-when the branch is no longer maintained.
-"2.3.2" will become "2.3.3" when "2.3.3" is released.
-
 Create an Announcement
 Once everything is working (website docs, website changes) create an announcement on the website

diff --git a/site/release-process.html b/site/release-process.html
index 860abec..87acebe 100644
--- a/site/release-process.html
+++ b/site/release-process.html
@@ -398,10 +398,6 @@ To delete older versions simply use svn rm:
 You will also need to update js/download.js to indicate the release is not mirrored
 anymore, so that the correct links are generated on the site.
-Also take a moment to check HiveExternalCatalogVersionsSuite.scala starting with branch-2.2
-and see if it needs to be adjusted, since that test relies on mirrored downloads of previous
-releases.
-
 Update the Spark Apache Repository
 Check out the tagged commit for the release candidate that passed and apply the correct version tag.
@@ -508,15 +504,6 @@ $ git shortlog v1.1.1 --grep "$EXPR" contrib.txt
 $ git log v1.1.1 --grep "$expr" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][1-9][1-9][1-9] insert" | grep SPARK large-patches.txt
-Update `HiveExternalCatalogVersionsSuite`
-
-When a new release occurs, PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite
-must be updated shortly thereafter. This list should contain the latest release in all active
-maintenance branches, and no more.
-For example, as of this writing, it has value val testingVersions = Seq("2.1.3", "2.2.2", "2.3.2").
-2.4.0 will be added to the list when its released. 2.1.3 will be removed (and removed from the Spark dist mirrors)
-when the branch is no longer maintained. 2.3.2 will become 2.3.3 when 2.3.3 is released.
-
 Create an Announcement
 Once everything is working (website docs, website changes) create an announcement on the website
[GitHub] [spark-website] dongjoon-hyun merged pull request #341: Update release process
dongjoon-hyun merged pull request #341:
URL: https://github.com/apache/spark-website/pull/341
[spark] branch master updated (2a335f2 -> a60c364)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 2a335f2  [SPARK-34941][PYTHON] Fix mypy errors and enable mypy check for pandas-on-Spark
 add a60c364  [SPARK-34981][SQL][TESTS][FOLLOWUP] Fix test failure under Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connector/DataSourceV2FunctionSuite.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[GitHub] [spark-website] viirya commented on a change in pull request #341: Update release process
viirya commented on a change in pull request #341:
URL: https://github.com/apache/spark-website/pull/341#discussion_r633888912

File path: site/sitemap.xml
@@ -876,27 +876,27 @@ weekly
- https://spark.apache.org/screencasts/
+ https://spark.apache.org/graphx/

Review comment: Yea, let me revert it.
[GitHub] [spark-website] dongjoon-hyun commented on pull request #341: Update release process
dongjoon-hyun commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842650705

+1, LGTM (except the above comment on site/sitemap.xml).
[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #341: Update release process
dongjoon-hyun commented on a change in pull request #341:
URL: https://github.com/apache/spark-website/pull/341#discussion_r633878110

File path: site/sitemap.xml
@@ -876,27 +876,27 @@ weekly
- https://spark.apache.org/screencasts/
+ https://spark.apache.org/graphx/

Review comment: Yes, +1 for reverting this file change.
[GitHub] [spark-website] srowen commented on a change in pull request #341: Update release process
srowen commented on a change in pull request #341:
URL: https://github.com/apache/spark-website/pull/341#discussion_r633877163

File path: site/sitemap.xml
@@ -876,27 +876,27 @@ weekly
- https://spark.apache.org/screencasts/
+ https://spark.apache.org/graphx/

Review comment: You could revert this change, but I'm not sure which version is 'right', i.e. which one the latest site-generation tools actually produce. The rest is OK.
[GitHub] [spark-website] viirya commented on pull request #341: Update release process
viirya commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842637559

cc @dongjoon-hyun @srowen @HyukjinKwon
[GitHub] [spark-website] viirya opened a new pull request #341: Update release process
viirya opened a new pull request #341:
URL: https://github.com/apache/spark-website/pull/341

`HiveExternalCatalogVersionsSuite` no longer needs `testingVersions` to be updated manually; it now picks up the latest releases automatically.
[GitHub] [spark-website] viirya commented on pull request #340: Add docs for Apache Spark 2.4.8
viirya commented on pull request #340:
URL: https://github.com/apache/spark-website/pull/340#issuecomment-841721564

Thanks @srowen @dongjoon-hyun! Merging to asf-site.
[GitHub] [spark-website] viirya closed pull request #340: Add docs for Apache Spark 2.4.8
viirya closed pull request #340:
URL: https://github.com/apache/spark-website/pull/340
[GitHub] [spark-website] HyukjinKwon commented on pull request #340: Add docs for Apache Spark 2.4.8
HyukjinKwon commented on pull request #340:
URL: https://github.com/apache/spark-website/pull/340#issuecomment-841743729

Awesome!
[spark] branch master updated (3a3f8ca -> 2a335f2)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3a3f8ca  [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation
 add 2a335f2  [SPARK-34941][PYTHON] Fix mypy errors and enable mypy check for pandas-on-Spark

No new revisions were added by this update.

Summary of changes:
 python/mypy.ini                            | 18 +--
 python/pyspark/pandas/accessors.py         | 26 -
 python/pyspark/pandas/base.py              | 32 +--
 python/pyspark/pandas/frame.py             | 60 ++---
 python/pyspark/pandas/generic.py           |  5 +-
 python/pyspark/pandas/groupby.py           | 45 
 python/pyspark/pandas/indexes/base.py      | 37 +++--
 python/pyspark/pandas/indexing.py          | 26 +
 python/pyspark/pandas/internal.py          | 66 ---
 python/pyspark/pandas/ml.py                |  4 +-
 python/pyspark/pandas/namespace.py         |  7 ++-
 python/pyspark/pandas/numpy_compat.py      | 85 ++
 python/pyspark/pandas/series.py            |  4 +-
 python/pyspark/pandas/spark/accessors.py   |  3 +-
 python/pyspark/pandas/spark/functions.py   |  2 +-
 python/pyspark/pandas/spark/utils.py       | 62 --
 python/pyspark/pandas/sql_processor.py     |  4 +-
 python/pyspark/pandas/strings.py           |  6 +--
 python/pyspark/pandas/typedef/typehints.py |  6 +--
 python/pyspark/pandas/utils.py             | 16 +-
 20 files changed, 303 insertions(+), 211 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new aa5c72f  [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

aa5c72f is described below

commit aa5c72f8caa5d63a92dcd28fcb263682a3f0e250
Author: fhygh <283452...@qq.com>
AuthorDate: Tue May 18 00:13:40 2021 +0800

    [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

    ### What changes were proposed in this pull request?
    This PR fixes this bug:
    ```
    set spark.sql.legacy.charVarcharAsString=true;
    create table chartb01(a char(3));
    insert into chartb01 select 'a';
    ```
    Here we expect the data of table chartb01 to be 'aaa', but the insert fails.

    ### Why are the changes needed?
    Improve backward compatibility:
    ```
    spark-sql> create table tchar01(col char(2)) using parquet;
    Time taken: 0.767 seconds
    spark-sql> insert into tchar01 select 'aaa';
    ERROR | Executor task launch worker for task 0.0 in stage 0.0 (TID 0) | Aborting task | org.apache.spark.util.Utils.logError(Logging.scala:94)
    java.lang.RuntimeException: Exceeds char/varchar type length limitation: 2
        at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.trimTrailingSpaces(CharVarcharCodegenUtils.java:31)
        at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.charTypeWriteSideCheck(CharVarcharCodegenUtils.java:44)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:279)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1500)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:212)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1466)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    ```

    ### Does this PR introduce _any_ user-facing change?
    No (the legacy config is false by default).

    ### How was this patch tested?
    Added unit tests.

    Closes #32501 from fhygh/master.

    Authored-by: fhygh <283452...@qq.com>
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 3a3f8ca6f421b9bc51e0059c954262489aa41f5d)
    Signed-off-by: Wenchen Fan
---
 .../catalyst/analysis/TableOutputResolver.scala    |  6 +++-
 .../apache/spark/sql/util/PartitioningUtils.scala  | 36 --
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 12 
 3 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
index d5c407b..32bdb82 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
@@ -100,7 +100,11 @@ object TableOutputResolver {
       case _ => Cast(queryExpr, tableAttr.dataType, Option(conf.sessionLocalTimeZone))
     }
-    val exprWithStrLenCheck = CharVarcharUtils.stringLengthCheck(casted, tableAttr)
+    val exprWithStrLenCheck = if (conf.charVarcharAsString) {
+      casted
+    } else {
+      CharVarcharUtils.stringLengthCheck(casted, tableAttr)
+    }
     // Renaming is needed for handling the following cases like
     // 1) Column names/types do not match, e.g.,
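The check that throws in the stack trace above can be approximated in standalone Python. The function name `char_type_write_side_check` and the exact padding/trimming behavior are assumptions made for illustration, modeled on the trace's `trimTrailingSpaces` then `charTypeWriteSideCheck` sequence; this is a sketch, not Spark's actual implementation.

```python
def char_type_write_side_check(value: str, limit: int) -> str:
    """Sketch (assumption) of the CHAR(n) write-side check: if the value is
    too long, trailing spaces are trimmed first; if it is still longer than
    the declared width the write fails; shorter values are space-padded."""
    if len(value) > limit:
        value = value.rstrip(" ")  # trailing spaces are allowed to overflow
    if len(value) > limit:
        raise RuntimeError(f"Exceeds char/varchar type length limitation: {limit}")
    return value.ljust(limit)  # CHAR(n) pads to the fixed width

print(repr(char_type_write_side_check("a", 2)))  # 'a '
```

Under this model, `insert into tchar01 select 'aaa'` with `col char(2)` fails exactly as in the report, while the PR makes Spark skip this check entirely when `spark.sql.legacy.charVarcharAsString=true`.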
[spark] branch master updated (3b63f32 -> 3a3f8ca)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3b63f32  [SPARK-35400][SQL] Simplify getOuterReferences and improve error message for correlated subquery
 add 3a3f8ca  [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

No new revisions were added by this update.

Summary of changes:
 .../catalyst/analysis/TableOutputResolver.scala    |  6 +++-
 .../apache/spark/sql/util/PartitioningUtils.scala  | 36 --
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 12 
 3 files changed, 36 insertions(+), 18 deletions(-)
[spark] branch master updated (ceb8122 -> 3b63f32)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from ceb8122  [SPARK-35399][DOCUMENTATION] State is still needed in the event of executor failure
 add 3b63f32  [SPARK-35400][SQL] Simplify getOuterReferences and improve error message for correlated subquery

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/CheckAnalysis.scala      | 12 ++---
 .../spark/sql/catalyst/expressions/subquery.scala  | 29 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  6 +
 .../results/postgreSQL/aggregates_part1.sql.out    |  5 +---
 .../negative-cases/invalid-correlation.sql.out     |  4 +--
 .../udf/postgreSQL/udf-aggregates_part1.sql.out    |  5 +---
 6 files changed, 29 insertions(+), 32 deletions(-)
[spark] branch master updated (b4348b7 -> ceb8122)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from b4348b7  [SPARK-35420][BUILD] Replace the usage of toStringHelper with ToStringBuilder
 add ceb8122  [SPARK-35399][DOCUMENTATION] State is still needed in the event of executor failure

No new revisions were added by this update.

Summary of changes:
 docs/configuration.md  |  4 ++--
 docs/job-scheduling.md | 13 ++---
 2 files changed, 8 insertions(+), 9 deletions(-)
[spark] branch master updated (7c13636 -> b4348b7)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7c13636  [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
 add b4348b7  [SPARK-35420][BUILD] Replace the usage of toStringHelper with ToStringBuilder

No new revisions were added by this update.

Summary of changes:
 .../spark/network/shuffle/RemoteBlockPushResolver.java     |  8 +---
 .../network/shuffle/protocol/FinalizeShuffleMerge.java     |  8 +---
 .../spark/network/shuffle/protocol/MergeStatuses.java      |  8 +---
 .../spark/network/shuffle/protocol/PushBlockStream.java    | 14 --
 4 files changed, 23 insertions(+), 15 deletions(-)
[spark] branch master updated (9eb45ec -> 7c13636)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 9eb45ec  [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show
 add 7c13636  [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  24 ++
 .../execution/aggregate/UpdatingSessionsExec.scala |  77 
 .../aggregate/UpdatingSessionsIterator.scala       | 218 +++
 .../streaming/UpdatingSessionsIteratorSuite.scala  | 423 +
 4 files changed, 742 insertions(+)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/UpdatingSessionsExec.scala
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/UpdatingSessionsIterator.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/UpdatingSessionsIteratorSuite.scala
[spark] branch master updated: [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9eb45ec  [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show

9eb45ec is described below

commit 9eb45ecb4f39f372e20529da468f304c4ec7c175
Author: Gera Shegalov
AuthorDate: Mon May 17 16:22:46 2021 +0900

    [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show

    ### What changes were proposed in this pull request?
    Provide a clearer error message tied to the user's Python code if incorrect parameters are passed to `DataFrame.show`, rather than the message about a missing JVM method the user is not calling directly:
    ```
    py4j.Py4JException: Method showString([class java.lang.Boolean, class java.lang.Integer, class java.lang.Boolean]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    ```

    ### Why are the changes needed?
    For faster debugging through actionable error messages.

    ### Does this PR introduce _any_ user-facing change?
    No change for correct parameters, but different error messages for parameters that trigger an exception.

    ### How was this patch tested?
    - unit test
    - manually in the PySpark REPL

    Closes #32555 from gerashegalov/df_show_validation.

    Authored-by: Gera Shegalov
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/dataframe.py            | 16 ++--
 python/pyspark/sql/tests/test_dataframe.py | 18 ++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 8fe263e..22cc7a4 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -448,7 +448,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         n : int, optional
             Number of rows to show.
-        truncate : bool, optional
+        truncate : bool or int, optional
             If set to ``True``, truncate strings longer than 20 chars by default.
             If set to a number greater than one, truncates long strings to length ``truncate``
             and align cells right.
@@ -482,10 +482,22 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         age  | 5
         name | Bob
         """
+
+        if not isinstance(n, int) or isinstance(n, bool):
+            raise TypeError("Parameter 'n' (number of rows) must be an int")
+
+        if not isinstance(vertical, bool):
+            raise TypeError("Parameter 'vertical' must be a bool")
+
         if isinstance(truncate, bool) and truncate:
             print(self._jdf.showString(n, 20, vertical))
         else:
-            print(self._jdf.showString(n, int(truncate), vertical))
+            try:
+                int_truncate = int(truncate)
+            except ValueError:
+                raise TypeError(f"Parameter 'truncate={truncate}' should be either bool or int.")
+
+            print(self._jdf.showString(n, int_truncate, vertical))

diff --git a/python/pyspark/sql/tests/test_dataframe.py b/python/pyspark/sql/tests/test_dataframe.py
index 3e961cb..74895c0 100644
--- a/python/pyspark/sql/tests/test_dataframe.py
+++ b/python/pyspark/sql/tests/test_dataframe.py
@@ -837,6 +837,24 @@ class DataFrameTests(ReusedSQLTestCase):
         finally:
             shutil.rmtree(tpath)

+    def test_df_show(self):
+        # SPARK-35408: ensure better diagnostics if incorrect parameters are passed
+        # to DataFrame.show
+        df = self.spark.createDataFrame([('foo',)])
+        df.show(5)
+        df.show(5, True)
+        df.show(5, 1, True)
+        df.show(n=5, truncate='1', vertical=False)
+        df.show(n=5, truncate=1.5, vertical=False)
+
+        with self.assertRaisesRegex(TypeError, "Parameter 'n'"):
+            df.show(True)
+        with self.assertRaisesRegex(TypeError, "Parameter 'vertical'"):
+            df.show(vertical='foo')
+        with self.assertRaisesRegex(TypeError, "Parameter 'truncate=foo'"):
+            df.show(truncate='foo')

 class QueryExecutionListenerTests(unittest.TestCase, SQLTestUtils):
     # These tests are separate because it uses 'spark.sql.queryExecutionListeners' which is
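The validation logic from this commit can be exercised without a Spark session. The following standalone sketch mirrors the checks added to `DataFrame.show`; `validate_show_args` is a name invented here for illustration, and it returns the normalized arguments where the real method would go on to call `self._jdf.showString(...)`.

```python
def validate_show_args(n=20, truncate=True, vertical=False):
    # Mirrors the parameter checks from SPARK-35408 (sketch only).
    if not isinstance(n, int) or isinstance(n, bool):
        raise TypeError("Parameter 'n' (number of rows) must be an int")
    if not isinstance(vertical, bool):
        raise TypeError("Parameter 'vertical' must be a bool")
    if isinstance(truncate, bool) and truncate:
        return n, 20, vertical  # truncate=True means the default width of 20
    try:
        int_truncate = int(truncate)
    except ValueError:
        raise TypeError(f"Parameter 'truncate={truncate}' should be either bool or int.")
    return n, int_truncate, vertical

print(validate_show_args(5, "1", False))  # (5, 1, False)
```

Note the `isinstance(n, bool)` guard: `bool` is a subclass of `int` in Python, so a plain `isinstance(n, int)` check would silently accept `df.show(True)`, which is exactly the mistaken call the Py4J error in the commit message came from.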
[spark] branch master updated (fb93163 -> 4c01555)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from fb93163  [SPARK-32792][SQL][FOLLOWUP] Fix conflict with SPARK-34661
 add 4c01555  [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/deploy/k8s/Config.scala | 14 
 .../cluster/k8s/ExecutorPodsAllocator.scala        | 68 +--
 .../apache/spark/deploy/k8s/Fabric8Aliases.scala   |  6 +-
 .../cluster/k8s/ExecutorLifecycleTestUtils.scala   | 44 +++-
 .../cluster/k8s/ExecutorPodsAllocatorSuite.scala   | 79 +-
 5 files changed, 204 insertions(+), 7 deletions(-)