[spark] branch master updated (747fe72 -> 3b859a1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 747fe72  [SPARK-35419][PYTHON] Enable spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled by default
 add 3b859a1  [SPARK-35431][SQL][TESTS] Sort elements generated by collect_set in SQLQueryTestSuite

No new revisions were added by this update.

Summary of changes:
 .../inputs/subquery/scalar-subquery/scalar-subquery-select.sql      | 2 +-
 .../results/subquery/scalar-subquery/scalar-subquery-select.sql.out | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] viirya commented on pull request #342: Update website for 2.4.8 release
viirya commented on pull request #342:
URL: https://github.com/apache/spark-website/pull/342#issuecomment-842803660

Thanks @maropu @HyukjinKwon @dongjoon-hyun!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark-website] dongjoon-hyun merged pull request #342: Update website for 2.4.8 release
dongjoon-hyun merged pull request #342:
URL: https://github.com/apache/spark-website/pull/342
[spark] branch master updated (a60c364 -> 747fe72)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from a60c364  [SPARK-34981][SQL][TESTS][FOLLOWUP] Fix test failure under Scala 2.13
 add 747fe72  [SPARK-35419][PYTHON] Enable spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled by default

No new revisions were added by this update.

Summary of changes:
 python/docs/source/migration_guide/pyspark_3.1_to_3.2.rst               | 2 ++
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)
[GitHub] [spark-website] viirya commented on pull request #341: Update release process
viirya commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842770942

Thank you @HyukjinKwon
[GitHub] [spark-website] HyukjinKwon commented on pull request #341: Update release process
HyukjinKwon commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842769465
[GitHub] [spark-website] viirya commented on pull request #342: Update website for 2.4.8 release
viirya commented on pull request #342:
URL: https://github.com/apache/spark-website/pull/342#issuecomment-842753706

cc @dongjoon-hyun @maropu @srowen @HyukjinKwon
[GitHub] [spark-website] viirya opened a new pull request #342: Update website for 2.4.8 release
viirya opened a new pull request #342:
URL: https://github.com/apache/spark-website/pull/342

Update website for 2.4.8 release. Added:
  releases/_posts/2021-05-17-spark-release-2-4-8.md
  news/_posts/2021-05-17-spark-2-4-8-released.md

Run `bundle exec jekyll build` to update html files.
[GitHub] [spark-website] viirya commented on pull request #341: Update release process
viirya commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842706959

Thanks @dongjoon-hyun @srowen @maropu
[GitHub] [spark-website] maropu commented on pull request #341: Update release process
maropu commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842706441

lgtm, too.
[spark-website] branch asf-site updated: Update release process (#341)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 9dc8dc6  Update release process (#341)

9dc8dc6 is described below

commit 9dc8dc6670e1313d2925b30eb5e7cf6d03a70e68
Author: Liang-Chi Hsieh
AuthorDate: Mon May 17 16:18:24 2021 -0700

    Update release process (#341)
---
 release-process.md        | 13 -
 site/release-process.html | 13 -
 2 files changed, 26 deletions(-)

diff --git a/release-process.md b/release-process.md
index 0e5e5db..72dcdb4 100644
--- a/release-process.md
+++ b/release-process.md
@@ -203,10 +203,6 @@ $ svn rm https://dist.apache.org/repos/dist/release/spark/spark-1.1.0
 You will also need to update `js/download.js` to indicate the release is not mirrored
 anymore, so that the correct links are generated on the site.
-Also take a moment to check `HiveExternalCatalogVersionsSuite.scala` starting with branch-2.2
-and see if it needs to be adjusted, since that test relies on mirrored downloads of previous
-releases.
-
 Update the Spark Apache Repository
@@ -317,15 +313,6 @@ $ git shortlog v1.1.1 --grep "$EXPR" > contrib.txt
 $ git log v1.1.1 --grep "$expr" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][1-9][1-9][1-9] insert" | grep SPARK > large-patches.txt
 ```
-Update `HiveExternalCatalogVersionsSuite`
-
-When a new release occurs, `PROCESS_TABLES.testingVersions` in `HiveExternalCatalogVersionsSuite`
-must be updated shortly thereafter. This list should contain the latest release in all active
-maintenance branches, and no more.
-For example, as of this writing, it has value `val testingVersions = Seq("2.1.3", "2.2.2", "2.3.2")`.
-"2.4.0" will be added to the list when it's released. "2.1.3" will be removed (and removed from the Spark dist mirrors)
-when the branch is no longer maintained.
-"2.3.2" will become "2.3.3" when "2.3.3" is released.
-
 Create an Announcement
 Once everything is working (website docs, website changes) create an announcement on the website

diff --git a/site/release-process.html b/site/release-process.html
index 860abec..87acebe 100644
--- a/site/release-process.html
+++ b/site/release-process.html
@@ -398,10 +398,6 @@ To delete older versions simply use svn rm:
 You will also need to update js/download.js to indicate the release is not mirrored
 anymore, so that the correct links are generated on the site.
-Also take a moment to check HiveExternalCatalogVersionsSuite.scala starting with branch-2.2
-and see if it needs to be adjusted, since that test relies on mirrored downloads of previous
-releases.
-
 Update the Spark Apache Repository
 Check out the tagged commit for the release candidate that passed and apply the correct version tag.
@@ -508,15 +504,6 @@ $ git shortlog v1.1.1 --grep "$EXPR" contrib.txt
 $ git log v1.1.1 --grep "$expr" --shortstat --oneline | grep -B 1 -e "[3-9][0-9][0-9] insert" -e "[1-9][1-9][1-9][1-9] insert" | grep SPARK large-patches.txt
-Update `HiveExternalCatalogVersionsSuite`
-
-When a new release occurs, PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite
-must be updated shortly thereafter. This list should contain the latest release in all active
-maintenance branches, and no more.
-For example, as of this writing, it has value val testingVersions = Seq("2.1.3", "2.2.2", "2.3.2").
-2.4.0 will be added to the list when its released. 2.1.3 will be removed (and removed from the Spark dist mirrors)
-when the branch is no longer maintained. 2.3.2 will become 2.3.3 when 2.3.3 is released.
-
 Create an Announcement
 Once everything is working (website docs, website changes) create an announcement on the website
[GitHub] [spark-website] dongjoon-hyun merged pull request #341: Update release process
dongjoon-hyun merged pull request #341:
URL: https://github.com/apache/spark-website/pull/341
[spark] branch master updated (2a335f2 -> a60c364)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 2a335f2  [SPARK-34941][PYTHON] Fix mypy errors and enable mypy check for pandas-on-Spark
 add a60c364  [SPARK-34981][SQL][TESTS][FOLLOWUP] Fix test failure under Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connector/DataSourceV2FunctionSuite.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[GitHub] [spark-website] viirya commented on a change in pull request #341: Update release process
viirya commented on a change in pull request #341:
URL: https://github.com/apache/spark-website/pull/341#discussion_r633888912

File path: site/sitemap.xml
@@ -876,27 +876,27 @@ weekly
- https://spark.apache.org/screencasts/
+ https://spark.apache.org/graphx/

Review comment: Yea, let me revert it.
[GitHub] [spark-website] dongjoon-hyun commented on pull request #341: Update release process
dongjoon-hyun commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842650705

+1, LGTM (except the above comment on site/sitemap.xml).
[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #341: Update release process
dongjoon-hyun commented on a change in pull request #341:
URL: https://github.com/apache/spark-website/pull/341#discussion_r633878110

File path: site/sitemap.xml
@@ -876,27 +876,27 @@ weekly
- https://spark.apache.org/screencasts/
+ https://spark.apache.org/graphx/

Review comment: Yes, +1 for reverting this file change.
[GitHub] [spark-website] srowen commented on a change in pull request #341: Update release process
srowen commented on a change in pull request #341:
URL: https://github.com/apache/spark-website/pull/341#discussion_r633877163

File path: site/sitemap.xml
@@ -876,27 +876,27 @@ weekly
- https://spark.apache.org/screencasts/
+ https://spark.apache.org/graphx/

Review comment: You could revert this change, but I'm not sure which version is 'right', i.e. which one the latest site-generation tools actually produce. The rest is OK.
[GitHub] [spark-website] viirya commented on pull request #341: Update release process
viirya commented on pull request #341:
URL: https://github.com/apache/spark-website/pull/341#issuecomment-842637559

cc @dongjoon-hyun @srowen @HyukjinKwon
[GitHub] [spark-website] viirya opened a new pull request #341: Update release process
viirya opened a new pull request #341:
URL: https://github.com/apache/spark-website/pull/341

`HiveExternalCatalogVersionsSuite` no longer needs `testingVersions` to be updated manually; it now picks up the latest releases automatically.
[GitHub] [spark-website] viirya commented on pull request #340: Add docs for Apache Spark 2.4.8
viirya commented on pull request #340:
URL: https://github.com/apache/spark-website/pull/340#issuecomment-841721564

Thanks @srowen @dongjoon-hyun! Merging to asf-site.
[GitHub] [spark-website] viirya closed pull request #340: Add docs for Apache Spark 2.4.8
viirya closed pull request #340:
URL: https://github.com/apache/spark-website/pull/340
[GitHub] [spark-website] HyukjinKwon commented on pull request #340: Add docs for Apache Spark 2.4.8
HyukjinKwon commented on pull request #340:
URL: https://github.com/apache/spark-website/pull/340#issuecomment-841743729

Awesome!
[spark] branch master updated (3a3f8ca -> 2a335f2)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3a3f8ca  [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation
 add 2a335f2  [SPARK-34941][PYTHON] Fix mypy errors and enable mypy check for pandas-on-Spark

No new revisions were added by this update.

Summary of changes:
 python/mypy.ini                            | 18 +--
 python/pyspark/pandas/accessors.py         | 26 -
 python/pyspark/pandas/base.py              | 32 +--
 python/pyspark/pandas/frame.py             | 60 ++---
 python/pyspark/pandas/generic.py           |  5 +-
 python/pyspark/pandas/groupby.py           | 45 
 python/pyspark/pandas/indexes/base.py      | 37 +++--
 python/pyspark/pandas/indexing.py          | 26 +
 python/pyspark/pandas/internal.py          | 66 ---
 python/pyspark/pandas/ml.py                |  4 +-
 python/pyspark/pandas/namespace.py         |  7 ++-
 python/pyspark/pandas/numpy_compat.py      | 85 ++
 python/pyspark/pandas/series.py            |  4 +-
 python/pyspark/pandas/spark/accessors.py   |  3 +-
 python/pyspark/pandas/spark/functions.py   |  2 +-
 python/pyspark/pandas/spark/utils.py       | 62 --
 python/pyspark/pandas/sql_processor.py     |  4 +-
 python/pyspark/pandas/strings.py           |  6 +--
 python/pyspark/pandas/typedef/typehints.py |  6 +--
 python/pyspark/pandas/utils.py             | 16 +-
 20 files changed, 303 insertions(+), 211 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new aa5c72f  [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

aa5c72f is described below

commit aa5c72f8caa5d63a92dcd28fcb263682a3f0e250
Author: fhygh <283452...@qq.com>
AuthorDate: Tue May 18 00:13:40 2021 +0800

    [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

    ### What changes were proposed in this pull request?
    This PR fixes this bug:
    ```
    set spark.sql.legacy.charVarcharAsString=true;
    create table chartb01(a char(3));
    insert into chartb01 select 'a';
    ```
    Here we expect the data of table chartb01 to be 'aaa', but the insert fails.

    ### Why are the changes needed?
    Improve backward compatibility:
    ```
    spark-sql> create table tchar01(col char(2)) using parquet;
    Time taken: 0.767 seconds
    spark-sql> insert into tchar01 select 'aaa';
    ERROR | Executor task launch worker for task 0.0 in stage 0.0 (TID 0) | Aborting task | org.apache.spark.util.Utils.logError(Logging.scala:94)
    java.lang.RuntimeException: Exceeds char/varchar type length limitation: 2
        at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.trimTrailingSpaces(CharVarcharCodegenUtils.java:31)
        at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.charTypeWriteSideCheck(CharVarcharCodegenUtils.java:44)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:279)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1500)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:212)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1466)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    ```

    ### Does this PR introduce _any_ user-facing change?
    No (the legacy config is false by default).

    ### How was this patch tested?
    Added unit tests.

    Closes #32501 from fhygh/master.

    Authored-by: fhygh <283452...@qq.com>
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 3a3f8ca6f421b9bc51e0059c954262489aa41f5d)
    Signed-off-by: Wenchen Fan
---
 .../catalyst/analysis/TableOutputResolver.scala    |  6 +++-
 .../apache/spark/sql/util/PartitioningUtils.scala  | 36 --
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 12 
 3 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
index d5c407b..32bdb82 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
@@ -100,7 +100,11 @@ object TableOutputResolver {
       case _ => Cast(queryExpr, tableAttr.dataType, Option(conf.sessionLocalTimeZone))
     }
-    val exprWithStrLenCheck = CharVarcharUtils.stringLengthCheck(casted, tableAttr)
+    val exprWithStrLenCheck = if (conf.charVarcharAsString) {
+      casted
+    } else {
+      CharVarcharUtils.stringLengthCheck(casted, tableAttr)
+    }
     // Renaming is needed for handling the following cases like
     // 1) Column names/types do not match, e.g.,
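The check that throws in the stack trace above can be approximated in standalone Python. The function name `char_type_write_side_check` and the exact padding/trimming behavior are assumptions made for illustration, modeled on the trace's `trimTrailingSpaces` then `charTypeWriteSideCheck` sequence; this is a sketch, not Spark's actual implementation.

```python
def char_type_write_side_check(value: str, limit: int) -> str:
    """Sketch (assumption) of the CHAR(n) write-side check: if the value is
    too long, trailing spaces are trimmed first; if it is still longer than
    the declared width the write fails; shorter values are space-padded."""
    if len(value) > limit:
        value = value.rstrip(" ")  # trailing spaces are allowed to overflow
    if len(value) > limit:
        raise RuntimeError(f"Exceeds char/varchar type length limitation: {limit}")
    return value.ljust(limit)  # CHAR(n) pads to the fixed width

print(repr(char_type_write_side_check("a", 2)))  # 'a '
```

Under this model, `insert into tchar01 select 'aaa'` with `col char(2)` fails exactly as in the report, while the PR makes Spark skip this check entirely when `spark.sql.legacy.charVarcharAsString=true`.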
[spark] branch master updated (3b63f32 -> 3a3f8ca)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 3b63f32  [SPARK-35400][SQL] Simplify getOuterReferences and improve error message for correlated subquery
 add 3a3f8ca  [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

No new revisions were added by this update.

Summary of changes:
 .../catalyst/analysis/TableOutputResolver.scala    |  6 +++-
 .../apache/spark/sql/util/PartitioningUtils.scala  | 36 --
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 12 
 3 files changed, 36 insertions(+), 18 deletions(-)
[spark] branch master updated (ceb8122 -> 3b63f32)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from ceb8122  [SPARK-35399][DOCUMENTATION] State is still needed in the event of executor failure
 add 3b63f32  [SPARK-35400][SQL] Simplify getOuterReferences and improve error message for correlated subquery

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/CheckAnalysis.scala      | 12 ++---
 .../spark/sql/catalyst/expressions/subquery.scala  | 29 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  6 +
 .../results/postgreSQL/aggregates_part1.sql.out    |  5 +---
 .../negative-cases/invalid-correlation.sql.out     |  4 +--
 .../udf/postgreSQL/udf-aggregates_part1.sql.out    |  5 +---
 6 files changed, 29 insertions(+), 32 deletions(-)
[spark] branch master updated (b4348b7 -> ceb8122)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from b4348b7  [SPARK-35420][BUILD] Replace the usage of toStringHelper with ToStringBuilder
 add ceb8122  [SPARK-35399][DOCUMENTATION] State is still needed in the event of executor failure

No new revisions were added by this update.

Summary of changes:
 docs/configuration.md  |  4 ++--
 docs/job-scheduling.md | 13 ++---
 2 files changed, 8 insertions(+), 9 deletions(-)
[spark] branch master updated (7c13636 -> b4348b7)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7c13636  [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements
 add b4348b7  [SPARK-35420][BUILD] Replace the usage of toStringHelper with ToStringBuilder

No new revisions were added by this update.

Summary of changes:
 .../spark/network/shuffle/RemoteBlockPushResolver.java     |  8 +---
 .../network/shuffle/protocol/FinalizeShuffleMerge.java     |  8 +---
 .../spark/network/shuffle/protocol/MergeStatuses.java      |  8 +---
 .../spark/network/shuffle/protocol/PushBlockStream.java    | 14 --
 4 files changed, 23 insertions(+), 15 deletions(-)
[spark] branch master updated (9eb45ec -> 7c13636)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 9eb45ec  [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show
 add 7c13636  [SPARK-34888][SS] Introduce UpdatingSessionIterator adjusting session window on elements

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  24 ++
 .../execution/aggregate/UpdatingSessionsExec.scala |  77 
 .../aggregate/UpdatingSessionsIterator.scala       | 218 +++
 .../streaming/UpdatingSessionsIteratorSuite.scala  | 423 +
 4 files changed, 742 insertions(+)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/UpdatingSessionsExec.scala
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/UpdatingSessionsIterator.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/UpdatingSessionsIteratorSuite.scala
[spark] branch master updated: [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9eb45ec  [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show

9eb45ec is described below

commit 9eb45ecb4f39f372e20529da468f304c4ec7c175
Author: Gera Shegalov
AuthorDate: Mon May 17 16:22:46 2021 +0900

    [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show

    ### What changes were proposed in this pull request?
    Provide a clearer error message tied to the user's Python code if incorrect parameters are passed to `DataFrame.show`, rather than the message about a missing JVM method the user is not calling directly:
    ```
    py4j.Py4JException: Method showString([class java.lang.Boolean, class java.lang.Integer, class java.lang.Boolean]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
    ```

    ### Why are the changes needed?
    For faster debugging through actionable error messages.

    ### Does this PR introduce _any_ user-facing change?
    No change for correct parameters, but different error messages for parameters that trigger an exception.

    ### How was this patch tested?
    - unit test
    - manually in the PySpark REPL

    Closes #32555 from gerashegalov/df_show_validation.

    Authored-by: Gera Shegalov
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/dataframe.py            | 16 ++--
 python/pyspark/sql/tests/test_dataframe.py | 18 ++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 8fe263e..22cc7a4 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -448,7 +448,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         n : int, optional
             Number of rows to show.
-        truncate : bool, optional
+        truncate : bool or int, optional
             If set to ``True``, truncate strings longer than 20 chars by default.
             If set to a number greater than one, truncates long strings to length ``truncate``
             and align cells right.
@@ -482,10 +482,22 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         age  | 5
         name | Bob
         """
+
+        if not isinstance(n, int) or isinstance(n, bool):
+            raise TypeError("Parameter 'n' (number of rows) must be an int")
+
+        if not isinstance(vertical, bool):
+            raise TypeError("Parameter 'vertical' must be a bool")
+
         if isinstance(truncate, bool) and truncate:
             print(self._jdf.showString(n, 20, vertical))
         else:
-            print(self._jdf.showString(n, int(truncate), vertical))
+            try:
+                int_truncate = int(truncate)
+            except ValueError:
+                raise TypeError(f"Parameter 'truncate={truncate}' should be either bool or int.")
+
+            print(self._jdf.showString(n, int_truncate, vertical))

diff --git a/python/pyspark/sql/tests/test_dataframe.py b/python/pyspark/sql/tests/test_dataframe.py
index 3e961cb..74895c0 100644
--- a/python/pyspark/sql/tests/test_dataframe.py
+++ b/python/pyspark/sql/tests/test_dataframe.py
@@ -837,6 +837,24 @@ class DataFrameTests(ReusedSQLTestCase):
         finally:
             shutil.rmtree(tpath)

+    def test_df_show(self):
+        # SPARK-35408: ensure better diagnostics if incorrect parameters are passed
+        # to DataFrame.show
+        df = self.spark.createDataFrame([('foo',)])
+        df.show(5)
+        df.show(5, True)
+        df.show(5, 1, True)
+        df.show(n=5, truncate='1', vertical=False)
+        df.show(n=5, truncate=1.5, vertical=False)
+
+        with self.assertRaisesRegex(TypeError, "Parameter 'n'"):
+            df.show(True)
+        with self.assertRaisesRegex(TypeError, "Parameter 'vertical'"):
+            df.show(vertical='foo')
+        with self.assertRaisesRegex(TypeError, "Parameter 'truncate=foo'"):
+            df.show(truncate='foo')

 class QueryExecutionListenerTests(unittest.TestCase, SQLTestUtils):
     # These tests are separate because it uses 'spark.sql.queryExecutionListeners' which is
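The validation logic from this commit can be exercised without a Spark session. The following standalone sketch mirrors the checks added to `DataFrame.show`; `validate_show_args` is a name invented here for illustration, and it returns the normalized arguments where the real method would go on to call `self._jdf.showString(...)`.

```python
def validate_show_args(n=20, truncate=True, vertical=False):
    # Mirrors the parameter checks from SPARK-35408 (sketch only).
    if not isinstance(n, int) or isinstance(n, bool):
        raise TypeError("Parameter 'n' (number of rows) must be an int")
    if not isinstance(vertical, bool):
        raise TypeError("Parameter 'vertical' must be a bool")
    if isinstance(truncate, bool) and truncate:
        return n, 20, vertical  # truncate=True means the default width of 20
    try:
        int_truncate = int(truncate)
    except ValueError:
        raise TypeError(f"Parameter 'truncate={truncate}' should be either bool or int.")
    return n, int_truncate, vertical

print(validate_show_args(5, "1", False))  # (5, 1, False)
```

Note the `isinstance(n, bool)` guard: `bool` is a subclass of `int` in Python, so a plain `isinstance(n, int)` check would silently accept `df.show(True)`, which is exactly the mistaken call the Py4J error in the commit message came from.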
[spark] branch master updated (fb93163 -> 4c01555)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from fb93163  [SPARK-32792][SQL][FOLLOWUP] Fix conflict with SPARK-34661
 add 4c01555  [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/deploy/k8s/Config.scala | 14 
 .../cluster/k8s/ExecutorPodsAllocator.scala        | 68 +--
 .../apache/spark/deploy/k8s/Fabric8Aliases.scala   |  6 +-
 .../cluster/k8s/ExecutorLifecycleTestUtils.scala   | 44 +++-
 .../cluster/k8s/ExecutorPodsAllocatorSuite.scala   | 79 +-
 5 files changed, 204 insertions(+), 7 deletions(-)