[GitHub] [spark-website] gengliangwang commented on a change in pull request #356: Improve the guideline of Preparing gpg key

2021-08-18 Thread GitBox


gengliangwang commented on a change in pull request #356:
URL: https://github.com/apache/spark-website/pull/356#discussion_r691803255



##
File path: release-process.md
##
@@ -39,15 +39,82 @@ If you are a new Release Manager, you can read up on the 
process from the follow
 
 You can skip this section if you have already uploaded your key.
 
-After generating the gpg key, you need to upload your key to a public key server. Please refer to
-<a href="https://www.apache.org/dev/openpgp.html#generate-key">https://www.apache.org/dev/openpgp.html#generate-key</a>
-for details.
+Generate Key
 
-If you want to do the release on another machine, you can transfer your secret key to that machine
-via the `gpg --export-secret-keys` and `gpg --import` commands.
+Here's an example of gpg 2.0.12. If you use gpg version 1 series, please refer to <a href="https://www.apache.org/dev/openpgp.html#generate-key">generate-key</a> for details.
+
+```
+:::console
+$ gpg --full-gen-key
+gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc.
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+
+Please select what kind of key you want:
+   (1) RSA and RSA (default)
+   (2) DSA and Elgamal
+   (3) DSA (sign only)
+   (4) RSA (sign only)
+Your selection? 1
+RSA keys may be between 1024 and 4096 bits long.
+What keysize do you want? (2048) 4096
+Requested keysize is 4096 bits
+Please specify how long the key should be valid.
+         0 = key does not expire
+      <n>  = key expires in n days
+      <n>w = key expires in n weeks
+      <n>m = key expires in n months
+      <n>y = key expires in n years
+Key is valid for? (0) 
+Key does not expire at all
+Is this correct? (y/N) y
+
+GnuPG needs to construct a user ID to identify your key.
+
+Real name: Robert Burrell Donkin
+Email address: rdon...@apache.org
+Comment: CODE SIGNING KEY
+You selected this USER-ID:
+"Robert Burrell Donkin (CODE SIGNING KEY) "
+
+Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
+You need a Passphrase to protect your secret key.
+```
+
+Upload Key

Review comment:
   This new section seems duplicated with 
https://infra.apache.org/openpgp.html




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (559fe96 -> 013f2b7)

2021-08-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 559fe96  [SPARK-35991][SQL] Add PlanStability suite for TPCH
 add 013f2b7  [SPARK-36512][UI][TESTS] Fix UISeleniumSuite in 
sql/hive-thriftserver

No new revisions were added by this update.

Summary of changes:
 .../sql/hive/thriftserver/UISeleniumSuite.scala| 31 ++
 1 file changed, 26 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] cloud-fan commented on a change in pull request #356: Improve the guideline of Preparing gpg key

2021-08-18 Thread GitBox


cloud-fan commented on a change in pull request #356:
URL: https://github.com/apache/spark-website/pull/356#discussion_r691793523



##
File path: release-process.md
##
@@ -39,15 +39,82 @@ If you are a new Release Manager, you can read up on the 
process from the follow
 
 You can skip this section if you have already uploaded your key.
 
-After generating the gpg key, you need to upload your key to a public key server. Please refer to
-<a href="https://www.apache.org/dev/openpgp.html#generate-key">https://www.apache.org/dev/openpgp.html#generate-key</a>
-for details.
+Generate Key
 
-If you want to do the release on another machine, you can transfer your secret key to that machine
-via the `gpg --export-secret-keys` and `gpg --import` commands.
+Here's an example of gpg 2.0.12. If you use gpg version 1 series, please refer to <a href="https://www.apache.org/dev/openpgp.html#generate-key">generate-key</a> for details.
+
+```
+:::console
+$ gpg --full-gen-key
+gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc.
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+
+Please select what kind of key you want:
+   (1) RSA and RSA (default)
+   (2) DSA and Elgamal
+   (3) DSA (sign only)
+   (4) RSA (sign only)
+Your selection? 1
+RSA keys may be between 1024 and 4096 bits long.
+What keysize do you want? (2048) 4096
+Requested keysize is 4096 bits
+Please specify how long the key should be valid.
+         0 = key does not expire
+      <n>  = key expires in n days
+      <n>w = key expires in n weeks
+      <n>m = key expires in n months
+      <n>y = key expires in n years
+Key is valid for? (0) 
+Key does not expire at all
+Is this correct? (y/N) y
+
+GnuPG needs to construct a user ID to identify your key.
+
+Real name: Robert Burrell Donkin
+Email address: rdon...@apache.org
+Comment: CODE SIGNING KEY
+You selected this USER-ID:
+"Robert Burrell Donkin (CODE SIGNING KEY) "
+
+Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
+You need a Passphrase to protect your secret key.
+```
+
+Upload Key
 
-The last step is to update the KEYS file with your code signing key
-<a href="https://www.apache.org/dev/openpgp.html#export-public-key">https://www.apache.org/dev/openpgp.html#export-public-key</a>
+After generating the key, we should upload the public key to a <a href="https://infra.apache.org/release-signing.html#keyserver">public key server</a>.
+Upload the public key either by:
+
+(Recommended)
+First, export all public keys to ASCII-armored public key by
+```
+:::console
+$ gpg --export --armor 
+```
+or export the specific public key if you know the <a href="https://infra.apache.org/release-signing.html#key-id">key ID</a>, e.g.,
+```
+:::console
+$ gpg --export --armor AD741727
+```
+(Please refer to <a href="https://infra.apache.org/openpgp.html#export-public-key">export-public-key</a> for details.)
+
+Second, copy-paste your ASCII-armored public key to <a href="http://keyserver.ubuntu.com:11371/#submitKey">OpenPGP Keyserver</a> and submit.
+
+or
+
+Use gpg command to upload, e.g.,
+
+```
+$ gpg --send-key B13131DE2

Review comment:
   Can we use the same id in the previous example `AD741727`?
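
   For reference, a hedged sketch of that upload step reusing the same example ID (`AD741727` is just the sample key ID from the diff above, and the keyserver is the one the diff already points at):

   ```
   $ gpg --keyserver keyserver.ubuntu.com --send-keys AD741727
   ```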




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] cloud-fan commented on a change in pull request #356: Improve the guideline of Preparing gpg key

2021-08-18 Thread GitBox


cloud-fan commented on a change in pull request #356:
URL: https://github.com/apache/spark-website/pull/356#discussion_r691793389



##
File path: release-process.md
##
@@ -39,15 +39,82 @@ If you are a new Release Manager, you can read up on the 
process from the follow
 
 You can skip this section if you have already uploaded your key.
 
-After generating the gpg key, you need to upload your key to a public key server. Please refer to
-<a href="https://www.apache.org/dev/openpgp.html#generate-key">https://www.apache.org/dev/openpgp.html#generate-key</a>
-for details.
+Generate Key
 
-If you want to do the release on another machine, you can transfer your secret key to that machine
-via the `gpg --export-secret-keys` and `gpg --import` commands.
+Here's an example of gpg 2.0.12. If you use gpg version 1 series, please refer to <a href="https://www.apache.org/dev/openpgp.html#generate-key">generate-key</a> for details.
+
+```
+:::console
+$ gpg --full-gen-key
+gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc.
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+
+Please select what kind of key you want:
+   (1) RSA and RSA (default)
+   (2) DSA and Elgamal
+   (3) DSA (sign only)
+   (4) RSA (sign only)
+Your selection? 1
+RSA keys may be between 1024 and 4096 bits long.
+What keysize do you want? (2048) 4096
+Requested keysize is 4096 bits
+Please specify how long the key should be valid.
+         0 = key does not expire
+      <n>  = key expires in n days
+      <n>w = key expires in n weeks
+      <n>m = key expires in n months
+      <n>y = key expires in n years
+Key is valid for? (0) 
+Key does not expire at all
+Is this correct? (y/N) y
+
+GnuPG needs to construct a user ID to identify your key.
+
+Real name: Robert Burrell Donkin
+Email address: rdon...@apache.org
+Comment: CODE SIGNING KEY
+You selected this USER-ID:
+"Robert Burrell Donkin (CODE SIGNING KEY) "
+
+Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
+You need a Passphrase to protect your secret key.
+```
+
+Upload Key
 
-The last step is to update the KEYS file with your code signing key
-<a href="https://www.apache.org/dev/openpgp.html#export-public-key">https://www.apache.org/dev/openpgp.html#export-public-key</a>
+After generating the key, we should upload the public key to a <a href="https://infra.apache.org/release-signing.html#keyserver">public key server</a>.
+Upload the public key either by:
+
+(Recommended)
+First, export all public keys to ASCII-armored public key by
+```
+:::console
+$ gpg --export --armor 
+```
+or export the specific public key if you know the <a href="https://infra.apache.org/release-signing.html#key-id">key ID</a>, e.g.,
+```
+:::console
+$ gpg --export --armor AD741727

Review comment:
   is there a way to find the id of the newly created CODE SIGNING KEY?
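
   One way (a sketch, not part of the quoted diff; assumes gpg 2.x) is to list the secret keys with short-format IDs — the ID is the part after the algorithm/size, e.g. `AD741727` below:

   ```
   $ gpg --list-secret-keys --keyid-format SHORT
   sec   rsa4096/AD741727 2021-08-18 [SC]
   uid         [ultimate] Robert Burrell Donkin (CODE SIGNING KEY) <rdon...@apache.org>
   ```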




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] Ngone51 commented on pull request #356: Improve the guideline of Preparing gpg key

2021-08-18 Thread GitBox


Ngone51 commented on pull request #356:
URL: https://github.com/apache/spark-website/pull/356#issuecomment-901591270


   cc @HyukjinKwon @cloud-fan @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] Ngone51 opened a new pull request #356: Improve the guideline of Preparing gpg key

2021-08-18 Thread GitBox


Ngone51 opened a new pull request #356:
URL: https://github.com/apache/spark-website/pull/356


   This PR proposes to improve the guideline of the `Preparing gpg key` section in the release process. This is how it looks before and after:
   
   ### Before
   
   https://user-images.githubusercontent.com/16397174/130005588-5e1f6b54-b996-410c-bd81-0e1885e742a2.png
   
   ### After
   
   https://user-images.githubusercontent.com/16397174/130005226-760a6901-c603-4532-b5e1-ee2f5e1e47b8.png
   https://user-images.githubusercontent.com/16397174/130005256-7a616564-7484-4dc3-9bc0-ce67948e88cf.png
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (07d173a -> c458edb)

2021-08-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 07d173a  [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of 
ANALYZE TABLE and ANALYZE TABLES
 add c458edb  [SPARK-36371][SQL] Support raw string literal

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-literals.md   | 13 +++-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4|  2 +
 .../spark/sql/catalyst/parser/ParserUtils.scala| 72 --
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  9 +++
 4 files changed, 61 insertions(+), 35 deletions(-)
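
A hedged Spark SQL sketch of what the new literal form allows (assuming the `r`-prefixed syntax described in SPARK-36371, where backslashes inside the literal are not treated as escape sequences):

```sql
-- Without the prefix, \t is interpreted as an escape sequence (a tab character).
SELECT '\t' AS escaped;
-- With the r prefix, the backslash and the 't' are kept as two literal characters.
SELECT r'\t' AS raw;
```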

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] yutoacts closed pull request #350: [SPARK-36335] Add local-cluster docs to developer-tools.md

2021-08-18 Thread GitBox


yutoacts closed pull request #350:
URL: https://github.com/apache/spark-website/pull/350


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] yutoacts commented on pull request #350: [SPARK-36335] Add local-cluster docs to developer-tools.md

2021-08-18 Thread GitBox


yutoacts commented on pull request #350:
URL: https://github.com/apache/spark-website/pull/350#issuecomment-901576855


   It ended up as https://github.com/apache/spark/pull/33537.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES

2021-08-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 8f3b4c4  [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of 
ANALYZE TABLE and ANALYZE TABLES
8f3b4c4 is described below

commit 8f3b4c4b7d717c5cfc922ce160a1da42303d5304
Author: Wenchen Fan 
AuthorDate: Thu Aug 19 11:04:05 2021 +0800

[SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and 
ANALYZE TABLES

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/30648

ANALYZE TABLE and TABLES are essentially the same command, it's weird to 
put them in 2 different doc pages. This PR proposes to merge them into one doc 
page.
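
As a short sketch drawn from the examples in the diff below, the two forms now documented on a single page are:

```sql
-- statistics for one table (optionally one partition)
ANALYZE TABLE students COMPUTE STATISTICS NOSCAN;
-- statistics for all tables in a database (the ANALYZE TABLES form added by SPARK-33687)
ANALYZE TABLES IN school_db COMPUTE STATISTICS NOSCAN;
```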

### Why are the changes needed?

simplify the doc

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #33781 from cloud-fan/doc.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 07d173a8b0a19a2912905387bcda10e9db3c43c6)
Signed-off-by: Wenchen Fan 
---
 docs/sql-ref-syntax-aux-analyze-table.md  |  85 +++
 docs/sql-ref-syntax-aux-analyze-tables.md | 110 --
 docs/sql-ref-syntax-aux-analyze.md|  23 ---
 docs/sql-ref-syntax.md|   1 -
 4 files changed, 70 insertions(+), 149 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-analyze-table.md 
b/docs/sql-ref-syntax-aux-analyze-table.md
index da53385..0e65de1 100644
--- a/docs/sql-ref-syntax-aux-analyze-table.md
+++ b/docs/sql-ref-syntax-aux-analyze-table.md
@@ -21,7 +21,8 @@ license: |
 
 ### Description
 
-The `ANALYZE TABLE` statement collects statistics about the table to be used 
by the query optimizer to find a better query execution plan.
+The `ANALYZE TABLE` statement collects statistics about one specific table or 
all the tables in one specified database,
+that are to be used by the query optimizer to find a better query execution 
plan.
 
 ### Syntax
 
@@ -30,6 +31,10 @@ ANALYZE TABLE table_identifier [ partition_spec ]
 COMPUTE STATISTICS [ NOSCAN | FOR COLUMNS col [ , ... ] | FOR ALL COLUMNS ]
 ```
 
+```sql
+ANALYZE TABLES [ { FROM | IN } database_name ] COMPUTE STATISTICS [ NOSCAN ]
+```
+
 ### Parameters
 
 * **table_identifier**
@@ -45,22 +50,31 @@ ANALYZE TABLE table_identifier [ partition_spec ]
 
 **Syntax:** `PARTITION ( partition_col_name [ = partition_col_val ] [ , 
... ] )`
 
-* **[ NOSCAN `|` FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS ]**
+* **{ FROM `|` IN } database_name**
+
+  Specifies the name of the database to be analyzed. Without a database name, 
`ANALYZE` collects all tables in the current database that the current user has 
permission to analyze.
+
+* **NOSCAN**
+
+  Collects only the table's size in bytes (which does not require scanning the 
entire table).
 
- * If no analyze option is specified, `ANALYZE TABLE` collects the table's 
number of rows and size in bytes.
- * **NOSCAN**
+* **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS**
 
-   Collects only the table's size in bytes (which does not require 
scanning the entire table).
- * **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS**
+  Collects column statistics for each column specified, or alternatively for 
every column, as well as table statistics.
 
-   Collects column statistics for each column specified, or alternatively 
for every column, as well as table statistics.
+If no analyze option is specified, both number of rows and size in bytes are 
collected.
 
 ### Examples
 
 ```sql
+CREATE DATABASE school_db;
+USE school_db;
+
+CREATE TABLE teachers (name STRING, teacher_id INT);
+INSERT INTO teachers VALUES ('Tom', 1), ('Jerry', 2);
+
 CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY 
(student_id);
-INSERT INTO students PARTITION (student_id = 11) VALUES ('Mark');
-INSERT INTO students PARTITION (student_id = 22) VALUES ('John');
+INSERT INTO students VALUES ('Mark', 11), ('John', 22);
 
 ANALYZE TABLE students COMPUTE STATISTICS NOSCAN;
 
@@ -73,7 +87,6 @@ DESC EXTENDED students;
 | ...| ...|...|
 |  Statistics|   864 bytes|   |
 | ...| ...|...|
-|  Partition Provider| Catalog|   |
 +++---+
 
 ANALYZE TABLE students COMPUTE STATISTICS;
@@ -87,7 +100,6 @@ DESC EXTENDED students;
 | ...| ...|...|
 |  Statistics|   864 bytes, 2 rows|   |
 | ...| ...|...|
-|  Partition Provider| Catalog|   |
 

[spark] branch master updated: [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES

2021-08-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 07d173a  [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of 
ANALYZE TABLE and ANALYZE TABLES
07d173a is described below

commit 07d173a8b0a19a2912905387bcda10e9db3c43c6
Author: Wenchen Fan 
AuthorDate: Thu Aug 19 11:04:05 2021 +0800

[SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and 
ANALYZE TABLES

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/30648

ANALYZE TABLE and TABLES are essentially the same command, it's weird to 
put them in 2 different doc pages. This PR proposes to merge them into one doc 
page.

### Why are the changes needed?

simplify the doc

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #33781 from cloud-fan/doc.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 docs/sql-ref-syntax-aux-analyze-table.md  |  85 +++
 docs/sql-ref-syntax-aux-analyze-tables.md | 110 --
 docs/sql-ref-syntax-aux-analyze.md|  23 ---
 docs/sql-ref-syntax.md|   1 -
 4 files changed, 70 insertions(+), 149 deletions(-)

diff --git a/docs/sql-ref-syntax-aux-analyze-table.md 
b/docs/sql-ref-syntax-aux-analyze-table.md
index da53385..0e65de1 100644
--- a/docs/sql-ref-syntax-aux-analyze-table.md
+++ b/docs/sql-ref-syntax-aux-analyze-table.md
@@ -21,7 +21,8 @@ license: |
 
 ### Description
 
-The `ANALYZE TABLE` statement collects statistics about the table to be used 
by the query optimizer to find a better query execution plan.
+The `ANALYZE TABLE` statement collects statistics about one specific table or 
all the tables in one specified database,
+that are to be used by the query optimizer to find a better query execution 
plan.
 
 ### Syntax
 
@@ -30,6 +31,10 @@ ANALYZE TABLE table_identifier [ partition_spec ]
 COMPUTE STATISTICS [ NOSCAN | FOR COLUMNS col [ , ... ] | FOR ALL COLUMNS ]
 ```
 
+```sql
+ANALYZE TABLES [ { FROM | IN } database_name ] COMPUTE STATISTICS [ NOSCAN ]
+```
+
 ### Parameters
 
 * **table_identifier**
@@ -45,22 +50,31 @@ ANALYZE TABLE table_identifier [ partition_spec ]
 
 **Syntax:** `PARTITION ( partition_col_name [ = partition_col_val ] [ , 
... ] )`
 
-* **[ NOSCAN `|` FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS ]**
+* **{ FROM `|` IN } database_name**
+
+  Specifies the name of the database to be analyzed. Without a database name, 
`ANALYZE` collects all tables in the current database that the current user has 
permission to analyze.
+
+* **NOSCAN**
+
+  Collects only the table's size in bytes (which does not require scanning the 
entire table).
 
- * If no analyze option is specified, `ANALYZE TABLE` collects the table's 
number of rows and size in bytes.
- * **NOSCAN**
+* **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS**
 
-   Collects only the table's size in bytes (which does not require 
scanning the entire table).
- * **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS**
+  Collects column statistics for each column specified, or alternatively for 
every column, as well as table statistics.
 
-   Collects column statistics for each column specified, or alternatively 
for every column, as well as table statistics.
+If no analyze option is specified, both number of rows and size in bytes are 
collected.
 
 ### Examples
 
 ```sql
+CREATE DATABASE school_db;
+USE school_db;
+
+CREATE TABLE teachers (name STRING, teacher_id INT);
+INSERT INTO teachers VALUES ('Tom', 1), ('Jerry', 2);
+
 CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY 
(student_id);
-INSERT INTO students PARTITION (student_id = 11) VALUES ('Mark');
-INSERT INTO students PARTITION (student_id = 22) VALUES ('John');
+INSERT INTO students VALUES ('Mark', 11), ('John', 22);
 
 ANALYZE TABLE students COMPUTE STATISTICS NOSCAN;
 
@@ -73,7 +87,6 @@ DESC EXTENDED students;
 | ...| ...|...|
 |  Statistics|   864 bytes|   |
 | ...| ...|...|
-|  Partition Provider| Catalog|   |
 +++---+
 
 ANALYZE TABLE students COMPUTE STATISTICS;
@@ -87,7 +100,6 @@ DESC EXTENDED students;
 | ...| ...|...|
 |  Statistics|   864 bytes, 2 rows|   |
 | ...| ...|...|
-|  Partition Provider| Catalog|   |
 +++---+
 
 ANALYZE TABLE students PARTITION (student_id = 11) COMPUTE STATISTICS;
@@ -101,7 +113,6 @@ DESC 

[spark] branch master updated: [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker

2021-08-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2fc9c0b  [SPARK-36147][SQL] Warn if less files visible after stats 
write in BasicWriteStatsTracker
2fc9c0b is described below

commit 2fc9c0bfb5d43a6ee1dbbf941b84e4c3dd74d8ef
Author: tooptoop4 <33283496+toopto...@users.noreply.github.com>
AuthorDate: Thu Aug 19 10:31:10 2021 +0900

[SPARK-36147][SQL] Warn if less files visible after stats write in 
BasicWriteStatsTracker

### What changes were proposed in this pull request?

This log should at least be WARN not INFO (in 
org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala )
"Expected $numSubmittedFiles files, but only saw $numFiles."

### Why are the changes needed?

INFO logs don't indicate possible issue but WARN logs should

### Does this PR introduce any user-facing change?

Yes, Log level changed.

### How was this patch tested?

manual, trivial change

Closes #2 from tooptoop4/warn.

Authored-by: tooptoop4 <33283496+toopto...@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon 
---
 .../apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
index f3815ab..a8fae66 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
@@ -169,7 +169,7 @@ class BasicWriteTaskStatsTracker(
 }
 
 if (numSubmittedFiles != numFiles) {
-  logInfo(s"Expected $numSubmittedFiles files, but only saw $numFiles. " +
+  logWarning(s"Expected $numSubmittedFiles files, but only saw $numFiles. 
" +
 "This could be due to the output format not writing empty files, " +
 "or files being not immediately visible in the filesystem.")
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] HyukjinKwon commented on pull request #355: Use ASF mail archives not defunct nabble links

2021-08-18 Thread GitBox


HyukjinKwon commented on pull request #355:
URL: https://github.com/apache/spark-website/pull/355#issuecomment-901519830


   lgtm2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (ae07b63 -> 8862657)

2021-08-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ae07b63  [HOTFIX] Add missing deps update for commit protocol change
 add 8862657  [SPARK-34949][CORE][3.0] Prevent BlockManager reregister when 
Executor is shutting down

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/executor/Executor.scala |  2 +-
 .../org/apache/spark/executor/ExecutorSuite.scala  | 68 +-
 2 files changed, 53 insertions(+), 17 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated (31d771d -> 79ea014)

2021-08-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 31d771d  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer 
recognize spark.sql.redaction.string.regex
 add 79ea014  [SPARK-35011][CORE][3.1] Avoid Block Manager registrations 
when StopExecutor msg is in-flight

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/HeartbeatReceiver.scala |   4 +-
 .../spark/storage/BlockManagerMasterEndpoint.scala | 118 +++--
 .../main/scala/org/apache/spark/util/Utils.scala   |   7 ++
 .../apache/spark/storage/BlockManagerSuite.scala   |  20 +++-
 4 files changed, 109 insertions(+), 40 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-18 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f2e593b  [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow 
pandas 1.3
f2e593b is described below

commit f2e593bcf1a1aa8dde9f73b77e4863ceed5a7e28
Author: itholic 
AuthorDate: Wed Aug 18 11:38:59 2021 -0700

[SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

### What changes were proposed in this pull request?

This PR proposes to fix the behavior of `astype` for `CategoricalDtype` to 
follow pandas 1.3.

**Before:**
```python
>>> pcat
0    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']

>>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
0    a
1    b
2    c
dtype: category
Categories (3, object): ['b', 'c', 'a']
```

**After:**
```python
>>> pcat
0    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']

>>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
0    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']  # CategoricalDtype is not updated if dtype is the same
```

`CategoricalDtype` is treated as a same `dtype` if the unique values are 
the same.

```python
>>> pcat1 = pser.astype(CategoricalDtype(["b", "c", "a"]))
>>> pcat2 = pser.astype(CategoricalDtype(["a", "b", "c"]))
>>> pcat1.dtype == pcat2.dtype
True
```

### Why are the changes needed?

We should follow the latest pandas as much as possible.

### Does this PR introduce _any_ user-facing change?

Yes, the behavior is changed as example in the PR description.

### How was this patch tested?

Unittest

Closes #33757 from itholic/SPARK-36368.

Authored-by: itholic 
Signed-off-by: Takuya UESHIN 
---
 python/pyspark/pandas/categorical.py |  3 ++-
 python/pyspark/pandas/data_type_ops/categorical_ops.py   |  4 +++-
 .../pandas/tests/data_type_ops/test_categorical_ops.py   |  6 ++
 python/pyspark/pandas/tests/indexes/test_category.py | 16 +++-
 python/pyspark/pandas/tests/test_categorical.py  | 16 +++-
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/python/pyspark/pandas/categorical.py 
b/python/pyspark/pandas/categorical.py
index 77a3cee..fa11228 100644
--- a/python/pyspark/pandas/categorical.py
+++ b/python/pyspark/pandas/categorical.py
@@ -22,6 +22,7 @@ from pandas.api.types import CategoricalDtype, is_dict_like, 
is_list_like
 
 from pyspark.pandas.internal import InternalField
 from pyspark.pandas.spark import functions as SF
+from pyspark.pandas.data_type_ops.categorical_ops import _to_cat
 from pyspark.sql import functions as F
 from pyspark.sql.types import StructField
 
@@ -735,7 +736,7 @@ class CategoricalAccessor(object):
 return self._data.copy()
 else:
 dtype = CategoricalDtype(categories=new_categories, 
ordered=ordered)
-psser = self._data.astype(dtype)
+psser = _to_cat(self._data).astype(dtype)
 
 if inplace:
 internal = self._data._psdf._internal.with_new_spark_column(
diff --git a/python/pyspark/pandas/data_type_ops/categorical_ops.py 
b/python/pyspark/pandas/data_type_ops/categorical_ops.py
index b524cdd..c1be683 100644
--- a/python/pyspark/pandas/data_type_ops/categorical_ops.py
+++ b/python/pyspark/pandas/data_type_ops/categorical_ops.py
@@ -57,7 +57,9 @@ class CategoricalOps(DataTypeOps):
 def astype(self, index_ops: IndexOpsLike, dtype: Union[str, type, Dtype]) 
-> IndexOpsLike:
 dtype, _ = pandas_on_spark_type(dtype)
 
-if isinstance(dtype, CategoricalDtype) and cast(CategoricalDtype, 
dtype).categories is None:
+if isinstance(dtype, CategoricalDtype) and (
+(dtype.categories is None) or (index_ops.dtype == dtype)
+):
 return index_ops.copy()
 
 return _to_cat(index_ops).astype(dtype)
diff --git a/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py 
b/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py
index 11871ea..5e79eb3 100644
--- a/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py
+++ b/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py
@@ -192,13 +192,11 @@ class CategoricalOpsTest(PandasOnSparkTestCase, 
TestCasesUtils):
 self.assert_eq(pser.astype("category"), psser.astype("category"))
 
 cat_type = CategoricalDtype(categories=[3, 1, 2])
+# CategoricalDtype is not updated if the dtype is same from pandas 1.3.
 if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
-# TODO(SPARK-36367): Fix the 

[spark] branch master updated: [SPARK-36388][SPARK-36386][PYTHON][FOLLOWUP] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

2021-08-18 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c91ae54  [SPARK-36388][SPARK-36386][PYTHON][FOLLOWUP] Fix DataFrame 
groupby-rolling and groupby-expanding to follow pandas 1.3
c91ae54 is described below

commit c91ae544fdd44c67fe1e4c73825570dbe71a3206
Author: itholic 
AuthorDate: Wed Aug 18 11:17:01 2021 -0700

[SPARK-36388][SPARK-36386][PYTHON][FOLLOWUP] Fix DataFrame groupby-rolling 
and groupby-expanding to follow pandas 1.3

### What changes were proposed in this pull request?

This PR is followup for https://github.com/apache/spark/pull/33646 to add 
missing tests.

### Why are the changes needed?

Some tests are missing

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unittest

Closes #33776 from itholic/SPARK-36388-followup.

Authored-by: itholic 
Signed-off-by: Takuya UESHIN 
---
 .../pandas/tests/test_ops_on_diff_frames_groupby_expanding.py| 9 ++---
 .../pandas/tests/test_ops_on_diff_frames_groupby_rolling.py  | 9 ++---
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git 
a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py 
b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py
index 223adea..634cbd7 100644
--- a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py
+++ b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py
@@ -52,14 +52,17 @@ class 
OpsOnDiffFramesGroupByExpandingTest(PandasOnSparkTestCase, TestUtils):
 psdf = ps.from_pandas(pdf)
 kkey = ps.from_pandas(pkey)
 
+# The behavior of GroupBy.expanding is changed from pandas 1.3.
 if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
-# TODO(SPARK-36367): Fix the behavior to follow pandas >= 1.3
-pass
-else:
 self.assert_eq(
 getattr(psdf.groupby(kkey).expanding(2), f)().sort_index(),
 getattr(pdf.groupby(pkey).expanding(2), f)().sort_index(),
 )
+else:
+self.assert_eq(
+getattr(psdf.groupby(kkey).expanding(2), f)().sort_index(),
+getattr(pdf.groupby(pkey).expanding(2), f)().drop("a", 
axis=1).sort_index(),
+)
 
 self.assert_eq(
 getattr(psdf.groupby(kkey)["b"].expanding(2), f)().sort_index(),
diff --git 
a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py 
b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py
index 4f97769..04ea448 100644
--- a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py
+++ b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py
@@ -50,14 +50,17 @@ class 
OpsOnDiffFramesGroupByRollingTest(PandasOnSparkTestCase, TestUtils):
 psdf = ps.from_pandas(pdf)
 kkey = ps.from_pandas(pkey)
 
+# The behavior of GroupBy.rolling is changed from pandas 1.3.
 if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
-# TODO(SPARK-36367): Fix the behavior to follow pandas >= 1.3
-pass
-else:
 self.assert_eq(
 getattr(psdf.groupby(kkey).rolling(2), f)().sort_index(),
 getattr(pdf.groupby(pkey).rolling(2), f)().sort_index(),
 )
+else:
+self.assert_eq(
+getattr(psdf.groupby(kkey).rolling(2), f)().sort_index(),
+getattr(pdf.groupby(pkey).rolling(2), f)().drop("a", 
axis=1).sort_index(),
+)
 
 self.assert_eq(
 getattr(psdf.groupby(kkey)["b"].rolling(2), f)().sort_index(),

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen closed pull request #355: Use ASF mail archives not defunct nabble links

2021-08-18 Thread GitBox


srowen closed pull request #355:
URL: https://github.com/apache/spark-website/pull/355


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Use ASF mail archives not defunct nabble links

2021-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new dc9faff  Use ASF mail archives not defunct nabble links
dc9faff is described below

commit dc9faff4a121070d58fc0f145d8f0a3521074fb3
Author: Sean Owen 
AuthorDate: Wed Aug 18 12:58:05 2021 -0500

Use ASF mail archives not defunct nabble links

Nabble archive links appear to not work anymore. Use ASF pony mail links 
instead for archives.

Author: Sean Owen 

Closes #355 from srowen/Nabble.
---
 community.md| 16 
 faq.md  |  2 +-
 site/community.html | 16 
 site/faq.html   |  2 +-
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/community.md b/community.md
index e8f2cf7..ebc438a 100644
--- a/community.md
+++ b/community.md
@@ -24,8 +24,8 @@ Some quick tips when using StackOverflow:
  - Search StackOverflow's 
  <a href="https://stackoverflow.com/questions/tagged/apache-spark">`apache-spark`</a> tag to see if 
  your question has already been answered
-  - Search the nabble archive for
-  <a href="http://apache-spark-user-list.1001560.n3.nabble.com/">u...@spark.apache.org</a>
+  - Search the ASF archive for
+  <a href="https://lists.apache.org/list.html?u...@spark.apache.org">u...@spark.apache.org</a>
- Please follow the StackOverflow <a href="https://stackoverflow.com/help/how-to-ask">code of conduct</a>
 - Always use the `apache-spark` tag when asking questions
 - Please also use a secondary tag to specify components so subject matter 
experts can more easily find them.
@@ -42,16 +42,16 @@ project, and scenarios, it is recommended you use the 
u...@spark.apache.org mail
 
 
   
-<a href="http://apache-spark-user-list.1001560.n3.nabble.com">u...@spark.apache.org</a> is for usage questions, help, and announcements.
+<a href="https://lists.apache.org/list.html?u...@spark.apache.org">u...@spark.apache.org</a> is for usage questions, help, and announcements.
 <a href="mailto:user-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)">(subscribe)</a>
 <a href="mailto:user-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)">(unsubscribe)</a>
-<a href="http://apache-spark-user-list.1001560.n3.nabble.com">(archives)</a>
+<a href="https://lists.apache.org/list.html?u...@spark.apache.org">(archives)</a>
  
  
-<a href="http://apache-spark-developers-list.1001551.n3.nabble.com">d...@spark.apache.org</a> is for people who want to contribute code to Spark.
+<a href="https://lists.apache.org/list.html?d...@spark.apache.org">d...@spark.apache.org</a> is for people who want to contribute code to Spark.
 <a href="mailto:dev-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)">(subscribe)</a>
 <a href="mailto:dev-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)">(unsubscribe)</a>
-<a href="http://apache-spark-developers-list.1001551.n3.nabble.com">(archives)</a>
+<a href="https://lists.apache.org/list.html?d...@spark.apache.org">(archives)</a>
   
 
 
@@ -60,8 +60,8 @@ Some quick tips when using email:
 - Prior to asking submitting questions, please:
  - Search StackOverflow at <a href="https://stackoverflow.com/questions/tagged/apache-spark">`apache-spark`</a>
  to see if your question has already been answered
-  - Search the nabble archive for
-  <a href="http://apache-spark-user-list.1001560.n3.nabble.com/">u...@spark.apache.org</a>
+  - Search the ASF archive for
+  <a href="https://lists.apache.org/list.html?u...@spark.apache.org">u...@spark.apache.org</a>
 - Tagging the subject line of your email will help you get a faster response, 
e.g. 
 `[Spark SQL]: Does Spark SQL support LEFT SEMI JOIN?`
 - Tags may help identify a topic by:
diff --git a/faq.md b/faq.md
index af57f26..0275c18 100644
--- a/faq.md
+++ b/faq.md
@@ -71,4 +71,4 @@ Please also refer to our
 
 Where can I get more help?
 
-Please post on StackOverflow's <a href="https://stackoverflow.com/questions/tagged/apache-spark">apache-spark</a> tag or <a href="http://apache-spark-user-list.1001560.n3.nabble.com">Spark Users</a> mailing list.  For more information, please refer to <a href="https://spark.apache.org/community.html#have-questions">Have Questions?</a>.  We'll be glad to help!
+Please post on StackOverflow's <a href="https://stackoverflow.com/questions/tagged/apache-spark">apache-spark</a> tag or <a href="https://lists.apache.org/list.html?u...@spark.apache.org">Spark Users</a> mailing list.  For more information, please refer to <a href="https://spark.apache.org/community.html#have-questions">Have Questions?</a>.  We'll be glad to help!
diff --git a/site/community.html b/site/community.html
index b779d37..f4e1fcf 100644
--- a/site/community.html
+++ b/site/community.html
@@ -219,8 +219,8 @@ as it is an active forum for Spark users questions 
and answers.
  Search StackOverflow's 
  <a href="https://stackoverflow.com/questions/tagged/apache-spark">apache-spark</a> tag to see if 
  your question has already been answered
-  Search the nabble 

[GitHub] [spark-website] zero323 edited a comment on pull request #355: Use ASF mail archives not defunct nabble links

2021-08-18 Thread GitBox


zero323 edited a comment on pull request #355:
URL: https://github.com/apache/spark-website/pull/355#issuecomment-901305574


   I was looking into it and [the 
following](https://support.nabble.com/Downsizing-Nabble-td7609715.html):
   
   > Forum owners who want their forum preserved can post to this support forum 
to let us know to move their forum to that one server.  Note that we no longer 
do mailing list archiving, so if you own an old mailing list archive, there is 
no point to preserving it. 
   
   So it is seems like it definitely not going back in our case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] zero323 commented on pull request #355: Use ASF mail archives not defunct nabble links

2021-08-18 Thread GitBox


zero323 commented on pull request #355:
URL: https://github.com/apache/spark-website/pull/355#issuecomment-901305574


   I was looking into it and [found 
this](https://support.nabble.com/Downsizing-Nabble-td7609715.html):
   
   > Forum owners who want their forum preserved can post to this support forum 
to let us know to move their forum to that one server.  Note that we no longer 
do mailing list archiving, so if you own an old mailing list archive, there is 
no point to preserving it.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] srowen opened a new pull request #355: Use ASF mail archives not defunct nabble links

2021-08-18 Thread GitBox


srowen opened a new pull request #355:
URL: https://github.com/apache/spark-website/pull/355


   Nabble archive links appear to not work anymore. Use ASF pony mail links 
instead for archives.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (707eefa -> 1859d9b)

2021-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 707eefa  [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of 
make_timestamp
 add 1859d9b  [SPARK-36407][CORE][SQL] Convert int to long to avoid 
potential integer multiplications overflow risk

No new revisions were added by this update.

Summary of changes:
 .../shuffle/checksum/ShuffleChecksumHelper.java|  2 +-
 .../org/apache/spark/util/collection/TestTimSort.java  | 18 +-
 .../spark/sql/execution/UnsafeKVExternalSorter.java|  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)
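
A minimal Scala sketch (not taken from the patch) of the overflow this change guards against, and the usual fix of widening one operand to `Long` before multiplying:

```scala
val blockSize = 64 * 1024                       // 65536
val numBlocks = 40000
val overflowed = blockSize * numBlocks          // Int * Int silently wraps around
val widened    = blockSize.toLong * numBlocks   // widen first, multiply in Long
println(overflowed)  // -1673527296
println(widened)     // 2621440000
```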

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp

2021-08-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 3d69d0d  [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of 
make_timestamp
3d69d0d is described below

commit 3d69d0d0038a1065bb5d24430bf30da9a3463184
Author: gengjiaan 
AuthorDate: Wed Aug 18 22:57:06 2021 +0800

[SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp

### What changes were proposed in this pull request?
The implementation in https://github.com/apache/spark/pull/33665 made
`make_timestamp` accept integer type as the seconds parameter.
This PR lets `make_timestamp` accept `decimal(16,6)` type as the seconds
parameter; casting an integer to `decimal(16,6)` is safe, so we can simplify the
code.
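
A short Spark SQL sketch of the behavior described above (the seconds argument is now always taken as `decimal(16,6)`, and integer seconds are cast to it implicitly):

```sql
-- integer seconds: cast to decimal(16,6) under the hood
SELECT make_timestamp(2021, 8, 18, 22, 57, 6);
-- fractional seconds: microsecond precision is preserved
SELECT make_timestamp(2021, 8, 18, 22, 57, 6.000001);
```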

### Why are the changes needed?
Simplify `make_timestamp`.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
New tests.

Closes #33775 from beliefer/SPARK-36428-followup.

Lead-authored-by: gengjiaan 
Co-authored-by: Jiaan Geng 
Signed-off-by: Gengliang Wang 
(cherry picked from commit 707eefa3c706561f904dad65f3e347028dafb6ea)
Signed-off-by: Gengliang Wang 
---
 .../catalyst/expressions/datetimeExpressions.scala | 33 -
 .../expressions/DateExpressionsSuite.scala | 43 +-
 .../test/resources/sql-tests/inputs/timestamp.sql  |  3 ++
 .../sql-tests/results/ansi/timestamp.sql.out   | 28 +-
 .../sql-tests/results/datetime-legacy.sql.out  | 26 -
 .../resources/sql-tests/results/timestamp.sql.out  | 26 -
 .../results/timestampNTZ/timestamp-ansi.sql.out| 28 +-
 .../results/timestampNTZ/timestamp.sql.out | 26 -
 8 files changed, 157 insertions(+), 56 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 84dfb41..0e74eff 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -2557,22 +2557,16 @@ case class MakeTimestamp(
 
   override def children: Seq[Expression] = Seq(year, month, day, hour, min, 
sec) ++ timezone
   // Accept `sec` as DecimalType to avoid loosing precision of microseconds 
while converting
-  // them to the fractional part of `sec`.
+  // them to the fractional part of `sec`. For accepts IntegerType as `sec` 
and integer can be
+  // casted into decimal safely, we use DecimalType(16, 6) which is wider than 
DecimalType(10, 0).
   override def inputTypes: Seq[AbstractDataType] =
-Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType,
-  TypeCollection(DecimalType(8, 6), IntegerType, NullType)) ++ 
timezone.map(_ => StringType)
+Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType, 
DecimalType(16, 6)) ++
+  timezone.map(_ => StringType)
   override def nullable: Boolean = if (failOnError) 
children.exists(_.nullable) else true
 
   override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
 copy(timeZoneId = Option(timeZoneId))
 
-  private lazy val toDecimal = sec.dataType match {
-case DecimalType() =>
-  (secEval: Any) => secEval.asInstanceOf[Decimal]
-case IntegerType =>
-  (secEval: Any) => Decimal(BigDecimal(secEval.asInstanceOf[Int]), 8, 6)
-  }
-
   private def toMicros(
   year: Int,
   month: Int,
@@ -2585,8 +2579,6 @@ case class MakeTimestamp(
   assert(secAndMicros.scale == 6,
 s"Seconds fraction must have 6 digits for microseconds but got 
${secAndMicros.scale}")
   val unscaledSecFrac = secAndMicros.toUnscaledLong
-  assert(secAndMicros.precision <= 8,
-s"Seconds and fraction cannot have more than 8 digits but got 
${secAndMicros.precision}")
   val totalMicros = unscaledSecFrac.toInt // 8 digits cannot overflow Int
   val seconds = Math.floorDiv(totalMicros, MICROS_PER_SECOND.toInt)
   val nanos = Math.floorMod(totalMicros, MICROS_PER_SECOND.toInt) * 
NANOS_PER_MICROS.toInt
@@ -2627,7 +2619,7 @@ case class MakeTimestamp(
   day.asInstanceOf[Int],
   hour.asInstanceOf[Int],
   min.asInstanceOf[Int],
-  toDecimal(sec),
+  sec.asInstanceOf[Decimal],
   zid)
   }
 
@@ -2635,7 +2627,6 @@ case class MakeTimestamp(
 val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
 val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
 val d = Decimal.getClass.getName.stripSuffix("$")
-val decimalValue = ctx.freshName("decimalValue")
 val 

[spark] branch master updated: [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp

2021-08-18 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 707eefa  [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of 
make_timestamp
707eefa is described below

commit 707eefa3c706561f904dad65f3e347028dafb6ea
Author: gengjiaan 
AuthorDate: Wed Aug 18 22:57:06 2021 +0800

[SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp

### What changes were proposed in this pull request?
The implementation in https://github.com/apache/spark/pull/33665 made
`make_timestamp` accept integer type as the seconds parameter.
This PR lets `make_timestamp` accept `decimal(16,6)` type as the seconds
parameter; casting an integer to `decimal(16,6)` is safe, so we can simplify the
code.

### Why are the changes needed?
Simplify `make_timestamp`.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
New tests.

Closes #33775 from beliefer/SPARK-36428-followup.

Lead-authored-by: gengjiaan 
Co-authored-by: Jiaan Geng 
Signed-off-by: Gengliang Wang 
---
 .../catalyst/expressions/datetimeExpressions.scala | 33 -
 .../expressions/DateExpressionsSuite.scala | 43 +-
 .../test/resources/sql-tests/inputs/timestamp.sql  |  3 ++
 .../sql-tests/results/ansi/timestamp.sql.out   | 28 +-
 .../sql-tests/results/datetime-legacy.sql.out  | 26 -
 .../resources/sql-tests/results/timestamp.sql.out  | 26 -
 .../results/timestampNTZ/timestamp-ansi.sql.out| 28 +-
 .../results/timestampNTZ/timestamp.sql.out | 26 -
 8 files changed, 157 insertions(+), 56 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 84dfb41..0e74eff 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -2557,22 +2557,16 @@ case class MakeTimestamp(
 
   override def children: Seq[Expression] = Seq(year, month, day, hour, min, 
sec) ++ timezone
   // Accept `sec` as DecimalType to avoid loosing precision of microseconds 
while converting
-  // them to the fractional part of `sec`.
+  // them to the fractional part of `sec`. For accepts IntegerType as `sec` 
and integer can be
+  // casted into decimal safely, we use DecimalType(16, 6) which is wider than 
DecimalType(10, 0).
   override def inputTypes: Seq[AbstractDataType] =
-Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType,
-  TypeCollection(DecimalType(8, 6), IntegerType, NullType)) ++ 
timezone.map(_ => StringType)
+Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType, 
DecimalType(16, 6)) ++
+  timezone.map(_ => StringType)
   override def nullable: Boolean = if (failOnError) 
children.exists(_.nullable) else true
 
   override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
 copy(timeZoneId = Option(timeZoneId))
 
-  private lazy val toDecimal = sec.dataType match {
-case DecimalType() =>
-  (secEval: Any) => secEval.asInstanceOf[Decimal]
-case IntegerType =>
-  (secEval: Any) => Decimal(BigDecimal(secEval.asInstanceOf[Int]), 8, 6)
-  }
-
   private def toMicros(
   year: Int,
   month: Int,
@@ -2585,8 +2579,6 @@ case class MakeTimestamp(
   assert(secAndMicros.scale == 6,
 s"Seconds fraction must have 6 digits for microseconds but got 
${secAndMicros.scale}")
   val unscaledSecFrac = secAndMicros.toUnscaledLong
-  assert(secAndMicros.precision <= 8,
-s"Seconds and fraction cannot have more than 8 digits but got 
${secAndMicros.precision}")
   val totalMicros = unscaledSecFrac.toInt // 8 digits cannot overflow Int
   val seconds = Math.floorDiv(totalMicros, MICROS_PER_SECOND.toInt)
   val nanos = Math.floorMod(totalMicros, MICROS_PER_SECOND.toInt) * 
NANOS_PER_MICROS.toInt
@@ -2627,7 +2619,7 @@ case class MakeTimestamp(
   day.asInstanceOf[Int],
   hour.asInstanceOf[Int],
   min.asInstanceOf[Int],
-  toDecimal(sec),
+  sec.asInstanceOf[Decimal],
   zid)
   }
 
@@ -2635,7 +2627,6 @@ case class MakeTimestamp(
 val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
 val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
 val d = Decimal.getClass.getName.stripSuffix("$")
-val decimalValue = ctx.freshName("decimalValue")
 val failOnErrorBranch = if (failOnError) "throw e;" else s"${ev.isNull} = 
true;"
 nullSafeCodeGen(ctx, ev, (year, 

[spark] branch branch-3.2 updated: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

2021-08-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 181d33e  [SPARK-36532][CORE] Fix deadlock in 
CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang
181d33e is described below

commit 181d33e16edfb6fa5abde29de87634bdf4ce7e61
Author: yi.wu 
AuthorDate: Wed Aug 18 22:46:48 2021 +0800

[SPARK-36532][CORE] Fix deadlock in 
CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

### What changes were proposed in this pull request?

Instead of exiting the executor within the RpcEnv's thread, exit the 
executor in a separate thread.

### Why are the changes needed?

The current exit way in `onDisconnected` can cause the deadlock, which has 
the exact same root cause with https://github.com/apache/spark/pull/12012:

* `onDisconnected` -> `System.exit` are called in sequence in the thread of 
`MessageLoop.threadpool`
* `System.exit` triggers shutdown hooks and `executor.stop` is one of the 
hooks.
* `executor.stop` stops the `Dispatcher`, which waits for the 
`MessageLoop.threadpool`  to shutdown further.
* Thus, the thread which runs `System.exit` waits for hooks to be done, but 
the `MessageLoop.threadpool` in the hook waits that thread to finish. Finally, 
this mutual dependence results in the deadlock.
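
A standalone sketch of that mutual wait (plain JVM code, not from the patch; assumes Scala 2.12+ for the `Runnable` lambda): a pool thread calls `System.exit`, which blocks until shutdown hooks finish, while a shutdown hook blocks waiting for that same pool to drain.

```scala
import java.util.concurrent.{Executors, TimeUnit}

object ShutdownDeadlockSketch {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newSingleThreadExecutor()

    // Plays the role of executor.stop() in the shutdown hook: waits for the
    // "message loop" pool to terminate.
    Runtime.getRuntime.addShutdownHook(new Thread(() => {
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.DAYS) // never returns: the pool's only task is stuck below
    }))

    // Plays the role of onDisconnected -> System.exit running on the message-loop thread.
    pool.submit(new Runnable {
      override def run(): Unit = System.exit(1) // blocks until the shutdown hook above finishes
    })
  }
}
```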

### Does this PR introduce _any_ user-facing change?

Yes, the executor shutdown won't hang.

### How was this patch tested?

Pass existing tests.

Closes #33759 from Ngone51/fix-executor-shutdown-hang.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 996551fecee8c3549438c4f536f8ab9607c644c5)
Signed-off-by: Wenchen Fan 
---
 .../spark/executor/CoarseGrainedExecutorBackend.scala | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
index d18ffaa..ffcb30d 100644
--- 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -202,11 +202,17 @@ private[spark] class CoarseGrainedExecutorBackend(
   stopping.set(true)
   new Thread("CoarseGrainedExecutorBackend-stop-executor") {
 override def run(): Unit = {
-  // executor.stop() will call `SparkEnv.stop()` which waits until 
RpcEnv stops totally.
-  // However, if `executor.stop()` runs in some thread of RpcEnv, 
RpcEnv won't be able to
-  // stop until `executor.stop()` returns, which becomes a dead-lock 
(See SPARK-14180).
-  // Therefore, we put this line in a new thread.
-  executor.stop()
+  // `executor` can be null if there's any error in 
`CoarseGrainedExecutorBackend.onStart`
+  // or fail to create `Executor`.
+  if (executor == null) {
+System.exit(1)
+  } else {
+// executor.stop() will call `SparkEnv.stop()` which waits until 
RpcEnv stops totally.
+// However, if `executor.stop()` runs in some thread of RpcEnv, 
RpcEnv won't be able to
+// stop until `executor.stop()` returns, which becomes a dead-lock 
(See SPARK-14180).
+// Therefore, we put this line in a new thread.
+executor.stop()
+  }
 }
   }.start()
 
@@ -282,8 +288,7 @@ private[spark] class CoarseGrainedExecutorBackend(
   if (notifyDriver && driver.nonEmpty) {
 driver.get.send(RemoveExecutor(executorId, new 
ExecutorLossReason(reason)))
   }
-
-  System.exit(code)
+  self.send(Shutdown)
 } else {
   logInfo("Skip exiting executor since it's been already asked to exit 
before.")
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a1ecf83 -> 996551f)

2021-08-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a1ecf83  [SPARK-36451][BUILD] Ivy skips looking for source and doc pom
 add 996551f  [SPARK-36532][CORE] Fix deadlock in 
CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

No new revisions were added by this update.

Summary of changes:
 .../spark/executor/CoarseGrainedExecutorBackend.scala | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (281b00a -> a1ecf83)

2021-08-18 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 281b00a  [SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2
 add a1ecf83  [SPARK-36451][BUILD] Ivy skips looking for source and doc pom

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 6 ++
 1 file changed, 6 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org