[spark] branch master updated: [SPARK-35991][SQL][FOLLOWUP] Add back protected modifier of sparkConf to TPCBase

2021-08-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e83f8a8  [SPARK-35991][SQL][FOLLOWUP] Add back protected modifier of sparkConf to TPCBase
e83f8a8 is described below

commit e83f8a872a16d4f049cefb1fc445f91cf84443ad
Author: Angerszh 
AuthorDate: Tue Aug 24 11:32:30 2021 +0900

[SPARK-35991][SQL][FOLLOWUP] Add back protected modifier of sparkConf to TPCBase

### What changes were proposed in this pull request?
Add back protected modifier of sparkConf to TPCBase according to https://github.com/apache/spark/pull/33736/files#r694054229

### Why are the changes needed?
Add back protected modifier of sparkConf to TPCBase

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed

Closes #33813 from AngersZh/SPARK-35991-FOLLOWUP.

Authored-by: Angerszh 
Signed-off-by: Hyukjin Kwon 
---
 sql/core/src/test/scala/org/apache/spark/sql/TPCBase.scala | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/TPCBase.scala b/sql/core/src/test/scala/org/apache/spark/sql/TPCBase.scala
index b1ea70d..1764584 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/TPCBase.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/TPCBase.scala
@@ -25,7 +25,7 @@ trait TPCBase extends SharedSparkSession {
 
   protected def injectStats: Boolean = false
 
-  override def sparkConf: SparkConf = {
+  override protected def sparkConf: SparkConf = {
     if (injectStats) {
       super.sparkConf
         .set(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala
index cab117c..22e1b83 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala
@@ -73,6 +73,6 @@ class TPCDSQueryWithStatsSuite extends TPCDSQuerySuite {
 
 @ExtendedSQLTest
 class TPCDSQueryANSISuite extends TPCDSQuerySuite {
-  override def sparkConf: SparkConf =
+  override protected def sparkConf: SparkConf =
     super.sparkConf.set(SQLConf.ANSI_ENABLED, true)
 }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala
index 0f16d25..3e7f898 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala
@@ -57,7 +57,7 @@ class TPCDSQueryTestSuite extends QueryTest with TPCDSBase with SQLQueryTestHelper
   private val regenerateGoldenFiles = sys.env.get("SPARK_GENERATE_GOLDEN_FILES").exists(_ == "1")
 
   // To make output results deterministic
-  override def sparkConf: SparkConf = super.sparkConf
+  override protected def sparkConf: SparkConf = super.sparkConf
     .set(SQLConf.SHUFFLE_PARTITIONS.key, "1")
 
   protected override def createSparkSession: TestSparkSession = {
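
A minimal sketch (hypothetical suite, not part of this commit) of the override pattern this change preserves: with `sparkConf` back to `protected`, subclasses can still layer settings on `super.sparkConf` while the member stays hidden from code outside the test hierarchy.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.internal.SQLConf

// Hypothetical subclass for illustration only; the suite name and the
// shuffle-partitions value are assumptions, not part of this commit.
class MyTPCDSSuite extends TPCDSQuerySuite {
  override protected def sparkConf: SparkConf =
    super.sparkConf.set(SQLConf.SHUFFLE_PARTITIONS.key, "4")
}
```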




[spark] branch master updated (cd23426 -> fa53aa0)

2021-08-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


 from cd23426  [SPARK-34952][SQL][FOLLOWUP] Move aggregates to a separate package
 add fa53aa0  [SPARK-36560][PYTHON][INFRA] Deflake PySpark coverage job

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/tests/test_streaming_algorithms.py | 2 +-
 python/pyspark/sql/tests/test_streaming.py  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)




[spark] branch master updated (9f595c4 -> cd23426)

2021-08-23 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


 from 9f595c4  [SPARK-36418][SPARK-36536][SQL][DOCS][FOLLOWUP] Update the SQL migration guide about using `CAST` in datetime parsing
 add cd23426  [SPARK-34952][SQL][FOLLOWUP] Move aggregates to a separate package

No new revisions were added by this update.

Summary of changes:
 .../sql/connector/expressions/{ => aggregate}/AggregateFunc.java   | 7 ---
 .../sql/connector/expressions/{ => aggregate}/Aggregation.java | 7 ---
 .../spark/sql/connector/expressions/{ => aggregate}/Count.java | 3 ++-
 .../spark/sql/connector/expressions/{ => aggregate}/CountStar.java | 2 +-
 .../spark/sql/connector/expressions/{ => aggregate}/Max.java   | 3 ++-
 .../spark/sql/connector/expressions/{ => aggregate}/Min.java   | 3 ++-
 .../spark/sql/connector/expressions/{ => aggregate}/Sum.java   | 3 ++-
 .../spark/sql/connector/read/SupportsPushDownAggregates.java   | 2 +-
 .../scala/org/apache/spark/sql/execution/DataSourceScanExec.scala  | 2 +-
 .../spark/sql/execution/datasources/DataSourceStrategy.scala   | 3 ++-
 .../org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala  | 2 +-
 .../apache/spark/sql/execution/datasources/v2/PushDownUtils.scala  | 3 ++-
 .../sql/execution/datasources/v2/V2ScanRelationPushDown.scala  | 2 +-
 .../spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala  | 2 +-
 14 files changed, 26 insertions(+), 18 deletions(-)
 rename sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/{ => aggregate}/AggregateFunc.java (89%)
 rename sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/{ => aggregate}/Aggregation.java (91%)
 rename sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/{ => aggregate}/Count.java (92%)
 rename sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/{ => aggregate}/CountStar.java (94%)
 rename sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/{ => aggregate}/Max.java (91%)
 rename sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/{ => aggregate}/Min.java (91%)
 rename sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/{ => aggregate}/Sum.java (92%)




[spark] branch branch-3.2 updated: [SPARK-34952][SQL][FOLLOWUP] Move aggregates to a separate package

2021-08-23 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new e48de78  [SPARK-34952][SQL][FOLLOWUP] Move aggregates to a separate package
e48de78 is described below

commit e48de7884d218e2f156ee09031b8c9b05e7a2933
Author: Huaxin Gao 
AuthorDate: Mon Aug 23 15:31:13 2021 -0700

[SPARK-34952][SQL][FOLLOWUP] Move aggregates to a separate package

### What changes were proposed in this pull request?
Add `aggregate` package under `sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions` and move all the aggregates (e.g. `Count`, `Max`, `Min`, etc.) there.

### Why are the changes needed?
Right now these aggregates are under `sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions`. It looks OK now, but we plan to add a new `filter` package under `expressions` for all the DSV2 filters. It will look strange that filters have their own package, but aggregates don't.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #33815 from huaxingao/agg_package.

Authored-by: Huaxin Gao 
Signed-off-by: Liang-Chi Hsieh 
(cherry picked from commit cd2342691d1182b14f6076f69793441d2aa03e85)
Signed-off-by: Liang-Chi Hsieh 
---
 .../sql/connector/expressions/{ => aggregate}/AggregateFunc.java   | 7 ---
 .../sql/connector/expressions/{ => aggregate}/Aggregation.java | 7 ---
 .../spark/sql/connector/expressions/{ => aggregate}/Count.java | 3 ++-
 .../spark/sql/connector/expressions/{ => aggregate}/CountStar.java | 2 +-
 .../spark/sql/connector/expressions/{ => aggregate}/Max.java   | 3 ++-
 .../spark/sql/connector/expressions/{ => aggregate}/Min.java   | 3 ++-
 .../spark/sql/connector/expressions/{ => aggregate}/Sum.java   | 3 ++-
 .../spark/sql/connector/read/SupportsPushDownAggregates.java   | 2 +-
 .../scala/org/apache/spark/sql/execution/DataSourceScanExec.scala  | 2 +-
 .../spark/sql/execution/datasources/DataSourceStrategy.scala   | 3 ++-
 .../org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala  | 2 +-
 .../apache/spark/sql/execution/datasources/v2/PushDownUtils.scala  | 3 ++-
 .../sql/execution/datasources/v2/V2ScanRelationPushDown.scala  | 2 +-
 .../spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala  | 2 +-
 14 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/AggregateFunc.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/AggregateFunc.java
similarity index 89%
rename from sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/AggregateFunc.java
rename to sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/AggregateFunc.java
index eea8c31..6683f73 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/AggregateFunc.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/AggregateFunc.java
@@ -15,12 +15,13 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.connector.expressions;
-
-import org.apache.spark.annotation.Evolving;
+package org.apache.spark.sql.connector.expressions.aggregate;
 
 import java.io.Serializable;
 
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.expressions.Expression;
+
 /**
  * Base class of the Aggregate Functions.
  *
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Aggregation.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/Aggregation.java
similarity index 91%
rename from sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Aggregation.java
rename to sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/Aggregation.java
index 8eb3491..0392523 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Aggregation.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/Aggregation.java
@@ -15,12 +15,13 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.connector.expressions;
-
-import org.apache.spark.annotation.Evolving;
+package org.apache.spark.sql.connector.expressions.aggregate;
 
 import java.io.Serializable;
 
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.expressions.FieldReference;
+
 /**
  * Aggregation in SQL statement.
  *
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Count.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/Count.java
similarity index 92%
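
Based only on the renames listed above, the user-visible effect of this commit is an import-path change for the DSV2 aggregate classes; a minimal sketch (Spark 3.2+ classpath assumed):

```scala
// Old location (before this commit):
//   import org.apache.spark.sql.connector.expressions.Count
// New location under the dedicated `aggregate` sub-package:
import org.apache.spark.sql.connector.expressions.aggregate.{Count, CountStar, Max, Min, Sum}
```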

[spark] branch master updated: [SPARK-36418][SPARK-36536][SQL][DOCS][FOLLOWUP] Update the SQL migration guide about using `CAST` in datetime parsing

2021-08-23 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9f595c4  [SPARK-36418][SPARK-36536][SQL][DOCS][FOLLOWUP] Update the SQL migration guide about using `CAST` in datetime parsing
9f595c4 is described below

commit 9f595c4ce34728f5d8f943eadea8d85a548b2d41
Author: Max Gekk 
AuthorDate: Mon Aug 23 13:07:37 2021 +0300

[SPARK-36418][SPARK-36536][SQL][DOCS][FOLLOWUP] Update the SQL migration guide about using `CAST` in datetime parsing

### What changes were proposed in this pull request?
In the PR, I propose to update the SQL migration guide about the changes introduced by the PRs https://github.com/apache/spark/pull/33709 and https://github.com/apache/spark/pull/33769.

Screenshot: https://user-images.githubusercontent.com/1580697/130419710-640f20b3-6a38-4eb1-a6d6-2e069dc5665c.png

### Why are the changes needed?
To inform users about the upcoming changes in parsing datetime strings. This should help users to migrate to the new release.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By generating the doc, and checking by eyes:
```
$ SKIP_API=1 SKIP_RDOC=1 SKIP_PYTHONDOC=1 SKIP_SCALADOC=1 bundle exec jekyll build
```

Closes #33809 from MaxGekk/datetime-cast-migr-guide.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 docs/sql-migration-guide.md | 20 
 1 file changed, 20 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 7ad384f..47e7921 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -26,6 +26,26 @@ license: |
 
   - Since Spark 3.3, Spark turns a non-nullable schema into nullable for API `DataFrameReader.schema(schema: StructType).json(jsonDataset: Dataset[String])` and `DataFrameReader.schema(schema: StructType).csv(csvDataset: Dataset[String])` when the schema is specified by the user and contains non-nullable fields.
 
+  - Since Spark 3.3, when the date or timestamp pattern is not specified, Spark converts an input string to a date/timestamp using the `CAST` expression approach. The changes affect CSV/JSON datasources and parsing of partition values. In Spark 3.2 or earlier, when the date or timestamp pattern is not set, Spark uses the default patterns: `yyyy-MM-dd` for dates and `yyyy-MM-dd HH:mm:ss` for timestamps. After the changes, Spark still recognizes the pattern together with
+
+    Date patterns:
+      * `[+-]yyyy*`
+      * `[+-]yyyy*-[m]m`
+      * `[+-]yyyy*-[m]m-[d]d`
+      * `[+-]yyyy*-[m]m-[d]d `
+      * `[+-]yyyy*-[m]m-[d]d *`
+      * `[+-]yyyy*-[m]m-[d]dT*`
+
+    Timestamp patterns:
+      * `[+-]yyyy*`
+      * `[+-]yyyy*-[m]m`
+      * `[+-]yyyy*-[m]m-[d]d`
+      * `[+-]yyyy*-[m]m-[d]d `
+      * `[+-]yyyy*-[m]m-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+      * `[+-]yyyy*-[m]m-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+      * `[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+      * `T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+
 ## Upgrading from Spark SQL 3.1 to 3.2
 
   - Since Spark 3.2, ADD FILE/JAR/ARCHIVE commands require each path to be enclosed by `"` or `'` if the path contains whitespaces.
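
A small sketch of the behavior change described in the new migration-guide entry (illustrative only: the single-digit month below relies on the `[m]m` date pattern above, and the claim about Spark 3.2 follows from its fixed `yyyy-MM-dd` default):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// With no dateFormat set, CAST-style parsing accepts a loosely formatted
// date such as "2021-8-23"; the old fixed default pattern yyyy-MM-dd
// would reject the single-digit month.
val df = spark.read.schema("d DATE").csv(Seq("2021-8-23").toDS())
df.show()
```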




[GitHub] [spark-website] cloud-fan commented on a change in pull request #356: Improve the guideline of Preparing gpg key

2021-08-23 Thread GitBox


cloud-fan commented on a change in pull request #356:
URL: https://github.com/apache/spark-website/pull/356#discussion_r693773768



##
File path: release-process.md
##
@@ -39,15 +39,90 @@ If you are a new Release Manager, you can read up on the process from the follow
 
 You can skip this section if you have already uploaded your key.
 
-After generating the gpg key, you need to upload your key to a public key server. Please refer to
-<a href="https://www.apache.org/dev/openpgp.html#generate-key">https://www.apache.org/dev/openpgp.html#generate-key</a>
-for details.
+Generate Key
 
-If you want to do the release on another machine, you can transfer your secret key to that machine
-via the `gpg --export-secret-keys` and `gpg --import` commands.
+Here's an example of gpg 2.0.12. If you use the gpg version 1 series, please refer to <a href="https://www.apache.org/dev/openpgp.html#generate-key">generate-key</a> for details.
+
+```
+:::console
+$ gpg --full-gen-key
+gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc.
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+
+Please select what kind of key you want:
+   (1) RSA and RSA (default)
+   (2) DSA and Elgamal
+   (3) DSA (sign only)
+   (4) RSA (sign only)
+Your selection? 1
+RSA keys may be between 1024 and 4096 bits long.
+What keysize do you want? (2048) 4096
+Requested keysize is 4096 bits
+Please specify how long the key should be valid.
+         0 = key does not expire
+      <n>  = key expires in n days
+      <n>w = key expires in n weeks
+      <n>m = key expires in n months
+      <n>y = key expires in n years
+Key is valid for? (0) 
+Key does not expire at all
+Is this correct? (y/N) y
+
+GnuPG needs to construct a user ID to identify your key.
+
+Real name: Robert Burrell Donkin
+Email address: rdon...@apache.org
+Comment: CODE SIGNING KEY
+You selected this USER-ID:
+"Robert Burrell Donkin (CODE SIGNING KEY) "
+
+Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
+We need to generate a lot of random bytes. It is a good idea to perform
+some other action (type on the keyboard, move the mouse, utilize the
+disks) during the prime generation; this gives the random number
+generator a better chance to gain enough entropy.
+We need to generate a lot of random bytes. It is a good idea to perform
+some other action (type on the keyboard, move the mouse, utilize the
+disks) during the prime generation; this gives the random number
+generator a better chance to gain enough entropy.
+gpg: key 04B3B5C426A27D33 marked as ultimately trusted
+gpg: revocation certificate stored as '/home/ubuntu/.gnupg/openpgp-revocs.d/08071B1E23C8A7E2CA1E891A04B3B5C426A27D33.rev'
+public and secret key created and signed.
+
+pub   rsa4096 2021-08-19 [SC]
+  08071B1E23C8A7E2CA1E891A04B3B5C426A27D33
+uid  Jack (test) 
+sub   rsa4096 2021-08-19 [E]
+```
+
+Note that the last 8 digits (26A27D33) of the public key are the <a href="https://infra.apache.org/release-signing.html#key-id">key ID</a>.
 
-The last step is to update the KEYS file with your code signing key
-<a href="https://www.apache.org/dev/openpgp.html#export-public-key">https://www.apache.org/dev/openpgp.html#export-public-key</a>
+Upload Key
+
+After generating the public key, we should upload it to a <a href="https://infra.apache.org/release-signing.html#keyserver">public key server</a>.
+You can upload it in one of two ways:
+
+Either use the gpg command:
+
+```
+$ gpg --keyserver keys.openpgp.org --send-key 26A27D33
+```
+
+or copy-paste the ASCII-armored public key to the <a href="http://keyserver.ubuntu.com:11371/#submitKey">OpenPGP Keyserver</a>.
+The ASCII-armored public key can be generated by:
+
+```
+:::console
+$ gpg --export --armor 26A27D33
+```
+
+Please refer to <a href="https://infra.apache.org/release-signing.html#keyserver-upload">keyserver-upload</a> for details.
+
+Update KEYS file with your code signing key
+
+The code signing key is exactly the same as the ASCII-armored public key mentioned above.
+You should append it to <a href="https://dist.apache.org/repos/dist/dev/spark/KEYS">KEYS</a> by:

Review comment:
   ```suggestion
   You should append it to the KEYS file by:
   ```
   
   It doesn't seem necessary to add a URL for `KEYS`. People need to run the svn command below anyway.
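
The svn commands the comment refers to fall outside this excerpt; a minimal sketch of the usual append-to-KEYS flow (the key ID 26A27D33 comes from the example above, `$APACHE_USERNAME` is a placeholder, and the exact commands in the PR may differ):

```
:::console
$ svn co --depth=files "https://dist.apache.org/repos/dist/dev/spark" svn-spark
$ (gpg --list-sigs 26A27D33 && gpg --export --armor 26A27D33) >> svn-spark/KEYS
$ svn ci --username $APACHE_USERNAME -m "Update KEYS"
```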










[spark] branch master updated (0b6af46 -> adc485a)

2021-08-23 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


 from 0b6af46  [SPARK-36470][PYTHON] Implement `CategoricalIndex.map` and `DatetimeIndex.map`
 add adc485a  [MINOR][DOCS] Mention Hadoop 3 in YARN introduction on cluster-overview.md

No new revisions were added by this update.

Summary of changes:
 docs/cluster-overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
