[GitHub] [spark-website] gengliangwang commented on pull request #400: [SPARK-39512] Document docker image release steps

2022-06-24 Thread GitBox


gengliangwang commented on PR #400:
URL: https://github.com/apache/spark-website/pull/400#issuecomment-1166192744

   FYI I just published Docker images for the Spark 3.3 release:
   https://hub.docker.com/r/apache/spark
   https://hub.docker.com/r/apache/spark-py
   https://hub.docker.com/r/apache/spark-r
   
   I will send an email to the dev/user lists if no issues are found over the weekend.
   cc @holdenk 
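   
   For anyone wanting to try them out, a quick smoke test of one of the published images might look like the sketch below (the tag and the example jar path are assumptions; check Docker Hub and the image contents for the exact values):
   
   ```bash
   # Pull the published image and run the bundled SparkPi example locally.
   # The tag (v3.3.0) and the examples jar path below are illustrative.
   docker pull apache/spark:v3.3.0
   docker run --rm apache/spark:v3.3.0 \
     /opt/spark/bin/spark-submit \
     --class org.apache.spark.examples.SparkPi \
     --master 'local[2]' \
     /opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar 100
   ```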
   





[GitHub] [spark-website] gengliangwang commented on pull request #400: [SPARK-39512] Document docker image release steps

2022-06-24 Thread GitBox


gengliangwang commented on PR #400:
URL: https://github.com/apache/spark-website/pull/400#issuecomment-1166190151

   > Maybe; unlike the Maven repos, though, we don't have a staging location set up. I think we could ask ASF Infra to set up a staging location for us?
   
   We can publish the RC images with a different tag, e.g. v3.4.0-rc1.
   After the release, those images can be deleted.
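   
   A rough sketch of that flow (tag names are illustrative, and `spark-local-build` is a hypothetical locally built image):
   
   ```bash
   # Push the candidate under an explicit -rc tag so it is clearly not the release.
   docker tag spark-local-build apache/spark:v3.4.0-rc1
   docker push apache/spark:v3.4.0-rc1
   # If the vote passes, the same image can be re-tagged as the final version.
   docker tag apache/spark:v3.4.0-rc1 apache/spark:v3.4.0
   docker push apache/spark:v3.4.0
   # The -rc1 tag itself has to be removed through the Docker Hub UI or API;
   # the docker CLI has no command for deleting a remote tag.
   ```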





[GitHub] [spark-website] holdenk commented on pull request #400: [SPARK-39512] Document docker image release steps

2022-06-24 Thread GitBox


holdenk commented on PR #400:
URL: https://github.com/apache/spark-website/pull/400#issuecomment-1166189409

   Maybe; unlike the Maven repos, though, we don't have a staging location set up. I think we could ask ASF Infra to set up a staging location for us?





[GitHub] [spark-website] gengliangwang commented on pull request #400: [SPARK-39512] Document docker image release steps

2022-06-24 Thread GitBox


gengliangwang commented on PR #400:
URL: https://github.com/apache/spark-website/pull/400#issuecomment-116616

   BTW, I think we should include the Docker images in the RC vote email so the community can test them as well.
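   
   For example, the vote email could include a quick test command like this (image and tag names are illustrative):
   
   ```bash
   # Start an interactive PySpark shell from the candidate Python image.
   docker run --rm -it apache/spark-py:v3.4.0-rc1 /opt/spark/bin/pyspark
   ```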





[GitHub] [spark-website] gengliangwang commented on pull request #400: [SPARK-39512] Document docker image release steps

2022-06-24 Thread GitBox


gengliangwang commented on PR #400:
URL: https://github.com/apache/spark-website/pull/400#issuecomment-1166188788

   @holdenk I followed the steps and they work!
   I have built Docker images at https://hub.docker.com/u/gengliangwang
   If @MaxGekk doesn't have permission to publish them, I can do it for him this time.
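   
   For reference, the build and push presumably boil down to Spark's bundled `bin/docker-image-tool.sh`; a minimal sketch from an unpacked release distribution (repository and tag below are placeholders):
   
   ```bash
   # Build the JVM, Python and R images from the Spark distribution.
   # -r sets the repository prefix, -t the tag; the Dockerfile paths are the
   # ones shipped under kubernetes/dockerfiles/ in the distribution.
   ./bin/docker-image-tool.sh -r gengliangwang -t v3.3.0 \
     -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
     -R ./kubernetes/dockerfiles/spark/bindings/R/Dockerfile \
     build
   # Push the built images; depending on the script version, the -p/-R flags
   # may need to be repeated here as well.
   ./bin/docker-image-tool.sh -r gengliangwang -t v3.3.0 push
   ```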





[GitHub] [spark-website] gengliangwang commented on a diff in pull request #400: [SPARK-39512] Document docker image release steps

2022-06-24 Thread GitBox


gengliangwang commented on code in PR #400:
URL: https://github.com/apache/spark-website/pull/400#discussion_r906444239


##
site/sitemap.xml:
##
@@ -941,27 +941,27 @@
   weekly
 
 
-  https://spark.apache.org/graphx/
+  https://spark.apache.org/news/

Review Comment:
   +1 @srowen. The changes to this file don't seem necessary.






[spark] branch master updated: [SPARK-39576][INFRA] Support GitHub Actions generate benchmark results using Scala 2.13

2022-06-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4bc6e19dde0 [SPARK-39576][INFRA] Support GitHub Actions generate benchmark results using Scala 2.13
4bc6e19dde0 is described below

commit 4bc6e19dde0eae5d100b7bfdfcf22e719fd59cb5
Author: yangjie01 
AuthorDate: Fri Jun 24 09:09:34 2022 -0700

[SPARK-39576][INFRA] Support GitHub Actions generate benchmark results using Scala 2.13

### What changes were proposed in this pull request?
This PR lets the `benchmark` GitHub Actions workflow accept a Scala version input, so it can produce benchmark results using Scala 2.13.

### Why are the changes needed?
This helps us check the microbenchmark results with Scala 2.13 and ensure they are not slower than with Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
-

Closes #36975 from LuciferYang/213-bench.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/benchmark.yml | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 91e168210fb..a322fe065b5 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -30,6 +30,10 @@ on:
         description: 'JDK version: 8, 11 or 17'
         required: true
         default: '8'
+      scala:
+        description: 'Scala version: 2.12 or 2.13'
+        required: true
+        default: '2.12'
       failfast:
         description: 'Failfast: true or false'
         required: true
@@ -53,7 +57,7 @@ jobs:
       run: echo "::set-output name=matrix::["`seq -s, 1 $SPARK_BENCHMARK_NUM_SPLITS`"]"
 
   benchmark:
-    name: "Run benchmarks: ${{ github.event.inputs.class }} (JDK ${{ github.event.inputs.jdk }}, ${{ matrix.split }} out of ${{ github.event.inputs.num-splits }} splits)"
+    name: "Run benchmarks: ${{ github.event.inputs.class }} (JDK ${{ github.event.inputs.jdk }}, Scala ${{ github.event.inputs.scala }}, ${{ matrix.split }} out of ${{ github.event.inputs.num-splits }} splits)"
     needs: matrix-gen
     # Ubuntu 20.04 is the latest LTS. The next LTS is 22.04.
     runs-on: ubuntu-20.04
@@ -99,7 +103,8 @@ jobs:
         java-version: ${{ github.event.inputs.jdk }}
     - name: Run benchmarks
       run: |
-        ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl test:package
+        dev/change-scala-version.sh ${{ github.event.inputs.scala }}
+        ./build/sbt -Pscala-${{ github.event.inputs.scala }} -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl test:package
         # Make less noisy
         cp conf/log4j2.properties.template conf/log4j2.properties
         sed -i 's/rootLogger.level = info/rootLogger.level = warn/g' conf/log4j2.properties
@@ -109,13 +114,15 @@ jobs:
           --jars "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
           "`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
           "${{ github.event.inputs.class }}"
+        # Revert to default Scala version to clean up unnecessary git diff
+        dev/change-scala-version.sh 2.12
         # To keep the directory structure and file permissions, tar them
         # See also https://github.com/actions/upload-artifact#maintaining-file-permissions-and-case-sensitive-files
         echo "Preparing the benchmark results:"
-        tar -cvf benchmark-results-${{ github.event.inputs.jdk }}.tar `git diff --name-only` `git ls-files --others --exclude-standard`
+        tar -cvf benchmark-results-${{ github.event.inputs.jdk }}-${{ github.event.inputs.scala }}.tar `git diff --name-only` `git ls-files --others --exclude-standard`
     - name: Upload benchmark results
       uses: actions/upload-artifact@v2
       with:
-        name: benchmark-results-${{ github.event.inputs.jdk }}-${{ matrix.split }}
-        path: benchmark-results-${{ github.event.inputs.jdk }}.tar
+        name: benchmark-results-${{ github.event.inputs.jdk }}-${{ github.event.inputs.scala }}-${{ matrix.split }}
+        path: benchmark-results-${{ github.event.inputs.jdk }}-${{ github.event.inputs.scala }}.tar
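
With the scala input in place, a benchmark run against Scala 2.13 can be dispatched from the GitHub CLI; a hedged sketch (input values are illustrative, and the workflow is normally dispatched from a contributor's fork):

```bash
# Trigger the benchmark workflow with the new scala input via the GitHub CLI.
# The input names (class, jdk, scala, failfast, num-splits) are the ones
# referenced in the workflow_dispatch section touched by the diff above.
gh workflow run benchmark.yml \
  -f class='org.apache.spark.sql.execution.benchmark.*' \
  -f jdk=8 \
  -f scala=2.13 \
  -f failfast=true \
  -f num-splits=1
```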
 





[spark] branch master updated (1bb272de332 -> 299cdfad881)

2022-06-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1bb272de332 [SPARK-39453][SQL] DS V2 supports push down misc non-aggregate functions(non ANSI)
 add 299cdfad881 [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatible

No new revisions were added by this update.

Summary of changes:
 project/MimaExcludes.scala |  7 +-
 .../org/apache/spark/sql/catalog/Catalog.scala | 21 ++
 .../org/apache/spark/sql/catalog/interface.scala   | 19 +
 .../apache/spark/sql/internal/CatalogImpl.scala| 54 --
 .../apache/spark/sql/internal/CatalogSuite.scala   | 86 --
 5 files changed, 141 insertions(+), 46 deletions(-)





[spark] branch master updated (4ad7386eefe -> 1bb272de332)

2022-06-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4ad7386eefe [SPARK-38978][SQL] DS V2 supports push down OFFSET operator
 add 1bb272de332 [SPARK-39453][SQL] DS V2 supports push down misc non-aggregate functions(non ANSI)

No new revisions were added by this update.

Summary of changes:
 .../expressions/GeneralScalarExpression.java   | 18 ++
 .../sql/connector/util/V2ExpressionSQLBuilder.java |  3 +++
 .../sql/catalyst/util/V2ExpressionBuilder.scala| 28 ++
 .../org/apache/spark/sql/jdbc/H2Dialect.scala  |  4 ++--
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 20 
 5 files changed, 71 insertions(+), 2 deletions(-)





[spark] branch master updated: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-24 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4ad7386eefe [SPARK-38978][SQL] DS V2 supports push down OFFSET operator
4ad7386eefe is described below

commit 4ad7386eefe0856e500d1a11e2bb992a045ff217
Author: Jiaan Geng 
AuthorDate: Fri Jun 24 17:33:07 2022 +0800

[SPARK-38978][SQL] DS V2 supports push down OFFSET operator

### What changes were proposed in this pull request?
Currently, DS V2 push-down supports `LIMIT` but not `OFFSET`.
If we can push `OFFSET` down to the JDBC data source, performance will be better.

### Why are the changes needed?
Pushing down `OFFSET` could improve performance.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests.

Closes #36295 from beliefer/SPARK-38978.

Authored-by: Jiaan Geng 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/connector/read/ScanBuilder.java  |   3 +-
 .../sql/connector/read/SupportsPushDownLimit.java  |   4 +-
 ...canBuilder.java => SupportsPushDownOffset.java} |  17 +-
 .../sql/connector/read/SupportsPushDownTopN.java   |  23 +-
 .../main/scala/org/apache/spark/sql/Dataset.scala  |   2 +-
 .../spark/sql/execution/DataSourceScanExec.scala   |   9 +-
 .../execution/datasources/DataSourceStrategy.scala |   6 +-
 .../execution/datasources/jdbc/JDBCOptions.scala   |   5 +
 .../sql/execution/datasources/jdbc/JDBCRDD.scala   |  12 +-
 .../execution/datasources/jdbc/JDBCRelation.scala  |   6 +-
 .../execution/datasources/v2/PushDownUtils.scala   |  15 +-
 .../datasources/v2/PushedDownOperators.scala   |   1 +
 .../datasources/v2/V2ScanRelationPushDown.scala|  75 -
 .../execution/datasources/v2/jdbc/JDBCScan.scala   |   5 +-
 .../datasources/v2/jdbc/JDBCScanBuilder.scala  |  20 +-
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |   7 +
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 352 -
 17 files changed, 514 insertions(+), 48 deletions(-)
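
As a rough illustration of the user-facing effect (the catalog name, JDBC URL and table below are made up for this sketch, and the H2 driver jar is assumed to be on the classpath), a DS V2 JDBC catalog can be wired up so that the OFFSET in a paginated query becomes a candidate for push-down alongside LIMIT:

```bash
# Register an H2 database as a DS V2 catalog named "h2" and run a paginated
# query; with this change OFFSET, like LIMIT, can be pushed to the JDBC source.
./bin/spark-sql \
  --conf spark.sql.catalog.h2=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog \
  --conf spark.sql.catalog.h2.url='jdbc:h2:./testdb' \
  --conf spark.sql.catalog.h2.driver=org.h2.Driver \
  -e "SELECT id, name FROM h2.test.people ORDER BY id LIMIT 10 OFFSET 5"
```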

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
index 27ee534d804..f5ce604148b 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
@@ -23,7 +23,8 @@ import org.apache.spark.annotation.Evolving;
  * An interface for building the {@link Scan}. Implementations can mixin SupportsPushDownXYZ
  * interfaces to do operator push down, and keep the operator push down result in the returned
  * {@link Scan}. When pushing down operators, the push down order is:
- * sample - filter - aggregate - limit - column pruning.
+ * sample - filter - aggregate - limit/top-n(sort + limit) - offset -
+ * column pruning.
  *
  * @since 3.0.0
  */
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
index 035154d0845..8a725cd7ed7 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.java
@@ -21,8 +21,8 @@ import org.apache.spark.annotation.Evolving;
 
 /**
  * A mix-in interface for {@link ScanBuilder}. Data sources can implement this interface to
- * push down LIMIT. Please note that the combination of LIMIT with other operations
- * such as AGGREGATE, GROUP BY, SORT BY, CLUSTER BY, DISTRIBUTE BY, etc. is NOT pushed down.
+ * push down LIMIT. We can push down LIMIT with many other operations if they follow the
+ * operator order we defined in {@link ScanBuilder}'s class doc.
  *
  * @since 3.3.0
  */
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownOffset.java
similarity index 68%
copy from sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
copy to sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownOffset.java
index 27ee534d804..ffa2cad3715 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/ScanBuilder.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownOffset.java
@@ -20,14 +20,17 @@ package org.apache.spark.sql.connector.read;
 import org.apache.spark.annotation.Evolving;
 
 /**
- * An interface for building the {@link Scan}. Implementations can mixin SupportsPushDownXYZ
- * interfaces to do operator push down, and keep the operator push down result in the returned