[GitHub] [spark-website] dongjoon-hyun commented on pull request #361: [BLOCKED BY PYPI FOR NOW] Add 3.2.0 release note and news and update links

2021-10-16 Thread GitBox


dongjoon-hyun commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-945038523


   Do you think that is a blocker for Apache Software Foundation software, @gatorsmile?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] dongjoon-hyun edited a comment on pull request #361: [BLOCKED BY PYPI FOR NOW] Add 3.2.0 release note and news and update links

2021-10-16 Thread GitBox


dongjoon-hyun edited a comment on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-945038523


   Do you think that is a blocker for an Apache Software Foundation release, @gatorsmile?





[GitHub] [spark-website] gatorsmile commented on a change in pull request #361: [BLOCKED BY PYPI FOR NOW] Add 3.2.0 release note and news and update links

2021-10-16 Thread GitBox


gatorsmile commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730217461



##
File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
##
@@ -0,0 +1,318 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contributions from the open-source community, this release resolved more than 1,700 Jira tickets.
+
+In this release, Spark supports the pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
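As an aside for readers of this thread, the "one-line code change" above can be sketched as follows. This is illustrative only, not part of the release notes: the `pyspark.pandas` variant assumes PySpark 3.2+ and a Spark runtime, so the runnable snippet below uses plain pandas and shows the Spark variant in comments.

```python
# The intended one-line migration (sketch):
#   before: import pandas as pd
#   after:  import pyspark.pandas as pd   # same DataFrame code, executed on Spark
import pandas as pd  # plain pandas here so the example runs without a cluster

df = pd.DataFrame({"user": ["a", "a", "b"], "clicks": [1, 2, 3]})
totals = df.groupby("user")["clicks"].sum()
print(totals.to_dict())  # {'a': 3, 'b': 3}
```

The point of the API layer is that the `groupby`/`sum` code is unchanged; only the import moves.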
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
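One highlight above, event-time sessionization (SPARK-10816), groups events into windows that close after a gap of inactivity. A minimal sketch of the gap-based grouping logic, as an illustration only (this is not Spark's implementation, which operates on streaming state):

```python
def sessionize(event_times, gap):
    """Group event times into sessions: a new session starts whenever the
    gap since the previous event exceeds `gap` (gap-based session windows)."""
    sessions = []
    current = []
    for t in sorted(event_times):
        if current and t - current[-1] > gap:
            sessions.append(current)  # gap exceeded: close the session
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

print(sessionize([1, 2, 3, 10, 11], gap=5))  # [[1, 2, 3], [10, 11]]
```

In Spark the equivalent is expressed declaratively, e.g. grouping by a session window over the event-time column.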
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* ANSI mode: IntegralDivide throws exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
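To illustrate the IntegralDivide overflow case in the list above (a sketch of the semantics in Python, not Spark code): in 64-bit two's-complement arithmetic, `Long.MinValue / -1` has no representable result, and ANSI mode raises an error instead of silently wrapping.

```python
INT64_MIN = -(2 ** 63)

def ansi_integral_divide(a: int, b: int) -> int:
    """Truncating 64-bit integer division with ANSI-style overflow checking."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    if a == INT64_MIN and b == -1:
        # -Long.MinValue == 2**63, which does not fit in a signed 64-bit long
        raise ArithmeticError("long overflow")
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

print(ansi_integral_divide(7, -2))  # -3 (truncates toward zero)
```

Without ANSI mode, the overflowing case would wrap around and return `Long.MinValue` itself.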
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through window when partitionSpec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning ([SPARK-34119](https://issues.apache.org/jira/browse/SPARK-34119))
+  * Decouple bucket filter pruning and

[spark] branch master updated (ee2647e -> f9cc7fb)

2021-10-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ee2647e  [SPARK-37008][SQL][TEST] Replace `UseCompressedOops` with `UseCompressedClassPointers` to pass `WholeStageCodegenSparkSubmitSuite` with Java 17
 add f9cc7fb  [SPARK-36992][SQL] Improve byte array sort perf by unify getPrefix function of UTF8String and ByteArray
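The getPrefix idea referenced in that commit, sketched in Python as an illustration of the technique (not the Spark code): pack the first 8 bytes of a byte array into one big-endian integer so that a cheap integer comparison agrees with lexicographic byte order on the prefix, avoiding byte-by-byte comparison in the common case.

```python
def get_prefix(b: bytes) -> int:
    """8-byte big-endian sort prefix: integer order matches lexicographic
    byte order on the first 8 bytes (shorter inputs are zero-padded)."""
    return int.from_bytes(b[:8].ljust(8, b"\x00"), "big")

# When prefixes differ, comparing prefixes gives the same answer as
# comparing the full byte strings; equal prefixes require a full comparison.
pairs = [(b"apple", b"banana"), (b"a", b"ab"), (b"zz", b"za")]
for x, y in pairs:
    assert (get_prefix(x) < get_prefix(y)) == (x < y)
print("prefix order agrees with byte order")
```

Unifying one such function for both `UTF8String` and `ByteArray` keeps the two sort paths consistent.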

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/unsafe/types/ByteArray.java   | 36 
 .../org/apache/spark/unsafe/types/UTF8String.java  | 38 +
 .../{LongArraySuite.java => ByteArraySuite.java}   | 39 +-
 3 files changed, 54 insertions(+), 59 deletions(-)
 copy common/unsafe/src/test/java/org/apache/spark/unsafe/array/{LongArraySuite.java => ByteArraySuite.java} (53%)




[spark] branch master updated (cf43623 -> ee2647e)

2021-10-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cf43623  [SPARK-36900][SPARK-36464][CORE][TEST] Refactor `SPARK-36464: size returns correct positive number even with over 2GB data` to pass with Java 8, 11 and 17
 add ee2647e  [SPARK-37008][SQL][TEST] Replace `UseCompressedOops` with `UseCompressedClassPointers` to pass `WholeStageCodegenSparkSubmitSuite` with Java 17

No new revisions were added by this update.

Summary of changes:
 .../execution/WholeStageCodegenSparkSubmitSuite.scala   | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)




[spark] branch master updated: [SPARK-36900][SPARK-36464][CORE][TEST] Refactor `SPARK-36464: size returns correct positive number even with over 2GB data` to pass with Java 8, 11 and 17

2021-10-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cf43623  [SPARK-36900][SPARK-36464][CORE][TEST] Refactor `SPARK-36464: size returns correct positive number even with over 2GB data` to pass with Java 8, 11 and 17
cf43623 is described below

commit cf436233072b75e083a4455dc53b22edba0b3957
Author: yangjie01 
AuthorDate: Sat Oct 16 09:10:06 2021 -0500

[SPARK-36900][SPARK-36464][CORE][TEST] Refactor `SPARK-36464: size returns correct positive number even with over 2GB data` to pass with Java 8, 11 and 17

### What changes were proposed in this pull request?
Refactor `SPARK-36464: size returns correct positive number even with over 2GB data` in `ChunkedByteBufferOutputStreamSuite` to reduce the total memory used by the test case, so that it can pass with Java 8, Java 11, and Java 17 using `-Xmx4g`.
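The arithmetic behind the refactor, as a quick check: writing a 4 MB buffer 513 times still crosses the 2 GB boundary that the test exercises, while the live array shrinks from 1 GB to 4 MB.

```python
data_4m = 1024 * 1024 * 4       # 4 MB buffer, as in the refactored test
write_times = 513
total = data_4m * write_times   # total bytes written through the stream
print(total)                    # 2151677952
print(total > 2**31 - 1)        # True: exceeds the signed 32-bit range (> 2 GB)
```

So the overflow-prone path (`size` exceeding `Int.MaxValue`) is still covered, with far less peak memory.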

### Why are the changes needed?
`SPARK-36464: size returns correct positive number even with over 2GB data` passes with Java 8 but OOMs with Java 11 and Java 17.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- Pass the Jenkins or GitHub Action
- Manual test
```
mvn clean install -pl core -am -Dtest=none -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
```
with Java 8, Java 11 and Java 17, all tests passed.

Closes #34284 from LuciferYang/SPARK-36900.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
---
 .../spark/util/io/ChunkedByteBufferOutputStreamSuite.scala| 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala b/core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala
index 29443e2..0a61488 100644
--- a/core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala
+++ b/core/src/test/scala/org/apache/spark/util/io/ChunkedByteBufferOutputStreamSuite.scala
@@ -121,12 +121,13 @@ class ChunkedByteBufferOutputStreamSuite extends SparkFunSuite {
   }
 
   test("SPARK-36464: size returns correct positive number even with over 2GB data") {
-    val ref = new Array[Byte](1024 * 1024 * 1024)
-    val o = new ChunkedByteBufferOutputStream(1024 * 1024, ByteBuffer.allocate)
-    o.write(ref)
-    o.write(ref)
+    val data4M = 1024 * 1024 * 4
+    val writeTimes = 513
+    val ref = new Array[Byte](data4M)
+    val o = new ChunkedByteBufferOutputStream(data4M, ByteBuffer.allocate)
+    (0 until writeTimes).foreach(_ => o.write(ref))
     o.close()
     assert(o.size > 0L) // make sure it is not overflowing
-    assert(o.size == ref.length.toLong * 2)
+    assert(o.size == ref.length.toLong * writeTimes)
   }
 }




[spark] branch master updated: [SPARK-36915][INFRA] Pin actions to a full length commit SHA

2021-10-16 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 00b87c9  [SPARK-36915][INFRA] Pin actions to a full length commit SHA
00b87c9 is described below

commit 00b87c967ff8217b64e597400f3248c375a74879
Author: Hyukjin Kwon 
AuthorDate: Sat Oct 16 08:53:19 2021 -0500

[SPARK-36915][INFRA] Pin actions to a full length commit SHA

### What changes were proposed in this pull request?
Pinning GitHub Actions to a full-length commit SHA.

### Why are the changes needed?
Pinning an action to a full-length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.


https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions


https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies
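For reference, the before/after shape of a pinned step looks like this (a generic sketch: the SHA below is an illustrative placeholder, not one from this commit):

```yaml
steps:
  # Mutable tag: the action's code can change underneath you
  - uses: actions/checkout@v2
  # Pinned: immutable full-length commit SHA, with the tag kept as a comment
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567 # pin@v2
```

Keeping the original tag in a trailing comment preserves readability and lets tooling verify that the SHA still corresponds to that tag.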

### Does this PR introduce _any_ user-facing change?
Running github action and checking the SHA with the existing repository

### How was this patch tested?
Running the GitHub action

Closes #34163 from naveensrinivasan/naveen/feat/pin-github-actions.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: naveen <172697+naveensriniva...@users.noreply.github.com>
Signed-off-by: Sean Owen 
---
 .github/workflows/cancel_duplicate_workflow_runs.yml | 2 +-
 .github/workflows/labeler.yml| 2 +-
 .github/workflows/notify_test_workflow.yml   | 2 +-
 .github/workflows/publish_snapshot.yml   | 6 +++---
 .github/workflows/stale.yml  | 2 +-
 .github/workflows/test_report.yml| 4 ++--
 .github/workflows/update_build_status.yml| 2 +-
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/cancel_duplicate_workflow_runs.yml b/.github/workflows/cancel_duplicate_workflow_runs.yml
index 1077371..525c7e7 100644
--- a/.github/workflows/cancel_duplicate_workflow_runs.yml
+++ b/.github/workflows/cancel_duplicate_workflow_runs.yml
@@ -29,7 +29,7 @@ jobs:
     name: "Cancel duplicate workflow runs"
     runs-on: ubuntu-latest
     steps:
-      - uses: potiuk/cancel-workflow-runs@953e057dc81d3458935a18d1184c386b0f6b5738 # @master
+      - uses: potiuk/cancel-workflow-runs@4723494a065d162f8e9efd071b98e0126e00f866 # @master
         name: "Cancel duplicate workflow runs"
         with:
           cancelMode: allDuplicates
diff --git a/.github/workflows/labeler.yml b/.github/workflows/labeler.yml
index 98855f4..88d17bf 100644
--- a/.github/workflows/labeler.yml
+++ b/.github/workflows/labeler.yml
@@ -44,7 +44,7 @@ jobs:
     #
     # However, these are not in a published release and the current `main` branch
     # has some issues upon testing.
-    - uses: actions/labeler@2.2.0
+    - uses: actions/labeler@5f867a63be70efff62b767459b009290364495eb # pin@2.2.0
       with:
         repo-token: "${{ secrets.GITHUB_TOKEN }}"
         sync-labels: true
diff --git a/.github/workflows/notify_test_workflow.yml b/.github/workflows/notify_test_workflow.yml
index cc2b7a2..08c50cc 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -33,7 +33,7 @@ jobs:
     runs-on: ubuntu-20.04
     steps:
       - name: "Notify test workflow"
-        uses: actions/github-script@v3
+        uses: actions/github-script@f05a81df23035049204b043b50c3322045ce7eb3 # pin@v3
         if: ${{ github.base_ref == 'master' }}
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
diff --git a/.github/workflows/publish_snapshot.yml b/.github/workflows/publish_snapshot.yml
index 46f4f7a..bd75e26 100644
--- a/.github/workflows/publish_snapshot.yml
+++ b/.github/workflows/publish_snapshot.yml
@@ -36,18 +36,18 @@ jobs:
           - branch-3.1
     steps:
     - name: Checkout Spark repository
-      uses: actions/checkout@master
+      uses: actions/checkout@61b9e3751b92087fd0b06925ba6dd6314e06f089 # pin@master
       with:
         ref: ${{ matrix.branch }}
     - name: Cache Maven local repository
-      uses: actions/cache@v2
+      uses: actions/cache@c64c572235d810460d0d6876e9c705ad5002b353 # pin@v2
       with:
         path: ~/.m2/repository
         key: snapshot-maven-${{ hashFiles('**/pom.xml') }}
         restore-keys: |
           snapshot-maven-
     - name: Install Java 8
-      uses: actions/setup-java@v1
+      uses: actions/setup-java@d202f5dbf7256730fb690ec59f6381650114feb2 # pin@v1
       with:
         java-version: 8
     - name: Publish snapshot
diff --git a/.github/workflows/stale.yml b/.github/workflows/stale.yml
index 

[spark] branch master updated (ef3eb90 -> 67b547a)

2021-10-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ef3eb90  [SPARK-36276][BUILD][FOLLOWUP] Match the version of SBT's checkstyle plugin to Maven's one
 add 67b547a  [SPARK-36230][SPARK-36232][PYTHON] Add regression for hasnan of Decimal(NaN)

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/tests/test_series.py | 6 ++
 1 file changed, 6 insertions(+)




[spark] branch master updated (722ac1b -> ef3eb90)

2021-10-16 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 722ac1b  [SPARK-36910][PYTHON] Inline type hints for python/pyspark/sql/types.py
 add ef3eb90  [SPARK-36276][BUILD][FOLLOWUP] Match the version of SBT's checkstyle plugin to Maven's one

No new revisions were added by this update.

Summary of changes:
 pom.xml | 4 
 project/plugins.sbt | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)
