[GitHub] [spark-website] gengliangwang opened a new pull request #362: Update DocSearch facet filter of 3.2.0 documentation
gengliangwang opened a new pull request #362:
URL: https://github.com/apache/spark-website/pull/362

There is a bug in how the DocSearch facet filter is updated by https://github.com/apache/spark/blob/master/dev/create-release/release-tag.sh: the version is not updated to match the release version. This PR fixes the search function of https://spark.apache.org/docs/3.2.0/. I will open another PR to fix the script in Spark itself.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[spark] branch master updated: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4072a22  [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service
4072a22 is described below

commit 4072a22aa2bf15e95d3043f937a3468057f4fd36
Author: Thejdeep Gudivada
AuthorDate: Mon Oct 18 21:40:55 2021 -0500

    [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

    ### What changes were proposed in this pull request?
    Added a config `spark.yarn.shuffle.service.logs.namespace` which can be used to add a namespace suffix to log lines emitted by the External Shuffle Service.

    ### Why are the changes needed?
    Since many instances of ESS can be running on the same NM, it would be easier to distinguish between them.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    N/A

    Closes #34079 from thejdeep/SPARK-36834.

    Authored-by: Thejdeep Gudivada
    Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../apache/spark/network/yarn/YarnShuffleService.java | 18 ++++++++++++++++--
 docs/running-on-yarn.md                               | 11 +++++++++++
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
index ac16369..f1b8941 100644
--- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
+++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
@@ -93,7 +93,8 @@ import org.apache.spark.network.yarn.util.HadoopConfigProvider;
  * This {@code classpath} configuration is only supported on YARN versions >= 2.9.0.
  */
 public class YarnShuffleService extends AuxiliaryService {
-  private static final Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static final Logger defaultLogger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private Logger logger = defaultLogger;

   // Port on which the shuffle server listens for fetch requests
   private static final String SPARK_SHUFFLE_SERVICE_PORT_KEY = "spark.shuffle.service.port";
@@ -107,6 +108,12 @@ public class YarnShuffleService extends AuxiliaryService {
     "spark.yarn.shuffle.service.metrics.namespace";
   private static final String DEFAULT_SPARK_SHUFFLE_SERVICE_METRICS_NAME = "sparkShuffleService";

+  /**
+   * The namespace to use for the logs produced by the shuffle service
+   */
+  static final String SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY =
+    "spark.yarn.shuffle.service.logs.namespace";
+
   // Whether the shuffle server should authenticate fetch requests
   private static final String SPARK_AUTHENTICATE_KEY = "spark.authenticate";
   private static final boolean DEFAULT_SPARK_AUTHENTICATE = false;
@@ -204,6 +211,13 @@ public class YarnShuffleService extends AuxiliaryService {
         confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    if (!logsNamespace.isEmpty()) {
+      String className = YarnShuffleService.class.getName();
+      logger = LoggerFactory.getLogger(className + "." + logsNamespace);
+    }
+
     super.serviceInit(_conf);

     boolean stopOnFailure = _conf.getBoolean(STOP_ON_FAILURE_KEY, DEFAULT_STOP_ON_FAILURE);
@@ -284,7 +298,7 @@ public class YarnShuffleService extends AuxiliaryService {
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      defaultLogger.error("Unable to create an instance of {}", mergeManagerImplClassName);
       return new NoOpMergedShuffleFileManager(conf);
     }
   }
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 8b7ed18..52d365a 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -806,6 +806,17 @@ The following extra configuration options are available when the shuffle service
   NodeManager.
+
+  spark.yarn.shuffle.service.logs.namespace
+  (not set)
+
+  A namespace which will be appended to the class name when forming the logger name to use for
+  emitting logs from the YARN shuffle service, like
+  org.apache.spark.network.yarn.YarnShuffleService.logsNamespaceValue. Since some logging frameworks
+  may expect the logger name to look like a class name, it's generally recommended to provide a value which
+  would be a valid Java package or class name and not include spaces.
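The effect of the new config is easiest to see in the logger name it produces. A minimal sketch of the naming rule from the patch (the demo class and method are illustrative stand-ins, not Spark code):

```java
public class LogsNamespaceDemo {
    // Mirrors the naming logic added in serviceInit(): when
    // spark.yarn.shuffle.service.logs.namespace is set to a non-empty value,
    // it is appended to the class name to form the logger name; otherwise the
    // plain class name is kept.
    static String loggerName(String className, String namespace) {
        return (namespace == null || namespace.isEmpty())
            ? className
            : className + "." + namespace;
    }

    public static void main(String[] args) {
        String cls = "org.apache.spark.network.yarn.YarnShuffleService";
        // No namespace configured: logs keep the default logger name.
        System.out.println(loggerName(cls, ""));
        // With e.g. "ess1" configured, each ESS instance on the same
        // NodeManager can be told apart by its logger name.
        System.out.println(loggerName(cls, "ess1"));
    }
}
```

A log4j/logback pattern that prints the logger name (e.g. `%c`) then emits `...YarnShuffleService.ess1`, which is why the docs recommend values that look like a valid Java package or class name.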
[spark] branch master updated (1ef6c13 -> c2ba498)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 1ef6c13  [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()
 add c2ba498  [SPARK-36945][PYTHON] Inline type hints for python/pyspark/sql/udf.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py |   2 +-
 python/pyspark/sql/udf.py       | 129
 python/pyspark/sql/udf.pyi      |  58 --
 3 files changed, 91 insertions(+), 98 deletions(-)
 delete mode 100644 python/pyspark/sql/udf.pyi
[spark] branch master updated: [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()
This is an automated email from the ASF dual-hosted git repository.

joshrosen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1ef6c13  [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()
1ef6c13 is described below

commit 1ef6c13e37bfb64b0f9dd9b624b436064ea86593
Author: Tim Armstrong
AuthorDate: Mon Oct 18 14:51:24 2021 -0700

    [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()

    ### What changes were proposed in this pull request?
    * Factor out a method `trySpillAndAcquire()` from `acquireExecutionMemory()` that handles the details of how to spill a `MemoryConsumer` and acquire the spilled memory. This logic was duplicated twice.
    * Combine the two loops (spill other consumers and self-spill) into a single loop that implements equivalent logic. I made self-spill the lowest priority consumer and this is exactly equivalent.
    * Consolidate comments a little to explain what the policy is trying to achieve and how at a high level
    * Add a couple more debug log messages to make it easier to follow

    ### Why are the changes needed?
    Reduce code duplication and better separate the policy decision of which MemoryConsumer to spill from the mechanism of requesting it to spill.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added some unit tests to verify the details of the spilling decisions in some scenarios that are not covered by current unit tests. Ran these on Spark master without the TaskMemoryManager changes to confirm that the behaviour is the same before and after my refactoring. The SPARK-35486 test also provides some coverage for the retry loop.

    Closes #34186 from timarmstrong/cleanup-task-memory-manager.

    Authored-by: Tim Armstrong
    Signed-off-by: Josh Rosen
---
 .../org/apache/spark/memory/TaskMemoryManager.java | 149 +++++++++--------
 .../spark/memory/TaskMemoryManagerSuite.java       |  79 ++++++++
 2 files changed, 158 insertions(+), 70 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
index 7a1e8c4..e2e44a5 100644
--- a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
+++ b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
@@ -135,10 +135,10 @@ public class TaskMemoryManager {
    *
    * @return number of bytes successfully granted (<= N).
    */
-  public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
+  public long acquireExecutionMemory(long required, MemoryConsumer requestingConsumer) {
     assert(required >= 0);
-    assert(consumer != null);
-    MemoryMode mode = consumer.getMode();
+    assert(requestingConsumer != null);
+    MemoryMode mode = requestingConsumer.getMode();
     // If we are allocating Tungsten pages off-heap and receive a request to allocate on-heap
     // memory here, then it may not make sense to spill since that would only end up freeing
     // off-heap memory. This is subject to change, though, so it may be risky to make this
@@ -149,96 +149,105 @@ public class TaskMemoryManager {
     // Try to release memory from other consumers first, then we can reduce the frequency of
     // spilling, avoid to have too many spilled files.
     if (got < required) {
-      // Call spill() on other consumers to release memory
-      // Sort the consumers according their memory usage. So we avoid spilling the same consumer
-      // which is just spilled in last few times and re-spilling on it will produce many small
-      // spill files.
+      logger.debug("Task {} need to spill {} for {}", taskAttemptId,
+        Utils.bytesToString(required - got), requestingConsumer);
+      // We need to call spill() on consumers to free up more memory. We want to optimize for two
+      // things:
+      // * Minimize the number of spill calls, to reduce the number of spill files and avoid small
+      //   spill files.
+      // * Avoid spilling more data than necessary - if we only need a little more memory, we may
+      //   not want to spill as much data as possible. Many consumers spill more than the
+      //   requested amount, so we can take that into account in our decisions.
+      // We use a heuristic that selects the smallest memory consumer with at least `required`
+      // bytes of memory in an attempt to balance these factors. It may work well if there are
+      // fewer larger requests, but can result in many small spills if there are many smaller
+      // requests.
+
+      // Build a map of consumer in order of memory usage to prioritize spilling. Assign current
+      // consumer (if present) a nominal memory usage of 0 so that
[spark] branch master updated (21fa3ce -> 25fc495)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 21fa3ce  [SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function
 add 25fc495  [SPARK-36886][PYTHON] Inline type hints for python/pyspark/sql/context.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/context.py     | 221 +-
 python/pyspark/sql/context.pyi    | 140
 python/pyspark/sql/dataframe.py   |   2 +-
 python/pyspark/sql/observation.py |   3 +-
 4 files changed, 176 insertions(+), 190 deletions(-)
 delete mode 100644 python/pyspark/sql/context.pyi
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731104138

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##

@@ -0,0 +1,319 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+_edit_last: '4'
+_wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning
[GitHub] [spark-website] sunchao commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sunchao commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731092808

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731092815

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##

+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))

Review comment: Make it the second highlight?
[GitHub] [spark-website] sunchao commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sunchao commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731085852

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] sunchao commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sunchao commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r731084843 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- + +Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. + +In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA. + +To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420=12349407). We have curated a list of high level changes here, grouped by major modules. + +* This will become a table of contents (this text will be scraped). 
+{:toc} + +### Highlights + +* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849)) +* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)) +* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198)) +* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816)) +* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030)) +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) +* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) +* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218)) + + +### Core and Spark SQL + +**ANSI SQL Compatibility Enhancements** + +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246)) +* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382)) +* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152)) +* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955)) +* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199)) + +**Performance** + +* Query compilation latency + * Support traversal pruning in transform/resolve functions 
and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042)) + * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) + * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103)) +* Query optimization + * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122)) + * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622)) + * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514)) + * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575)) + * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922)) + * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411)) + * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081)) + * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316)) + * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448)) + * Keep necessary stats after partition pruning
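The release note above says pandas users can scale out with a one-line code change. As an illustrative sketch (not taken from the release note itself), that change is swapping the pandas import for `pyspark.pandas`; the rest of the code stays pandas-style. The runnable part below uses plain pandas, with the hypothetical Spark variant shown in the comment:

```python
import pandas as pd
# With Spark >= 3.2 installed, the advertised one-line change would be:
#   import pyspark.pandas as pd
# after which the same pandas-style code runs distributed on Spark.

# Build a small frame and aggregate it exactly as in plain pandas.
df = pd.DataFrame({"user": ["a", "a", "b"], "clicks": [1, 2, 5]})
totals = df.groupby("user")["clicks"].sum().to_dict()
print(totals)  # {'a': 3, 'b': 5}
```

Because the pandas-on-Spark API mirrors pandas, code like the aggregation above is meant to work unchanged under either import.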
[GitHub] [spark-website] gatorsmile commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gatorsmile commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r731078539 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,319 @@ [quoted release-note text identical to the excerpt in the first comment above]
[GitHub] [spark-website] gatorsmile commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gatorsmile commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r731070322 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,319 @@ [quoted release-note text identical to the excerpt in the first comment above] +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) Review comment: The impact of this feature is more important than the other SQL and Core features, IMO. Can you adjust the order? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730932576 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,317 @@ [quoted release-note text identical to the excerpt in the first comment above]
[spark] branch master updated (c29bb02 -> 21fa3ce)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c29bb02 [SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files add 21fa3ce [SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/mathExpressions.scala | 12 +--- .../catalyst/expressions/MathExpressionsSuite.scala | 20 .../src/test/resources/sql-tests/inputs/interval.sql | 2 ++ .../sql-tests/results/ansi/interval.sql.out | 18 +- .../resources/sql-tests/results/interval.sql.out | 18 +- 5 files changed, 65 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
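The SPARK-35925 commit above extends the `width_bucket` SQL function to `DayTimeIntervalType` arguments. As a rough stdlib-only sketch of the standard SQL `width_bucket(value, min, max, num_buckets)` semantics — not Spark's actual implementation, and with interval arguments simply modeled as seconds for illustration:

```python
import math

def width_bucket(value: float, lo: float, hi: float, num_buckets: int) -> int:
    """Return the 1-based equi-width histogram bucket for value over [lo, hi).

    Values below lo fall in bucket 0; values at or above hi fall in bucket
    num_buckets + 1 (the standard SQL width_bucket edge behaviour).
    Assumes lo < hi and num_buckets > 0.
    """
    if value < lo:
        return 0
    if value >= hi:
        return num_buckets + 1
    return int(math.floor((value - lo) / (hi - lo) * num_buckets)) + 1

# Classic numeric example from SQL documentation:
print(width_bucket(5.35, 0.024, 10.06, 5))  # 3

# A day-time interval modeled as seconds: INTERVAL '36' HOUR over a
# 3-day range split into 3 one-day buckets lands in bucket 2.
DAY = 24 * 3600
print(width_bucket(36 * 3600, 0, 3 * DAY, 3))  # 2
```

Supporting interval inputs amounts to applying this same bucketing arithmetic to interval-valued operands instead of plain numerics.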
[GitHub] [spark-website] h-vetinari commented on pull request #361: Add 3.2.0 release note and news and update links
h-vetinari commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945711644 Update 2: pyspark 3.2.0 has been uploaded to https://anaconda.org/conda-forge/pyspark/files, will make its way through the CDN in about an hour. > @h-vetinari yes please raise a PR to update https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst, thanks! Done here: https://github.com/apache/spark/pull/34315
[GitHub] [spark-website] Ngone51 commented on a change in pull request #361: Add 3.2.0 release note and news and update links
Ngone51 commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730861107 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,317 @@ [quoted release-note text identical to the excerpt in the first comment above]
[GitHub] [spark-website] Ngone51 commented on a change in pull request #361: Add 3.2.0 release note and news and update links
Ngone51 commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730858991 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,317 @@ [quoted release-note text identical to the excerpt in the first comment above]
[GitHub] [spark-website] gengliangwang commented on pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945660978 @h-vetinari yes please raise a PR to update https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst, thanks!
[GitHub] [spark-website] h-vetinari commented on pull request #361: Add 3.2.0 release note and news and update links
h-vetinari commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945636499

Little update: the conda-forge build of pyspark is now waiting to be merged by the feedstock team or conda-forge/core. However, I noticed that the [install](https://spark.apache.org/docs/3.2.0/api/python/getting_started/install.html) instructions for conda are... not ideal. In particular, mixing pip & conda is strongly discouraged, because pip can trample on the conda environment and break it. Should I raise a PR under https://github.com/apache/spark/? It would be good if this could then be backported to 3.2 (presumably that's necessary for it to appear in the 3.2.0 docs).
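Mixing pip and conda can indeed leave an environment in a state the conda solver no longer understands. A minimal conda-only sketch of the alternative (an assumption, not the official install instructions; it presumes the conda-forge `pyspark` package discussed above has been published, and the environment name is illustrative):

```shell
# Create an isolated environment and install pyspark from conda-forge,
# without involving pip at all.
conda create -n pyspark-env -c conda-forge pyspark
conda activate pyspark-env
```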
[spark] branch master updated (0bba90b -> c29bb02)
This is an automated email from the ASF dual-hosted git repository.

attilapiros pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0bba90b  [SPARK-36978][SQL] InferConstraints rule should create IsNotNull constraints on the accessed nested field instead of the root nested type
 add c29bb02  [SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files

No new revisions were added by this update.

Summary of changes:
 python/run-tests.py | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)
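The gist of SPARK-36965 above — keep a failed test's temporary output file and log its path rather than discarding it — can be sketched in plain Python. This is a toy stand-in, not the actual `python/run-tests.py` code; the function and test names are illustrative:

```python
import tempfile

def run_test(name):
    """Toy stand-in: run one test module, capturing its output in a temp file."""
    out = tempfile.NamedTemporaryFile(
        mode="w", suffix=".log", prefix=name.replace(".", "_") + "-", delete=False)
    out.write("output of " + name + "\n")
    out.close()
    passed = False  # pretend the test failed
    return passed, out.name

passed, log_path = run_test("pyspark.sql.tests")
if not passed:
    # Surface the temp file's location instead of deleting it, so the
    # output can be inspected after the run.
    print("Error running tests; output is in: " + log_path)
```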
[spark] branch master updated (e7815b1 -> 0bba90b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from e7815b1  [SPARK-37032][SQL] Fix broken SQL syntax link in SQL Reference page
 add 0bba90b  [SPARK-36978][SQL] InferConstraints rule should create IsNotNull constraints on the accessed nested field instead of the root nested type

No new revisions were added by this update.

Summary of changes:
 .../expressions/complexTypeExtractors.scala    | 11 +++--
 .../plans/logical/QueryPlanConstraints.scala   | 18 +++--
 .../InferFiltersFromConstraintsSuite.scala     | 47 +-
 3 files changed, 66 insertions(+), 10 deletions(-)
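The SPARK-36978 change above can be illustrated with a toy model in plain Python (not Spark's Catalyst code; names are illustrative): a filter on a nested field such as `s.f1 > 1` should yield an `IsNotNull` constraint on the accessed field `s.f1` itself, not merely on the root struct `s`.

```python
def infer_is_not_null(refs, on_nested_field=True):
    """Toy model of constraint inference: which IS NOT NULL constraints
    does a filter predicate imply for the attribute paths it references?"""
    if on_nested_field:
        # SPARK-36978 behavior: constrain the accessed nested field itself,
        # e.g. `s.f1 > 1` implies `s.f1 IS NOT NULL`.
        targets = set(refs)
    else:
        # Previous behavior: constrain only the root attribute,
        # e.g. `s.f1 > 1` implied just `s IS NOT NULL`.
        targets = {r.split(".")[0] for r in refs}
    return {t + " IS NOT NULL" for t in targets}

# A filter such as `WHERE s.f1 > 1` references the nested field "s.f1".
assert infer_is_not_null({"s.f1"}) == {"s.f1 IS NOT NULL"}
assert infer_is_not_null({"s.f1"}, on_nested_field=False) == {"s IS NOT NULL"}
```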
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730748515

## File path: js/downloads.js ##

@@ -22,7 +22,7 @@
 var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources];
 // 3.1.0+
 var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources];
-
+addRelease("3.2.0", new Date("10/13/2021"), packagesV11, true);

Review comment:
       Thanks, updated.
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730744351

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##

@@ -0,0 +1,318 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+_edit_last: '4'
+_wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning
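The limit-pushdown items quoted above share one idea: apply the LIMIT before more expensive operators so fewer rows are processed. A toy illustration in plain Python (not Spark code; the counter is only there to make the cost difference visible):

```python
calls = 0

def project(row):
    """Stand-in for a per-row projection expression; counts invocations."""
    global calls
    calls += 1
    return row * 2

rows = list(range(10000))

# Without pushdown: evaluate the projection on every row, then keep 5.
calls = 0
no_pushdown = [project(r) for r in rows][:5]
cost_without = calls

# With pushdown (the idea behind e.g. SPARK-34622): limit first, then project.
calls = 0
with_pushdown = [project(r) for r in rows[:5]]
cost_with = calls

assert no_pushdown == with_pushdown              # same query result
assert (cost_with, cost_without) == (5, 10000)   # far less work done
```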
[GitHub] [spark-website] yaooqinn commented on a change in pull request #361: Add 3.2.0 release note and news and update links
yaooqinn commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730726311

## File path: js/downloads.js ##

@@ -22,7 +22,7 @@
 var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources];
 // 3.1.0+
 var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources];
-
+addRelease("3.2.0", new Date("10/13/2021"), packagesV11, true);

Review comment:
       So this is not right then
       ```
       var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"};
       ```
[GitHub] [spark-website] yaooqinn commented on a change in pull request #361: Add 3.2.0 release note and news and update links
yaooqinn commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730725129

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730722931

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730721947

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md

## @@ -0,0 +1,318 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contributions from the open-source community, this release resolved more than 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support the Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add a RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* Event-time-based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow the ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of the mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI joins ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when the partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operators ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if the join can be planned as a broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison supports In/InSet predicates ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning
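The "one-line code change" mentioned in the quoted release note refers to swapping the pandas import for `pyspark.pandas`. A minimal sketch of the idea (the plain-pandas version runs as-is; the commented import shows the Spark 3.2 variant, which assumes a working PySpark installation):

```python
import pandas as pd
# On Spark 3.2+, scaling this out only requires changing the import:
# import pyspark.pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})
totals = df.groupby("group")["value"].sum()
print(totals.to_dict())  # {'a': 3, 'b': 3}
```

The rest of the code is unchanged; `pyspark.pandas` mirrors the pandas API while executing on the cluster.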
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730721796

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730721766

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] yaooqinn commented on a change in pull request #361: Add 3.2.0 release note and news and update links
yaooqinn commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730716668

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] gengliangwang commented on pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-945532927

FYI, PySpark 3.2 is now available on PyPI: https://pypi.org/project/pyspark/3.2.0/

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730688457

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730685882

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730684820

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] h-vetinari commented on pull request #361: Add 3.2.0 release note and news and update links
h-vetinari commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945483919 > @gengliangwang Have we pushed it to Conda?

Hey all, I started a pull request to package pyspark 3.2 for conda-forge (normally this would have been done sooner, but the automated bot was waiting for the PyPI upload): https://github.com/conda-forge/pyspark-feedstock/pull/31

It turns out that pyspark pins `py4j==0.10.9.2`, whereas conda-forge currently only has `py4j==0.10.9`. Building both packages should take a couple of hours, depending on how fast I can get people to merge the PRs. Once the packages have been built (and become available through the content delivery network, about an hour after CI completes), it will be possible to install pyspark through conda-forge as follows:

```
conda install -c conda-forge pyspark=3.2
```

Hope this helps.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
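The blocker described above is just an exact-pin mismatch: pyspark 3.2 requires exactly `py4j==0.10.9.2`, which no solver can satisfy while the channel only carries `0.10.9`. A minimal sketch of that check (hypothetical helper names, not conda's actual solver):

```python
# Why the conda-forge install fails until py4j is rebuilt: an exact pin
# (py4j==0.10.9.2) cannot be satisfied by any other published version.
def parse_exact_pin(requirement):
    """Split a requirement like 'py4j==0.10.9.2' into (name, version)."""
    name, _, version = requirement.partition("==")
    return name.strip(), version.strip()

def pin_satisfied(requirement, available_versions):
    """True only if the exact pinned version is published in the channel."""
    _, required = parse_exact_pin(requirement)
    return required in available_versions

# conda-forge before the rebuild: only 0.10.9 is published.
print(pin_satisfied("py4j==0.10.9.2", ["0.10.8.1", "0.10.9"]))  # False
# after the new py4j build lands:
print(pin_satisfied("py4j==0.10.9.2", ["0.10.9", "0.10.9.2"]))  # True
```

This is also why both packages have to be built: the new py4j must exist in the channel before the pyspark 3.2 package can resolve.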
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730621527 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- + +Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. + +In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA. + +To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules. + +* This will become a table of contents (this text will be scraped). 
+{:toc} + +### Highlights + +* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849)) +* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)) +* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198)) +* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816)) +* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030)) +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) +* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) +* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218)) + + +### Core and Spark SQL + +**ANSI SQL Compatibility Enhancements** + +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246)) +* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382)) +* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152)) +* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955)) +* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199)) + +**Performance** + +* Query compilation latency + * Support traversal pruning in transform/resolve functions 
and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042)) + * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) + * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103)) +* Query optimization + * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122)) + * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622)) + * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514)) + * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575)) + * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922)) + * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411)) + * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081)) + * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316)) + * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448)) + * Keep necessary stats after partition pruning ([SPARK-34119](https://issues.apache.org/jira/browse/SPARK-34119))
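One of the ANSI-mode changes quoted above, IntegralDivide throwing an exception on overflow (SPARK-35152), can be modeled in a few lines: with 64-bit longs, `div` truncates toward zero, and the single overflowing input is Long.MinValue div -1, whose true result (2**63) exceeds Long.MaxValue. This is an illustrative sketch, not Spark's implementation:

```python
# Illustrative model of ANSI-mode IntegralDivide (SPARK-35152), not Spark's code.
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1

def integral_divide_ansi(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero")     # ANSI mode also raises here
    quotient = abs(a) // abs(b)                         # truncate toward zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    if not (LONG_MIN <= quotient <= LONG_MAX):
        # only Long.MinValue div -1 lands here: the true result is 2**63
        raise OverflowError("integral divide overflow")
    return quotient

print(integral_divide_ansi(7, -2))   # -3, truncated toward zero
try:
    integral_divide_ansi(LONG_MIN, -1)
except OverflowError:
    print("overflow raised")
```

Without the ANSI check, the JVM's long division wraps this case silently back to Long.MinValue; raising instead of wrapping is the behavior the change guards.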
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730621346 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730617091 ## File path: site/releases/spark-release-3-2-0.html > In this release, Spark supports the Pandas API layer on Spark. Review comment: Yes, I am following the official website. cc @HyukjinKwon @zero323
[GitHub] [spark-website] sarutak commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sarutak commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730603780 ## File path: site/releases/spark-release-3-2-0.html
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730612896 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md > You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). Review comment: It means the JIRA website. This is the same as the release notes of 3.1.1 and 3.0.0.
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730606615 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] sarutak commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sarutak commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730600301 ## File path: site/releases/spark-release-3-2-0.html > In this release, Spark supports the Pandas API layer on Spark. Review comment: Should we write `pandas` instead of `Pandas`? I don't know whether the capitalized form is officially recognized, but `pandas` is never capitalized on the official website: https://pandas.pydata.org/
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730593631

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- +
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high level changes here, grouped by major modules.

Review comment: consult JIRA or consult JIRAs?
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730597510 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- + +Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. + +In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA. + +To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420=12349407). We have curated a list of high level changes here, grouped by major modules. + +* This will become a table of contents (this text will be scraped). 
+{:toc} + +### Highlights + +* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849)) +* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)) +* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198)) +* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816)) +* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030)) +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) +* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) +* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218)) + + +### Core and Spark SQL + +**ANSI SQL Compatibility Enhancements** + +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246)) +* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382)) +* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152)) +* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955)) +* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199)) + +**Performance** + +* Query compilation latency + * Support traversal pruning in transform/resolve functions 
and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042)) + * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) + * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103)) +* Query optimization + * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122)) + * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622)) + * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514)) + * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575)) + * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922)) + * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411)) + * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081)) + * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316)) + * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448)) + * Keep necessary stats after partition pruning
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730596615 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730595822 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730595393 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730593631 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]

Review comment: JIRA or JIRAs?