[GitHub] [spark-website] gengliangwang opened a new pull request #362: Update DocSearch facet filter of 3.2.0 documentation
gengliangwang opened a new pull request #362:
URL: https://github.com/apache/spark-website/pull/362

There is a bug in how the DocSearch facet filter is updated by https://github.com/apache/spark/blob/master/dev/create-release/release-tag.sh: the version is not updated to match the release version. This PR fixes the search function of https://spark.apache.org/docs/3.2.0/. I will open another PR to fix the script in Spark itself.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[spark] branch master updated: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4072a22  [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service
4072a22 is described below

commit 4072a22aa2bf15e95d3043f937a3468057f4fd36
Author: Thejdeep Gudivada
AuthorDate: Mon Oct 18 21:40:55 2021 -0500

    [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

    ### What changes were proposed in this pull request?
    Added a config `spark.yarn.shuffle.service.logs.namespace` which can be used to add a namespace suffix to log lines emitted by the External Shuffle Service.

    ### Why are the changes needed?
    Since many instances of ESS can be running on the same NM, it would be easier to distinguish between them.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    N/A

    Closes #34079 from thejdeep/SPARK-36834.

    Authored-by: Thejdeep Gudivada
    Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../apache/spark/network/yarn/YarnShuffleService.java | 18 ++++++++++++++++--
 docs/running-on-yarn.md                               | 11 +++++++++++
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
index ac16369..f1b8941 100644
--- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
+++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
@@ -93,7 +93,8 @@ import org.apache.spark.network.yarn.util.HadoopConfigProvider;
  * This {@code classpath} configuration is only supported on YARN versions >= 2.9.0.
  */
 public class YarnShuffleService extends AuxiliaryService {
-  private static final Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static final Logger defaultLogger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private Logger logger = defaultLogger;

   // Port on which the shuffle server listens for fetch requests
   private static final String SPARK_SHUFFLE_SERVICE_PORT_KEY = "spark.shuffle.service.port";
@@ -107,6 +108,12 @@ public class YarnShuffleService extends AuxiliaryService {
     "spark.yarn.shuffle.service.metrics.namespace";
   private static final String DEFAULT_SPARK_SHUFFLE_SERVICE_METRICS_NAME = "sparkShuffleService";

+  /**
+   * The namespace to use for the logs produced by the shuffle service
+   */
+  static final String SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY =
+    "spark.yarn.shuffle.service.logs.namespace";
+
   // Whether the shuffle server should authenticate fetch requests
   private static final String SPARK_AUTHENTICATE_KEY = "spark.authenticate";
   private static final boolean DEFAULT_SPARK_AUTHENTICATE = false;
@@ -204,6 +211,13 @@ public class YarnShuffleService extends AuxiliaryService {
         confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    if (!logsNamespace.isEmpty()) {
+      String className = YarnShuffleService.class.getName();
+      logger = LoggerFactory.getLogger(className + "." + logsNamespace);
+    }
+
     super.serviceInit(_conf);

     boolean stopOnFailure = _conf.getBoolean(STOP_ON_FAILURE_KEY, DEFAULT_STOP_ON_FAILURE);
@@ -284,7 +298,7 @@ public class YarnShuffleService extends AuxiliaryService {
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      defaultLogger.error("Unable to create an instance of {}", mergeManagerImplClassName);
       return new NoOpMergedShuffleFileManager(conf);
     }
   }
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 8b7ed18..52d365a 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -806,6 +806,17 @@ The following extra configuration options are available when the shuffle service
   NodeManager.
+
+  spark.yarn.shuffle.service.logs.namespace
+  (not set)
+
+  A namespace which will be appended to the class name when forming the logger name to use for
+  emitting logs from the YARN shuffle service, like
+  org.apache.spark.network.yarn.YarnShuffleService.logsNamespaceValue. Since some logging frameworks
+  may expect the logger name to look like a class name, it's generally recommended to provide a value which
+  would be a valid Java package or class name and not include spaces.
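The effect of the new config is easiest to see in the logger name it produces. A minimal sketch of the naming rule from the patch (the demo class and method are illustrative stand-ins, not Spark code):

```java
public class LogsNamespaceDemo {
    // Mirrors the naming logic added in serviceInit(): when
    // spark.yarn.shuffle.service.logs.namespace is set to a non-empty value,
    // it is appended to the class name to form the logger name; otherwise the
    // plain class name is kept.
    static String loggerName(String className, String namespace) {
        return (namespace == null || namespace.isEmpty())
            ? className
            : className + "." + namespace;
    }

    public static void main(String[] args) {
        String cls = "org.apache.spark.network.yarn.YarnShuffleService";
        // No namespace configured: logs keep the default logger name.
        System.out.println(loggerName(cls, ""));
        // With e.g. "ess1" configured, each ESS instance on the same
        // NodeManager can be told apart by its logger name.
        System.out.println(loggerName(cls, "ess1"));
    }
}
```

A log4j/logback pattern that prints the logger name (e.g. `%c`) then emits `...YarnShuffleService.ess1`, which is why the docs recommend values that look like a valid Java package or class name.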
[spark] branch master updated (1ef6c13 -> c2ba498)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 1ef6c13  [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()
 add c2ba498  [SPARK-36945][PYTHON] Inline type hints for python/pyspark/sql/udf.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py |   2 +-
 python/pyspark/sql/udf.py       | 129
 python/pyspark/sql/udf.pyi      |  58 --
 3 files changed, 91 insertions(+), 98 deletions(-)
 delete mode 100644 python/pyspark/sql/udf.pyi
[spark] branch master updated: [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()
This is an automated email from the ASF dual-hosted git repository.

joshrosen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1ef6c13  [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()
1ef6c13 is described below

commit 1ef6c13e37bfb64b0f9dd9b624b436064ea86593
Author: Tim Armstrong
AuthorDate: Mon Oct 18 14:51:24 2021 -0700

    [SPARK-36933][CORE] Clean up TaskMemoryManager.acquireExecutionMemory()

    ### What changes were proposed in this pull request?
    * Factor out a method `trySpillAndAcquire()` from `acquireExecutionMemory()` that handles the details of how to spill a `MemoryConsumer` and acquire the spilled memory. This logic was duplicated twice.
    * Combine the two loops (spill other consumers and self-spill) into a single loop that implements equivalent logic. I made self-spill the lowest priority consumer and this is exactly equivalent.
    * Consolidate comments a little to explain what the policy is trying to achieve and how at a high level
    * Add a couple more debug log messages to make it easier to follow

    ### Why are the changes needed?
    Reduce code duplication and better separate the policy decision of which MemoryConsumer to spill from the mechanism of requesting it to spill.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added some unit tests to verify the details of the spilling decisions in some scenarios that are not covered by current unit tests. Ran these on Spark master without the TaskMemoryManager changes to confirm that the behaviour is the same before and after my refactoring. The SPARK-35486 test also provides some coverage for the retry loop.

    Closes #34186 from timarmstrong/cleanup-task-memory-manager.

    Authored-by: Tim Armstrong
    Signed-off-by: Josh Rosen
---
 .../org/apache/spark/memory/TaskMemoryManager.java | 149 +++++++++--------
 .../spark/memory/TaskMemoryManagerSuite.java       |  79 ++++++++
 2 files changed, 158 insertions(+), 70 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
index 7a1e8c4..e2e44a5 100644
--- a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
+++ b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
@@ -135,10 +135,10 @@ public class TaskMemoryManager {
    *
    * @return number of bytes successfully granted (<= N).
    */
-  public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
+  public long acquireExecutionMemory(long required, MemoryConsumer requestingConsumer) {
     assert(required >= 0);
-    assert(consumer != null);
-    MemoryMode mode = consumer.getMode();
+    assert(requestingConsumer != null);
+    MemoryMode mode = requestingConsumer.getMode();
     // If we are allocating Tungsten pages off-heap and receive a request to allocate on-heap
     // memory here, then it may not make sense to spill since that would only end up freeing
     // off-heap memory. This is subject to change, though, so it may be risky to make this
@@ -149,96 +149,105 @@ public class TaskMemoryManager {
     // Try to release memory from other consumers first, then we can reduce the frequency of
     // spilling, avoid to have too many spilled files.
     if (got < required) {
-      // Call spill() on other consumers to release memory
-      // Sort the consumers according their memory usage. So we avoid spilling the same consumer
-      // which is just spilled in last few times and re-spilling on it will produce many small
-      // spill files.
+      logger.debug("Task {} need to spill {} for {}", taskAttemptId,
+        Utils.bytesToString(required - got), requestingConsumer);
+      // We need to call spill() on consumers to free up more memory. We want to optimize for two
+      // things:
+      // * Minimize the number of spill calls, to reduce the number of spill files and avoid small
+      //   spill files.
+      // * Avoid spilling more data than necessary - if we only need a little more memory, we may
+      //   not want to spill as much data as possible. Many consumers spill more than the
+      //   requested amount, so we can take that into account in our decisions.
+      // We use a heuristic that selects the smallest memory consumer with at least `required`
+      // bytes of memory in an attempt to balance these factors. It may work well if there are
+      // fewer larger requests, but can result in many small spills if there are many smaller
+      // requests.
+
+      // Build a map of consumer in order of memory usage to prioritize spilling. Assign current
+      // consumer (if present) a nominal memory usage of 0 so that
[spark] branch master updated (21fa3ce -> 25fc495)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 21fa3ce  [SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function
 add 25fc495  [SPARK-36886][PYTHON] Inline type hints for python/pyspark/sql/context.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/context.py     | 221 +-
 python/pyspark/sql/context.pyi    | 140
 python/pyspark/sql/dataframe.py   |   2 +-
 python/pyspark/sql/observation.py |   3 +-
 4 files changed, 176 insertions(+), 190 deletions(-)
 delete mode 100644 python/pyspark/sql/context.pyi
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731104138

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##

@@ -0,0 +1,319 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+_edit_last: '4'
+_wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning
[GitHub] [spark-website] sunchao commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sunchao commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731092808

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731092815

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##

+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))

Review comment: Make it the second highlight?
[GitHub] [spark-website] sunchao commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sunchao commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r731085852

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] sunchao commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sunchao commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r731084843 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- + +Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. + +In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA. + +To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420=12349407). We have curated a list of high level changes here, grouped by major modules. + +* This will become a table of contents (this text will be scraped). 
+{:toc} + +### Highlights + +* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849)) +* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)) +* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198)) +* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816)) +* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030)) +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) +* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) +* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218)) + + +### Core and Spark SQL + +**ANSI SQL Compatibility Enhancements** + +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246)) +* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382)) +* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152)) +* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955)) +* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199)) + +**Performance** + +* Query compilation latency + * Support traversal pruning in transform/resolve functions 
and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042)) + * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) + * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103)) +* Query optimization + * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122)) + * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622)) + * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514)) + * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575)) + * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922)) + * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411)) + * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081)) + * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316)) + * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448)) + * Keep necessary stats after partition pruning
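The release note above says pandas users can scale out with a one-line code change. As an illustrative sketch (not taken from the release note itself), that change is swapping the pandas import for `pyspark.pandas`; the rest of the code stays pandas-style. The runnable part below uses plain pandas, with the hypothetical Spark variant shown in the comment:

```python
import pandas as pd
# With Spark >= 3.2 installed, the advertised one-line change would be:
#   import pyspark.pandas as pd
# after which the same pandas-style code runs distributed on Spark.

# Build a small frame and aggregate it exactly as in plain pandas.
df = pd.DataFrame({"user": ["a", "a", "b"], "clicks": [1, 2, 5]})
totals = df.groupby("user")["clicks"].sum().to_dict()
print(totals)  # {'a': 3, 'b': 5}
```

Because the pandas-on-Spark API mirrors pandas, code like the aggregation above is meant to work unchanged under either import.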
[GitHub] [spark-website] gatorsmile commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gatorsmile commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r731078539 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,319 @@ [quoted release-note text identical to the excerpt in the first comment above]
[GitHub] [spark-website] gatorsmile commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gatorsmile commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r731070322 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,319 @@ [quoted release-note text identical to the excerpt in the first comment above] +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) Review comment: The impact of this feature is more important than the other SQL and Core features, IMO. Can you adjust the order? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730932576 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,317 @@ [quoted release-note text identical to the excerpt in the first comment above]
[spark] branch master updated (c29bb02 -> 21fa3ce)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c29bb02 [SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files add 21fa3ce [SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function No new revisions were added by this update. Summary of changes: .../sql/catalyst/expressions/mathExpressions.scala | 12 +--- .../catalyst/expressions/MathExpressionsSuite.scala | 20 .../src/test/resources/sql-tests/inputs/interval.sql | 2 ++ .../sql-tests/results/ansi/interval.sql.out | 18 +- .../resources/sql-tests/results/interval.sql.out | 18 +- 5 files changed, 65 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
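The SPARK-35925 commit above extends the `width_bucket` SQL function to `DayTimeIntervalType` arguments. As a rough stdlib-only sketch of the standard SQL `width_bucket(value, min, max, num_buckets)` semantics — not Spark's actual implementation, and with interval arguments simply modeled as seconds for illustration:

```python
import math

def width_bucket(value: float, lo: float, hi: float, num_buckets: int) -> int:
    """Return the 1-based equi-width histogram bucket for value over [lo, hi).

    Values below lo fall in bucket 0; values at or above hi fall in bucket
    num_buckets + 1 (the standard SQL width_bucket edge behaviour).
    Assumes lo < hi and num_buckets > 0.
    """
    if value < lo:
        return 0
    if value >= hi:
        return num_buckets + 1
    return int(math.floor((value - lo) / (hi - lo) * num_buckets)) + 1

# Classic numeric example from SQL documentation:
print(width_bucket(5.35, 0.024, 10.06, 5))  # 3

# A day-time interval modeled as seconds: INTERVAL '36' HOUR over a
# 3-day range split into 3 one-day buckets lands in bucket 2.
DAY = 24 * 3600
print(width_bucket(36 * 3600, 0, 3 * DAY, 3))  # 2
```

Supporting interval inputs amounts to applying this same bucketing arithmetic to interval-valued operands instead of plain numerics.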
[GitHub] [spark-website] h-vetinari commented on pull request #361: Add 3.2.0 release note and news and update links
h-vetinari commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945711644 Update 2: pyspark 3.2.0 has been uploaded to https://anaconda.org/conda-forge/pyspark/files, will make its way through the CDN in about an hour. > @h-vetinari yes please raise a PR to update https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst, thanks! Done here: https://github.com/apache/spark/pull/34315
[GitHub] [spark-website] Ngone51 commented on a change in pull request #361: Add 3.2.0 release note and news and update links
Ngone51 commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730861107 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,317 @@ [quoted release-note text identical to the excerpt in the first comment above]
[GitHub] [spark-website] Ngone51 commented on a change in pull request #361: Add 3.2.0 release note and news and update links
Ngone51 commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730858991 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,317 @@ [quoted release-note text identical to the excerpt in the first comment above]
[GitHub] [spark-website] gengliangwang commented on pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945660978 @h-vetinari yes please raise a PR to update https://github.com/apache/spark/blob/master/python/docs/source/getting_started/install.rst, thanks!
[GitHub] [spark-website] h-vetinari commented on pull request #361: Add 3.2.0 release note and news and update links
h-vetinari commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945636499

Little update: the conda-forge build of pyspark is now waiting to be merged by the feedstock team or conda-forge/core. However, I noticed that the [install](https://spark.apache.org/docs/3.2.0/api/python/getting_started/install.html) instructions for conda are... not ideal. In particular, mixing pip & conda is strongly discouraged, because pip can trample on the conda environment and break it. Should I raise a PR under https://github.com/apache/spark/? It would be good if this could then be backported to 3.2 (presumably that's necessary for it to appear in the 3.2.0 docs).
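Mixing pip and conda can indeed leave an environment in a state the conda solver no longer understands. A minimal conda-only sketch of the alternative (an assumption, not the official install instructions; it presumes the conda-forge `pyspark` package discussed above has been published, and the environment name is illustrative):

```shell
# Create an isolated environment and install pyspark from conda-forge,
# without involving pip at all.
conda create -n pyspark-env -c conda-forge pyspark
conda activate pyspark-env
```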
[spark] branch master updated (0bba90b -> c29bb02)
This is an automated email from the ASF dual-hosted git repository.

attilapiros pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0bba90b  [SPARK-36978][SQL] InferConstraints rule should create IsNotNull constraints on the accessed nested field instead of the root nested type
 add c29bb02  [SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files

No new revisions were added by this update.

Summary of changes:
 python/run-tests.py | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)
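The gist of SPARK-36965 above — keep a failed test's temporary output file and log its path rather than discarding it — can be sketched in plain Python. This is a toy stand-in, not the actual `python/run-tests.py` code; the function and test names are illustrative:

```python
import tempfile

def run_test(name):
    """Toy stand-in: run one test module, capturing its output in a temp file."""
    out = tempfile.NamedTemporaryFile(
        mode="w", suffix=".log", prefix=name.replace(".", "_") + "-", delete=False)
    out.write("output of " + name + "\n")
    out.close()
    passed = False  # pretend the test failed
    return passed, out.name

passed, log_path = run_test("pyspark.sql.tests")
if not passed:
    # Surface the temp file's location instead of deleting it, so the
    # output can be inspected after the run.
    print("Error running tests; output is in: " + log_path)
```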
[spark] branch master updated (e7815b1 -> 0bba90b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from e7815b1  [SPARK-37032][SQL] Fix broken SQL syntax link in SQL Reference page
 add 0bba90b  [SPARK-36978][SQL] InferConstraints rule should create IsNotNull constraints on the accessed nested field instead of the root nested type

No new revisions were added by this update.

Summary of changes:
 .../expressions/complexTypeExtractors.scala    | 11 +++--
 .../plans/logical/QueryPlanConstraints.scala   | 18 +++--
 .../InferFiltersFromConstraintsSuite.scala     | 47 +-
 3 files changed, 66 insertions(+), 10 deletions(-)
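The SPARK-36978 change above can be illustrated with a toy model in plain Python (not Spark's Catalyst code; names are illustrative): a filter on a nested field such as `s.f1 > 1` should yield an `IsNotNull` constraint on the accessed field `s.f1` itself, not merely on the root struct `s`.

```python
def infer_is_not_null(refs, on_nested_field=True):
    """Toy model of constraint inference: which IS NOT NULL constraints
    does a filter predicate imply for the attribute paths it references?"""
    if on_nested_field:
        # SPARK-36978 behavior: constrain the accessed nested field itself,
        # e.g. `s.f1 > 1` implies `s.f1 IS NOT NULL`.
        targets = set(refs)
    else:
        # Previous behavior: constrain only the root attribute,
        # e.g. `s.f1 > 1` implied just `s IS NOT NULL`.
        targets = {r.split(".")[0] for r in refs}
    return {t + " IS NOT NULL" for t in targets}

# A filter such as `WHERE s.f1 > 1` references the nested field "s.f1".
assert infer_is_not_null({"s.f1"}) == {"s.f1 IS NOT NULL"}
assert infer_is_not_null({"s.f1"}, on_nested_field=False) == {"s IS NOT NULL"}
```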
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730748515

## File path: js/downloads.js ##

@@ -22,7 +22,7 @@
 var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources];
 // 3.1.0+
 var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources];
-
+addRelease("3.2.0", new Date("10/13/2021"), packagesV11, true);

Review comment:
       Thanks, updated.
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730744351

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##

@@ -0,0 +1,318 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+_edit_last: '4'
+_wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning
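The limit-pushdown items quoted above share one idea: apply the LIMIT before more expensive operators so fewer rows are processed. A toy illustration in plain Python (not Spark code; the counter is only there to make the cost difference visible):

```python
calls = 0

def project(row):
    """Stand-in for a per-row projection expression; counts invocations."""
    global calls
    calls += 1
    return row * 2

rows = list(range(10000))

# Without pushdown: evaluate the projection on every row, then keep 5.
calls = 0
no_pushdown = [project(r) for r in rows][:5]
cost_without = calls

# With pushdown (the idea behind e.g. SPARK-34622): limit first, then project.
calls = 0
with_pushdown = [project(r) for r in rows[:5]]
cost_with = calls

assert no_pushdown == with_pushdown              # same query result
assert (cost_with, cost_without) == (5, 10000)   # far less work done
```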
[GitHub] [spark-website] yaooqinn commented on a change in pull request #361: Add 3.2.0 release note and news and update links
yaooqinn commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730726311

## File path: js/downloads.js ##

@@ -22,7 +22,7 @@
 var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources];
 // 3.1.0+
 var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources];
-
+addRelease("3.2.0", new Date("10/13/2021"), packagesV11, true);

Review comment:
       So this is not right then
       ```
       var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"};
       ```
[GitHub] [spark-website] yaooqinn commented on a change in pull request #361: Add 3.2.0 release note and news and update links
yaooqinn commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730725129

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730722931

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ##
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730721947

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md

## @@ -0,0 +1,318 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contributions from the open-source community, this release resolved more than 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support the Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add a RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* Event-time-based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow the ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of the mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI joins ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when the partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operators ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if the join can be planned as a broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison supports In/InSet predicates ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning
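The "one-line code change" mentioned in the quoted release note refers to swapping the pandas import for `pyspark.pandas`. A minimal sketch of the idea (the plain-pandas version runs as-is; the commented import shows the Spark 3.2 variant, which assumes a working PySpark installation):

```python
import pandas as pd
# On Spark 3.2+, scaling this out only requires changing the import:
# import pyspark.pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})
totals = df.groupby("group")["value"].sum()
print(totals.to_dict())  # {'a': 3, 'b': 3}
```

The rest of the code is unchanged; `pyspark.pandas` mirrors the pandas API while executing on the cluster.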
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730721796

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730721766

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] yaooqinn commented on a change in pull request #361: Add 3.2.0 release note and news and update links
yaooqinn commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730716668

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] gengliangwang commented on pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on pull request #361:
URL: https://github.com/apache/spark-website/pull/361#issuecomment-945532927

FYI, PySpark 3.2 is now available on PyPI: https://pypi.org/project/pyspark/3.2.0/

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730688457

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730685882

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links
cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730684820

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] h-vetinari commented on pull request #361: Add 3.2.0 release note and news and update links
h-vetinari commented on pull request #361: URL: https://github.com/apache/spark-website/pull/361#issuecomment-945483919 > @gengliangwang Have we pushed it to Conda?

Hey all, I started a pull request to package pyspark 3.2 for conda-forge (normally this would have been done sooner, but the automated bot was waiting for the PyPI upload): https://github.com/conda-forge/pyspark-feedstock/pull/31

It turns out that pyspark pins `py4j==0.10.9.2`, whereas conda-forge currently only has `py4j==0.10.9`. Building both packages should take a couple of hours, depending on how fast I can get people to merge the PRs. Once the packages have been built (and become available through the content delivery network, about an hour after CI completes), it will be possible to install pyspark through conda-forge as follows:

```
conda install -c conda-forge pyspark=3.2
```

Hope this helps.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
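The blocker described above is just an exact-pin mismatch: pyspark 3.2 requires exactly `py4j==0.10.9.2`, which no solver can satisfy while the channel only carries `0.10.9`. A minimal sketch of that check (hypothetical helper names, not conda's actual solver):

```python
# Why the conda-forge install fails until py4j is rebuilt: an exact pin
# (py4j==0.10.9.2) cannot be satisfied by any other published version.
def parse_exact_pin(requirement):
    """Split a requirement like 'py4j==0.10.9.2' into (name, version)."""
    name, _, version = requirement.partition("==")
    return name.strip(), version.strip()

def pin_satisfied(requirement, available_versions):
    """True only if the exact pinned version is published in the channel."""
    _, required = parse_exact_pin(requirement)
    return required in available_versions

# conda-forge before the rebuild: only 0.10.9 is published.
print(pin_satisfied("py4j==0.10.9.2", ["0.10.8.1", "0.10.9"]))  # False
# after the new py4j build lands:
print(pin_satisfied("py4j==0.10.9.2", ["0.10.9", "0.10.9.2"]))  # True
```

This is also why both packages have to be built: the new py4j must exist in the channel before the pyspark 3.2 package can resolve.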
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730621527 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- + +Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. + +In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA. + +To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules. + +* This will become a table of contents (this text will be scraped). 
+{:toc} + +### Highlights + +* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849)) +* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)) +* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198)) +* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816)) +* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030)) +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) +* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) +* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218)) + + +### Core and Spark SQL + +**ANSI SQL Compatibility Enhancements** + +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246)) +* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382)) +* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152)) +* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955)) +* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199)) + +**Performance** + +* Query compilation latency + * Support traversal pruning in transform/resolve functions 
and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042)) + * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) + * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103)) +* Query optimization + * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122)) + * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622)) + * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514)) + * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575)) + * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922)) + * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411)) + * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081)) + * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316)) + * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448)) + * Keep necessary stats after partition pruning ([SPARK-34119](https://issues.apache.org/jira/browse/SPARK-34119))
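One of the ANSI-mode changes quoted above, IntegralDivide throwing an exception on overflow (SPARK-35152), can be modeled in a few lines: with 64-bit longs, `div` truncates toward zero, and the single overflowing input is Long.MinValue div -1, whose true result (2**63) exceeds Long.MaxValue. This is an illustrative sketch, not Spark's implementation:

```python
# Illustrative model of ANSI-mode IntegralDivide (SPARK-35152), not Spark's code.
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1

def integral_divide_ansi(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero")     # ANSI mode also raises here
    quotient = abs(a) // abs(b)                         # truncate toward zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    if not (LONG_MIN <= quotient <= LONG_MAX):
        # only Long.MinValue div -1 lands here: the true result is 2**63
        raise OverflowError("integral divide overflow")
    return quotient

print(integral_divide_ansi(7, -2))   # -3, truncated toward zero
try:
    integral_divide_ansi(LONG_MIN, -1)
except OverflowError:
    print("overflow raised")
```

Without the ANSI check, the JVM's long division wraps this case silently back to Long.MinValue; raising instead of wrapping is the behavior the change guards.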
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730621346 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730617091 ## File path: site/releases/spark-release-3-2-0.html > In this release, Spark supports the Pandas API layer on Spark. Review comment: Yes, I am following the official website. cc @HyukjinKwon @zero323
[GitHub] [spark-website] sarutak commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sarutak commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730603780 ## File path: site/releases/spark-release-3-2-0.html
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730612896 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md > You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). Review comment: It means the JIRA website. This is the same as the release notes of 3.1.1 and 3.0.0.
[GitHub] [spark-website] gengliangwang commented on a change in pull request #361: Add 3.2.0 release note and news and update links
gengliangwang commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730606615 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
[GitHub] [spark-website] sarutak commented on a change in pull request #361: Add 3.2.0 release note and news and update links
sarutak commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730600301 ## File path: site/releases/spark-release-3-2-0.html > In this release, Spark supports the Pandas API layer on Spark. Review comment: Should we write `pandas` instead of `Pandas`? I don't know whether the capitalized form is officially recognized, but `pandas` is never capitalized on the official website: https://pandas.pydata.org/
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730593631

## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- +
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets.
+
+In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high level changes here, grouped by major modules.

Review comment: consult JIRA or consult JIRAs?
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730597510 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## @@ -0,0 +1,318 @@ +--- +layout: post +title: Spark Release 3.2.0 +categories: [] +tags: [] +status: publish +type: post +published: true +meta: +_edit_last: '4' +_wpas_done_all: '1' +--- + +Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contribution from the open-source community, this release managed to resolve in excess of 1,700 Jira tickets. + +In this release, Spark supports the Pandas API layer on Spark. Pandas users can scale out their applications on Spark with one line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, enabling Adaptive Query Execution (AQE) by default, and ANSI SQL mode GA. + +To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420=12349407). We have curated a list of high level changes here, grouped by major modules. + +* This will become a table of contents (this text will be scraped). 
+{:toc} + +### Highlights + +* Support Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849)) +* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602)) +* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198)) +* EventTime based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816)) +* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030)) +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679)) +* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) +* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218)) + + +### Core and Spark SQL + +**ANSI SQL Compatibility Enhancements** + +* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790)) +* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246)) +* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382)) +* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152)) +* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955)) +* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199)) + +**Performance** + +* Query compilation latency + * Support traversal pruning in transform/resolve functions 
and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042)) + * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989)) + * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103)) +* Query optimization + * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122)) + * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622)) + * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514)) + * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575)) + * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922)) + * Cardinality estimation of union, sort, and range operator ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411)) + * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081)) + * UnwrapCastInBinaryComparison support In/InSet predicate ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316)) + * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448)) + * Keep necessary stats after partition pruning
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730596615 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730595822 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730595393 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]
[GitHub] [spark-website] viirya commented on a change in pull request #361: Add 3.2.0 release note and news and update links
viirya commented on a change in pull request #361: URL: https://github.com/apache/spark-website/pull/361#discussion_r730593631 ## File path: releases/_posts/2021-10-13-spark-release-3-2-0.md ## [quotes the same release-note passage shown above]

Review comment: JIRA or JIRAs?