date:20230414

[spark] branch master updated: [SPARK-43150][SQL] Remove workaround for PARQUET-2160

2023-04-14 Thread sunchao

This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8deb950f09c [SPARK-43150][SQL] Remove workaround for PARQUET-2160
8deb950f09c is described below

commit 8deb950f09cd4e5224ed2e727245b5386e1dafdc
Author: Cheng Pan 
AuthorDate: Fri Apr 14 21:50:37 2023 -0700

[SPARK-43150][SQL] Remove workaround for PARQUET-2160

### What changes were proposed in this pull request?

Remove the workaround (SPARK-41952) for 
[PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160).

### Why are the changes needed?

[SPARK-42926](https://issues.apache.org/jira/browse/SPARK-42926) upgraded 
Parquet to 1.13.0, which includes 
[PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160). So we no 
longer need [SPARK-41952](https://issues.apache.org/jira/browse/SPARK-41952).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

Closes #40802 from pan3793/SPARK-43150.

Authored-by: Cheng Pan 
Signed-off-by: Chao Sun 
---
 .../datasources/parquet/ParquetCodecFactory.java   | 112 -
 .../parquet/SpecificParquetRecordReaderBase.java   |   2 -
 2 files changed, 114 deletions(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetCodecFactory.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetCodecFactory.java
deleted file mode 100644
index 2edbdc70da2..000
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetCodecFactory.java
+++ /dev/null
@@ -1,112 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.datasources.parquet;
-
-import java.io.IOException;
-import java.io.InputStream;
-import java.nio.ByteBuffer;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.io.compress.CodecPool;
-import org.apache.hadoop.io.compress.CompressionCodec;
-import org.apache.hadoop.io.compress.Decompressor;
-import org.apache.parquet.bytes.BytesInput;
-import org.apache.parquet.hadoop.CodecFactory;
-import org.apache.parquet.hadoop.codec.ZstandardCodec;
-import org.apache.parquet.hadoop.metadata.CompressionCodecName;
-
-/**
- * This class implements a codec factory that is used when reading from Parquet. It adds a
- * workaround for memory issues encountered when reading from zstd-compressed files. For
- * details, see <a href="https://issues.apache.org/jira/browse/PARQUET-2160">PARQUET-2160</a>
- *
- * TODO: Remove this workaround after upgrading Parquet which include PARQUET-2160.
- */
-public class ParquetCodecFactory extends CodecFactory {
-
-  public ParquetCodecFactory(Configuration configuration, int pageSize) {
-super(configuration, pageSize);
-  }
-
-  /**
-   * Copied and modified from CodecFactory.HeapBytesDecompressor
-   */
-  @SuppressWarnings("deprecation")
-  class HeapBytesDecompressor extends BytesDecompressor {
-
-private final CompressionCodec codec;
-private final Decompressor decompressor;
-
-HeapBytesDecompressor(CompressionCodecName codecName) {
-  this.codec = getCodec(codecName);
-  if (codec != null) {
-decompressor = CodecPool.getDecompressor(codec);
-  } else {
-decompressor = null;
-  }
-}
-
-@Override
-public BytesInput decompress(BytesInput bytes, int uncompressedSize) 
throws IOException {
-  final BytesInput decompressed;
-  if (codec != null) {
-if (decompressor != null) {
-  decompressor.reset();
-}
-InputStream is = codec.createInputStream(bytes.toInputStream(), 
decompressor);
-
-if (codec instanceof ZstandardCodec) {
-  // We need to explicitly close the ZstdDecompressorStream here to 
release the resources
-  // it holds to avoid off-heap memory fragmentation issue, see 
PARQUET-2160.
-  // This change will load the decompressor stream into heap a little 
earlier, since the
-  //
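
The comment in the removed workaround describes the pattern it relied on: read the decompressed page onto the heap straight away and close the decompressor stream explicitly, so the resources it holds are released promptly. The following is a minimal sketch of that pattern only. It assumes GZIP from `java.util.zip` as a stand-in for zstd, and the class and method names (`EagerDecompressSketch`, `compress`, `decompressEagerly`) are hypothetical, not Spark or Parquet APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: GZIP stands in for zstd, which the JDK does not ship.
class EagerDecompressSketch {

  // Compress bytes so the sketch has an input to decompress.
  static byte[] compress(byte[] raw) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  // Copy the decompressed bytes onto the heap, then close the stream
  // immediately (try-with-resources) so its resources are released.
  static byte[] decompressEagerly(byte[] compressed) {
    try (InputStream is = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return is.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] raw = "parquet page bytes".getBytes(StandardCharsets.UTF_8);
    byte[] roundTrip = decompressEagerly(compress(raw));
    System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
  }
}
```

The trade-off named in the comment applies here too: the whole payload is materialized on the heap earlier than a lazy reader would need it, in exchange for not keeping the stream open.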

[GitHub] [spark-website] srowen commented on pull request #452: ApacheSparkMéxicoCity Meetup added

2023-04-14 Thread via GitHub



srowen commented on PR #452:
URL: https://github.com/apache/spark-website/pull/452#issuecomment-1509494403

   Please change the html file too by running Jekyll or just editing them both here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] [spark-website] srowen commented on pull request #453:  Including ApacheSparkMéxicoCity Meetup on community page 

2023-04-14 Thread via GitHub



srowen commented on PR #453:
URL: https://github.com/apache/spark-website/pull/453#issuecomment-1509494062

   Duplicate



[GitHub] [spark-website] srowen closed pull request #453:  Including ApacheSparkMéxicoCity Meetup on community page 

2023-04-14 Thread via GitHub



srowen closed pull request #453:  Including ApacheSparkMéxicoCity Meetup on 
community page 
URL: https://github.com/apache/spark-website/pull/453



[GitHub] [spark-website] xinrong-meng commented on pull request #454: Improve instructions for the release process

2023-04-14 Thread via GitHub



xinrong-meng commented on PR #454:
URL: https://github.com/apache/spark-website/pull/454#issuecomment-1509491301

   We should submit the form linked in the description. More instructions are 
given when filling out the form.
   I was able to fill it out independently without "further manual steps from 
PMC".
   Would you prefer to remove the last sentence? @dongjoon-hyun 



[GitHub] [spark-website] xinrong-meng commented on pull request #454: Improve instructions for the release process

2023-04-14 Thread via GitHub



xinrong-meng commented on PR #454:
URL: https://github.com/apache/spark-website/pull/454#issuecomment-1509490447

   Would you please review it? @dongjoon-hyun @HyukjinKwon @gengliangwang 
@viirya  @ueshin @gatorsmile 



[GitHub] [spark-website] xinrong-meng opened a new pull request, #454: Improve instructions for the release process

2023-04-14 Thread via GitHub



xinrong-meng opened a new pull request, #454:
URL: https://github.com/apache/spark-website/pull/454

   Improve instructions for the release process, specifically the `Finalize 
the release` step.
   
   That includes:
   - pointing out which steps have been automated
   - details to pay attention to
   - typo fixes
   - clearer steps through naming



[GitHub] [spark-website] JuanPabloDiaz opened a new pull request, #453:  Including ApacheSparkMéxicoCity Meetup on community page 

2023-04-14 Thread via GitHub



JuanPabloDiaz opened a new pull request, #453:
URL: https://github.com/apache/spark-website/pull/453

   While browsing the site, I found that it is missing the Apache Spark 
México City meetup: https://www.meetup.com/es/apache-spark-mexicocity/
   
   I would like to include the community on the following web page: 
https://spark.apache.org/community.html
   
   I changed the .md and the .html community files. I hope this helps.
   
   Author: Juan Diaz 
   
   
   



[GitHub] [spark-website] JuanPabloDiaz opened a new pull request, #452: ApacheSparkMéxicoCity Meetup added

2023-04-14 Thread via GitHub



JuanPabloDiaz opened a new pull request, #452:
URL: https://github.com/apache/spark-website/pull/452

    Including ApacheSparkMéxicoCity Meetup on community page 
   
   While browsing the site, I found that it is missing the Apache Spark 
México City meetup: https://www.meetup.com/es/apache-spark-mexicocity/
   
   I would like to include the community on the following web page: 
https://spark.apache.org/community.html
   
   I changed the .md and the .html community files. I hope this helps.
   
   Author: Juan Diaz 
   
   
   



[spark] branch branch-3.3 updated: [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions

2023-04-14 Thread yumwang


yumwang pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new cc6d3f9d7f8 [SPARK-43050][SQL] Fix construct aggregate expressions by 
replacing grouping functions
cc6d3f9d7f8 is described below

commit cc6d3f9d7f82887e0060a14045238f5fdc7506b9
Author: Yuming Wang 
AuthorDate: Sat Apr 15 09:27:22 2023 +0800

[SPARK-43050][SQL] Fix construct aggregate expressions by replacing 
grouping functions

This PR fixes the construction of aggregate expressions when replacing 
grouping functions if an expression is part of an aggregation.
In the following example, the second `b` should also be replaced:
https://user-images.githubusercontent.com/5399861/230415618-84cd6334-690e-4b0b-867b-ccc4056226a8.png

Fix bug:
```
spark-sql (default)> SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
   > FROM grouping
   > GROUP BY GROUPING SETS (a, b, c);
[MISSING_AGGREGATION] The non-aggregating expression "b" is based on columns which are not participating in the GROUP BY clause.
```

No.

Unit test.

Closes #40685 from wangyum/SPARK-43050.

Authored-by: Yuming Wang 
Signed-off-by: Yuming Wang 
(cherry picked from commit 45b84cd37add1b9ce274273ad5e519e6bc1d8013)
Signed-off-by: Yuming Wang 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 27 --
 .../resources/sql-tests/inputs/grouping_set.sql|  4 
 .../sql-tests/results/grouping_set.sql.out | 18 +++
 3 files changed, 31 insertions(+), 18 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 881f2cc2078..4a5c0b4aa88 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -605,31 +605,22 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 aggregations: Seq[NamedExpression],
 groupByAliases: Seq[Alias],
 groupingAttrs: Seq[Expression],
-gid: Attribute): Seq[NamedExpression] = aggregations.map { agg =>
-  // collect all the found AggregateExpression, so we can check an 
expression is part of
-  // any AggregateExpression or not.
-  val aggsBuffer = ArrayBuffer[Expression]()
-  // Returns whether the expression belongs to any expressions in 
`aggsBuffer` or not.
-  def isPartOfAggregation(e: Expression): Boolean = {
-aggsBuffer.exists(a => a.exists(_ eq e))
-  }
-  replaceGroupingFunc(agg, groupByExprs, gid).transformDown {
-// AggregateExpression should be computed on the unmodified value of 
its argument
-// expressions, so we should not replace any references to grouping 
expression
-// inside it.
-case e if AggregateExpression.isAggregate(e) =>
-  aggsBuffer += e
-  e
-case e if isPartOfAggregation(e) => e
+gid: Attribute): Seq[NamedExpression] = {
+  def replaceExprs(e: Expression): Expression = e match {
+case e if AggregateExpression.isAggregate(e) => e
 case e =>
   // Replace expression by expand output attribute.
   val index = groupByAliases.indexWhere(_.child.semanticEquals(e))
   if (index == -1) {
-e
+e.mapChildren(replaceExprs)
   } else {
 groupingAttrs(index)
   }
-  }.asInstanceOf[NamedExpression]
+  }
+  aggregations
+.map(replaceGroupingFunc(_, groupByExprs, gid))
+.map(replaceExprs)
+.map(_.asInstanceOf[NamedExpression])
 }
 
 /*
diff --git a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql 
b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
index 4d516bdda7b..909c36c926c 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
@@ -60,3 +60,7 @@ SELECT grouping(k1), k1, k2, avg(v) FROM (VALUES 
(1,1,1),(2,2,2)) AS t(k1,k2,v)
 
 -- grouping_id function
 SELECT grouping_id(k1, k2), avg(v) from (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) 
GROUP BY k1, k2 GROUPING SETS ((k2, k1), k1);
+
+SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
+FROM grouping
+GROUP BY GROUPING SETS (a, b, c);
diff --git a/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out 
b/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
index d89050ab6d8..0429efddafa 100644
--- a/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
+++

[spark] branch branch-3.4 updated: [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions

2023-04-14 Thread yumwang


yumwang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new b89b927af4f [SPARK-43050][SQL] Fix construct aggregate expressions by 
replacing grouping functions
b89b927af4f is described below

commit b89b927af4fe2ae35c2818d1f2f0efd879a39565
Author: Yuming Wang 
AuthorDate: Sat Apr 15 09:27:22 2023 +0800

[SPARK-43050][SQL] Fix construct aggregate expressions by replacing 
grouping functions

### What changes were proposed in this pull request?

This PR fixes the construction of aggregate expressions when replacing 
grouping functions if an expression is part of an aggregation.
In the following example, the second `b` should also be replaced:
https://user-images.githubusercontent.com/5399861/230415618-84cd6334-690e-4b0b-867b-ccc4056226a8.png

### Why are the changes needed?

Fix bug:
```
spark-sql (default)> SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
   > FROM grouping
   > GROUP BY GROUPING SETS (a, b, c);
[MISSING_AGGREGATION] The non-aggregating expression "b" is based on columns which are not participating in the GROUP BY clause.
```
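
The diff in this commit replaces a `transformDown` pass with an explicit recursion (`replaceExprs`) that returns aggregate expressions unchanged, swaps a matching grouping expression for its Expand output attribute, and otherwise recurses through `mapChildren`. The sketch below imitates that shape on a toy tree. It is an illustration under assumptions, not Catalyst code: `Expr`, the `#expand` suffix, and matching on leaf column names only (instead of `semanticEquals` against `groupByAliases`) are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy expression tree; not Spark's Catalyst classes.
class ReplaceExprsSketch {

  static final class Expr {
    final String name;          // e.g. "b", "count", "isnull", "casewhen"
    final boolean isAggregate;  // true for aggregate calls such as count(...)
    final List<Expr> children;

    Expr(String name, boolean isAggregate, Expr... children) {
      this.name = name;
      this.isAggregate = isAggregate;
      this.children = List.of(children);
    }

    @Override
    public String toString() {
      if (children.isEmpty()) return name;
      List<String> parts = new ArrayList<>();
      for (Expr c : children) parts.add(c.toString());
      return name + "(" + String.join(", ", parts) + ")";
    }
  }

  // Same shape as the new replaceExprs: stop at aggregates, replace a grouping
  // column when one is found, otherwise rebuild the node from rewritten children.
  static Expr replaceExprs(Expr e, List<String> groupingCols) {
    if (e.isAggregate) return e;  // arguments of aggregates stay untouched
    if (e.children.isEmpty() && groupingCols.contains(e.name)) {
      return new Expr(e.name + "#expand", false);  // stand-in for the Expand output attribute
    }
    Expr[] newChildren = new Expr[e.children.size()];
    for (int i = 0; i < newChildren.length; i++) {
      newChildren[i] = replaceExprs(e.children.get(i), groupingCols);
    }
    return new Expr(e.name, false, newChildren);
  }

  // CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
  static Expr example() {
    Expr a = new Expr("a", false);
    Expr b = new Expr("b", false);  // the same instance is used in count(b) and in b IS NULL
    Expr c = new Expr("c", false);
    return new Expr("casewhen", false,
        new Expr("isnull", false, a), new Expr("count", true, b),
        new Expr("isnull", false, b), new Expr("count", true, c));
  }

  public static void main(String[] args) {
    System.out.println(replaceExprs(example(), List.of("a", "b", "c")));
  }
}
```

In this toy version the `b` inside `count(b)` is left alone while the `b` in `b IS NULL` is replaced, even though both references are the same object, because the decision depends on position in the tree and not on object identity.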

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #40685 from wangyum/SPARK-43050.

Authored-by: Yuming Wang 
Signed-off-by: Yuming Wang 
(cherry picked from commit 45b84cd37add1b9ce274273ad5e519e6bc1d8013)
Signed-off-by: Yuming Wang 

# Conflicts:
#   
sql/core/src/test/resources/sql-tests/analyzer-results/grouping_set.sql.out
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 27 --
 .../resources/sql-tests/inputs/grouping_set.sql|  4 
 .../sql-tests/results/grouping_set.sql.out | 18 +++
 3 files changed, 31 insertions(+), 18 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index d9c2c0ef63b..dc8904d7f74 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -608,31 +608,22 @@ class Analyzer(override val catalogManager: 
CatalogManager) extends RuleExecutor
 aggregations: Seq[NamedExpression],
 groupByAliases: Seq[Alias],
 groupingAttrs: Seq[Expression],
-gid: Attribute): Seq[NamedExpression] = aggregations.map { agg =>
-  // collect all the found AggregateExpression, so we can check an 
expression is part of
-  // any AggregateExpression or not.
-  val aggsBuffer = ArrayBuffer[Expression]()
-  // Returns whether the expression belongs to any expressions in 
`aggsBuffer` or not.
-  def isPartOfAggregation(e: Expression): Boolean = {
-aggsBuffer.exists(a => a.exists(_ eq e))
-  }
-  replaceGroupingFunc(agg, groupByExprs, gid).transformDown {
-// AggregateExpression should be computed on the unmodified value of 
its argument
-// expressions, so we should not replace any references to grouping 
expression
-// inside it.
-case e if AggregateExpression.isAggregate(e) =>
-  aggsBuffer += e
-  e
-case e if isPartOfAggregation(e) => e
+gid: Attribute): Seq[NamedExpression] = {
+  def replaceExprs(e: Expression): Expression = e match {
+case e if AggregateExpression.isAggregate(e) => e
 case e =>
   // Replace expression by expand output attribute.
   val index = groupByAliases.indexWhere(_.child.semanticEquals(e))
   if (index == -1) {
-e
+e.mapChildren(replaceExprs)
   } else {
 groupingAttrs(index)
   }
-  }.asInstanceOf[NamedExpression]
+  }
+  aggregations
+.map(replaceGroupingFunc(_, groupByExprs, gid))
+.map(replaceExprs)
+.map(_.asInstanceOf[NamedExpression])
 }
 
 /*
diff --git a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql 
b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
index 4d516bdda7b..909c36c926c 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
@@ -60,3 +60,7 @@ SELECT grouping(k1), k1, k2, avg(v) FROM (VALUES 
(1,1,1),(2,2,2)) AS t(k1,k2,v)
 
 -- grouping_id function
 SELECT grouping_id(k1, k2), avg(v) from (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) 
GROUP BY k1, k2 GROUPING SETS ((k2, k1), k1);
+
+SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
+FROM grouping
+GROUP BY GROUPING

[spark] branch master updated (779602806e0 -> 45b84cd37ad)

2023-04-14 Thread yumwang


yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 779602806e0 [SPARK-43095][SQL] Avoid Once strategy's idempotence is 
broken for batch: `Infer Filters`
 add 45b84cd37ad [SPARK-43050][SQL] Fix construct aggregate expressions by 
replacing grouping functions

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 27 --
 .../analyzer-results/grouping_set.sql.out  | 16 +
 .../resources/sql-tests/inputs/grouping_set.sql|  4 
 .../sql-tests/results/grouping_set.sql.out | 18 +++
 4 files changed, 47 insertions(+), 18 deletions(-)



[spark] branch master updated (59ba09a3c51 -> 779602806e0)

2023-04-14 Thread yumwang


yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 59ba09a3c51 [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0
 add 779602806e0 [SPARK-43095][SQL] Avoid Once strategy's idempotence is 
broken for batch: `Infer Filters`

No new revisions were added by this update.

Summary of changes:
 .../plans/logical/QueryPlanConstraints.scala   |  8 ---
 .../InferFiltersFromConstraintsSuite.scala | 28 ++
 2 files changed, 33 insertions(+), 3 deletions(-)



[spark] branch master updated: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

2023-04-14 Thread yumwang


yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 59ba09a3c51 [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0
59ba09a3c51 is described below

commit 59ba09a3c511b3f11a07138afae6dc9f15edf99d
Author: Yuming Wang 
AuthorDate: Sat Apr 15 09:15:34 2023 +0800

[SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

### What changes were proposed in this pull request?

This PR upgrades Apache Parquet to 1.13.0. Apache Parquet [1.13.0 release 
notes](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.0/CHANGES.md?plain=1#L22-L78).

### Why are the changes needed?

1. This release includes 
[PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160). So we no 
longer need [SPARK-41952](https://issues.apache.org/jira/browse/SPARK-41952).
2. This release includes [Java Vector API 
support](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.0/README.md?plain=1#L88-L100).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test and benchmark test.

TPC-DS benchmark result:
Query | Parquet 1.13.0 (first time) | Parquet 1.12.3 (first time) | Parquet 1.13.0 (second time) | Parquet 1.12.3 (second time) | Parquet 1.13.0 (third time) | Parquet 1.12.3 (third time)
-- | -- | -- | -- | -- | -- | --
q1.sql | 37.819 | 37.786 | 36.322 | 37.59 | 37.772 | 36.776
q2.sql | 42.132 | 41.513 | 43.189 | 42.274 | 42.859 | 42.605
q3.sql | 5.933 | 6.1 | 6.082 | 6.071 | 6.128 | 6.094
q4.sql | 335.051 | 319.173 | 322.396 | 320.977 | 324.464 | 326.822
q5.sql | 78.41 | 76.631 | 76.841 | 76.37 | 78.257 | 76.502
q6.sql | 9.006 | 9.11 | 8.737 | 8.577 | 8.729 | 9.05
q7.sql | 12.881 | 12.731 | 12.685 | 12.662 | 12.606 | 12.675
q8.sql | 10.122 | 10.092 | 10.035 | 10.853 | 10.277 | 10.841
q9.sql | 72.562 | 71.942 | 73.649 | 73.04 | 72.899 | 72.01
q10.sql | 14.127 | 13.075 | 14.276 | 13.913 | 13.281 | 13.229
q11.sql | 111.334 | 111.612 | 110.952 | 110.776 | 111.686 | 112.27
q12.sql | 3.138 | 3.854 | 3.187 | 3.613 | 3.437 | 3.306
q13.sql | 13.131 | 12.676 | 12.516 | 12.417 | 12.739 | 12.987
q14a.sql | 217.664 | 213.632 | 214.655 | 213.333 | 217.601 | 213.341
q14b.sql | 191.553 | 182.775 | 184.35 | 187.004 | 188.313 | 189.876
q15.sql | 10.308 | 10.46 | 10.304 | 9.901 | 10.175 | 10.307
q16.sql | 81.97 | 82.059 | 82.41 | 81.263 | 83.179 | 82.042
q17.sql | 28.876 | 28.905 | 30.41 | 29.573 | 29.555 | 28.837
q18.sql | 14.183 | 13.929 | 14.11 | 14.466 | 13.969 | 14.022
q19.sql | 6.611 | 7.593 | 6.652 | 6.659 | 6.446 | 6.533
q20.sql | 3.263 | 3.701 | 3.56 | 3.503 | 3.53 | 3.627
q21.sql | 2.252 | 2.188 | 2.249 | 2.128 | 2.161 | 2.252
q22.sql | 14.809 | 14.715 | 14.324 | 14.266 | 14.567 | 14.123
q23a.sql | 554.385 | 544.75 | 546.213 | 542.194 | 553.784 | 547.388
q23b.sql | 781.236 | 768.367 | 770.584 | 776.065 | 776.502 | 776.006
q24a.sql | 196.806 | 193.989 | 197.608 | 194.416 | 194.71 | 192.817
q24b.sql | 176.56 | 183.084 | 177.486 | 177.936 | 177.776 | 177.389
q25.sql | 22.323 | 22.089 | 22.665 | 22.049 | 22.248 | 22.317
q26.sql | 8.574 | 8.356 | 8.174 | 8.753 | 8.186 | 8.302
q27.sql | 9.056 | 8.252 | 8.37 | 8.319 | 8.516 | 8.38
q28.sql | 102.185 | 102.382 | 102.344 | 103.058 | 102.024 | 102.786
q29.sql | 75.655 | 75.604 | 75.217 | 75.532 | 75.835 | 76.024
q30.sql | 12.476 | 12.966 | 13.039 | 14.108 | 12.19 | 13.143
q31.sql | 26.343 | 27.632 | 26.337 | 26.791 | 26.74 | 26.098
q32.sql | 3.251 | 3.41 | 3.378 | 3.333 | 3.371 | 3.516
q33.sql | 7.143 | 6.125 | 6.85 | 6.718 | 7.067 | 6.615
q34.sql | 8.53 | 8.656 | 8.536 | 8.866 | 8.358 | 8.589
q35.sql | 35.212 | 35.571 | 35.659 | 37.631 | 36.292 | 35.603
q36.sql | 9.264 | 9.166 | 9.748 | 9.488 | 9.45 | 9.469
q37.sql | 36.368 | 35.881 | 37.023 | 36.578 | 35.823 | 36.7
q38.sql | 74.58 | 73.472 | 72.926 | 73.823 | 71.097 | 73.329
q39a.sql | 8.596 | 7.637 | 8.036 | 7.984 | 7.849 | 7.88
q39b.sql | 7.233 | 6.641 | 6.278 | 7.06 | 6.595 | 6.691
q40.sql | 17.34 | 16.558 | 16.448 | 16.864 | 16.432 | 16.413
q41.sql | 1.223 | 1.105 | 1.103 | 1.182 | 1.232 | 1.304
q42.sql | 2.464 | 2.441 | 2.554 | 2.544 | 2.314 | 2.393
q43.sql | 7.477 | 7.396 | 7.394 | 7.764 | 7.381 | 7.534
q44.sql | 30.228 | 30.516 | 30.859 | 31.057 | 30.372 | 29.008
q45.sql | 9.93 | 10.089 | 9.874 | 10.075 | 9.802 | 9.838
q46.sql | 9.544 | 9.949 | 9.503 | 9.755 | 9.395 | 9.25
q47.sql | 27.322 | 26.952 | 26.974 | 26.83 | 27.087 | 26.991
q48.sql | 14.266 | 14.39 | 14.517 | 14.684 | 14.471 | 14.61
q49.sql | 21.279 | 21.733 | 20.286 | 20.945 | 22.388 | 21.52
q50.sql | 191.416 | 194.256 |

[spark] branch master updated: [SPARK-42597][SQL][FOLLOW-UP] Only rewrite `EqualNullSafe` when the left side is non-nullable

2023-04-14 Thread yumwang


yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e01ca3c76ed [SPARK-42597][SQL][FOLLOW-UP] Only rewrite `EqualNullSafe` 
when the left side is non-nullable
e01ca3c76ed is described below

commit e01ca3c76ed7031819237deb6ab34d13141bda8e
Author: Yuming Wang 
AuthorDate: Sat Apr 15 09:10:02 2023 +0800

[SPARK-42597][SQL][FOLLOW-UP] Only rewrite `EqualNullSafe` when the left 
side is non-nullable

### What changes were proposed in this pull request?

When unwrapping date type to timestamp type, this PR rewrites 
`EqualNullSafe` only if the left side is non-nullable.

### Why are the changes needed?

1. The current rewrite is incorrect for `EqualNullSafe` if the left side is 
nullable.
2. Even if we fix the rewrite, it will contain `If` and can't be pushed 
down to the data source:
   ```
   EqualNullSafe(Cast(ts, DateType), date) if Cast(ts, DateType).nullable
   -> If(IsNull(ts), FalseLiteral, And(GreaterThanOrEqual(ts, Cast(date, TimestampType)), LessThan(ts, Cast(date + 1, TimestampType))))
   ```
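
Why the plain range rewrite is wrong for a nullable left side follows from SQL's three-valued logic: `NULL <=> literal` is FALSE, whereas `ts >= x AND ts < y` is UNKNOWN (NULL) when `ts` is NULL. The two agree inside a simple filter but diverge as soon as the predicate is negated. The sketch below models this with `Boolean`, where `null` means UNKNOWN. It is an assumption-laden illustration: timestamps are epoch seconds, dates are epoch days, time zones are ignored, and none of the names come from Spark.

```java
// Hypothetical model of SQL three-valued logic; null Boolean = UNKNOWN.
class NullSafeRewriteSketch {

  static final long SECONDS_PER_DAY = 86_400L;

  // SQL AND: FALSE dominates, otherwise UNKNOWN if either side is UNKNOWN.
  static Boolean and(Boolean l, Boolean r) {
    if (Boolean.FALSE.equals(l) || Boolean.FALSE.equals(r)) return false;
    if (l == null || r == null) return null;
    return true;
  }

  // SQL NOT: UNKNOWN stays UNKNOWN.
  static Boolean not(Boolean b) {
    return b == null ? null : !b;
  }

  // Original predicate: cast(ts AS date) <=> date. Null-safe equality
  // never yields UNKNOWN; a NULL ts compared to a non-null date is FALSE.
  static Boolean original(Long tsSeconds, long day) {
    if (tsSeconds == null) return false;
    return Math.floorDiv(tsSeconds, SECONDS_PER_DAY) == day;
  }

  // Range rewrite: ts >= date AND ts < date + 1. Comparisons against a
  // NULL ts are UNKNOWN, so the conjunction is UNKNOWN as well.
  static Boolean rewritten(Long tsSeconds, long day) {
    Boolean ge = tsSeconds == null ? null : tsSeconds >= day * SECONDS_PER_DAY;
    Boolean lt = tsSeconds == null ? null : tsSeconds < (day + 1) * SECONDS_PER_DAY;
    return and(ge, lt);
  }

  public static void main(String[] args) {
    long day = 10L;
    Long inside = day * SECONDS_PER_DAY + 42;
    System.out.println("non-null ts: " + original(inside, day) + " vs " + rewritten(inside, day));
    System.out.println("NULL ts:     " + original(null, day) + " vs " + rewritten(null, day));
    System.out.println("NOT on NULL: " + not(original(null, day)) + " vs " + not(rewritten(null, day)));
  }
}
```

For a non-nullable left side the NULL branch can never be taken, which is the case the new `EqualNullSafe(left, _) if !left.nullable` pattern in the diff restricts the rewrite to.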

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Update existing unit test.

Closes #40743 from wangyum/SPARK-42597-dev.

Lead-authored-by: Yuming Wang 
Co-authored-by: Yuming Wang 
Signed-off-by: Yuming Wang 
---
 .../sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala  | 5 -
 .../sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala | 7 +++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
index 7e5242d968d..54b1dd419fb 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
@@ -315,7 +315,10 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
 GreaterThanOrEqual(fromExp, Cast(dateAddOne, fromExp.dataType, tz, 
evalMode))
   case _: GreaterThanOrEqual =>
 GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, evalMode))
-  case Equality(_, _) =>
+  case _: EqualTo =>
+And(GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, 
evalMode)),
+  LessThan(fromExp, Cast(dateAddOne, fromExp.dataType, tz, evalMode)))
+  case EqualNullSafe(left, _) if !left.nullable =>
 And(GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, 
evalMode)),
   LessThan(fromExp, Cast(dateAddOne, fromExp.dataType, tz, evalMode)))
   case _: LessThan =>
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
index 8ceb26b906b..0f0acb669c2 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
@@ -46,7 +46,7 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with 
ExpressionEvalHelp
   val f2: BoundReference = $"b".float.canBeNull.at(1)
   val f3: BoundReference = $"c".decimal(5, 2).canBeNull.at(2)
   val f4: BoundReference = $"d".boolean.canBeNull.at(3)
-  val f5: BoundReference = $"e".timestamp.canBeNull.at(4)
+  val f5: BoundReference = $"e".timestamp.notNull.at(4)
   val f6: BoundReference = $"f".timestampNTZ.canBeNull.at(5)
 
   test("unwrap casts when literal == max") {
@@ -392,9 +392,8 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest 
with ExpressionEvalHelp
   (f5 >= castTimestamp(dateLit) && f5 < castTimestamp(dateAddOne)) ||
 (f6 >= castTimestampNTZ(dateLit) && f6 < castTimestampNTZ(dateAddOne)))
 assertEquivalent(
-  castDate(f5) <=> dateLit || castDate(f6) === dateLit,
-  (f5 >= castTimestamp(dateLit) && f5 < castTimestamp(dateAddOne)) ||
-(f6 >= castTimestampNTZ(dateLit) && f6 < castTimestampNTZ(dateAddOne)))
+  castDate(f5) <=> dateLit || castDate(f6) <=> dateLit,
+  (f5 >= castTimestamp(dateLit) && f5 < castTimestamp(dateAddOne)) || castDate(f6) <=> dateLit)
 assertEquivalent(
   dateLit < castDate(f5) || dateLit < castDate(f6),
   castTimestamp(dateAddOne) <= f5 || castTimestampNTZ(dateAddOne) <= f6)
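
The split between `EqualTo` and `EqualNullSafe` in the rule above follows from SQL three-valued logic. The block below is a minimal, Spark-free sketch of that reasoning, not the optimizer's implementation: every name in it is hypothetical, `None` stands for SQL NULL, and timestamps are modeled as epoch milliseconds in UTC (session time zones are ignored).

```scala
// Hypothetical model of SQL three-valued logic; not Spark's implementation.
// None stands for SQL NULL; timestamps are epoch milliseconds in UTC.
object NullSafeRewriteSketch {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // cast(ts AS DATE), expressed as days since the epoch.
  def castDate(ts: Option[Long]): Option[Long] =
    ts.map(t => Math.floorDiv(t, MillisPerDay))

  // `=`: the result is NULL if either side is NULL.
  def equalTo(l: Option[Long], r: Option[Long]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // `<=>`: never NULL; two NULLs compare equal.
  def equalNullSafe(l: Option[Long], r: Option[Long]): Boolean = l == r

  // Rewritten form: ts >= date AND ts < date + 1 day.
  // Both comparisons are NULL when ts is NULL, so the conjunction is NULL.
  def rangeRewrite(ts: Option[Long], day: Long): Option[Boolean] =
    ts.map(t => t >= day * MillisPerDay && t < (day + 1) * MillisPerDay)
}
```

In this model, `equalTo(castDate(ts), Some(day))` and `rangeRewrite(ts, day)` agree for every input, including NULL. For a NULL timestamp, however, `equalNullSafe` yields `false` while the range form yields NULL, and the two differ once negated. That is consistent with the patch rewriting `<=>` only when the input is not nullable.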


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-43107][SQL] Coalesce buckets in join applied on broadcast join stream side

2023-04-14 Thread yumwang

This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 157de2f6292 [SPARK-43107][SQL] Coalesce buckets in join applied on 
broadcast join stream side
157de2f6292 is described below

commit 157de2f6292353dc1f55e523f36b2d5d07e93389
Author: Yuming Wang 
AuthorDate: Sat Apr 15 09:07:01 2023 +0800

[SPARK-43107][SQL] Coalesce buckets in join applied on broadcast join 
stream side

### What changes were proposed in this pull request?

This PR adds support for coalescing buckets in a join when the join is applied 
on the stream side of a broadcast join.

### Why are the changes needed?

Reduce shuffle to improve query performance.

Before | After
-- | --
https://user-images.githubusercontent.com/5399861/231473104-4d9bbe4e-9cfe-4473-ba17-75d09f2eb1b9.png | https://user-images.githubusercontent.com/5399861/231470141-bd15031f-facb-4c59-84f2-e60f054257d6.png

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #40756 from wangyum/SPARK-43107.

Authored-by: Yuming Wang 
Signed-off-by: Yuming Wang 
---
 .../bucketing/CoalesceBucketsInJoin.scala  | 14 +++-
 .../spark/sql/sources/BucketedReadSuite.scala  | 26 +-
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
index 7a31e343766..bd79aca8e64 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight}
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SparkPlan}
-import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, ShuffledJoin, SortMergeJoinExec}
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, BroadcastNestedLoopJoinExec, ShuffledHashJoinExec, ShuffledJoin, SortMergeJoinExec}
 
 /**
  * This rule coalesces one side of the `SortMergeJoin` and `ShuffledHashJoin`
@@ -42,7 +42,7 @@ object CoalesceBucketsInJoin extends Rule[SparkPlan] {
   plan: SparkPlan,
   numCoalescedBuckets: Int): SparkPlan = {
 plan transformUp {
-  case f: FileSourceScanExec =>
+  case f: FileSourceScanExec if f.relation.bucketSpec.nonEmpty =>
 f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets))
 }
   }
@@ -115,7 +115,11 @@ object ExtractJoinWithBuckets {
   private def hasScanOperation(plan: SparkPlan): Boolean = plan match {
 case f: FilterExec => hasScanOperation(f.child)
 case p: ProjectExec => hasScanOperation(p.child)
-case _: FileSourceScanExec => true
+case j: BroadcastHashJoinExec =>
+  if (j.buildSide == BuildLeft) hasScanOperation(j.right) else hasScanOperation(j.left)
+case j: BroadcastNestedLoopJoinExec =>
+  if (j.buildSide == BuildLeft) hasScanOperation(j.right) else hasScanOperation(j.left)
+case f: FileSourceScanExec => f.relation.bucketSpec.nonEmpty
 case _ => false
   }
 
@@ -142,9 +146,7 @@ object ExtractJoinWithBuckets {
   }
 
   private def isApplicable(j: ShuffledJoin): Boolean = {
-(j.isInstanceOf[SortMergeJoinExec] ||
-  j.isInstanceOf[ShuffledHashJoinExec]) &&
-  hasScanOperation(j.left) &&
+hasScanOperation(j.left) &&
   hasScanOperation(j.right) &&
   satisfiesOutputPartitioning(j.leftKeys, j.left.outputPartitioning) &&
   satisfiesOutputPartitioning(j.rightKeys, j.right.outputPartitioning)
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
index a18c681e0fe..0a3824e0298 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
@@ -1011,9 +1011,10 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils with Adapti
   }
 
   test("bucket coalescing is applied when join expressions match with partitioning expressions") {
-withTable("t1", "t2") {
+withTable("t1", "t2", "t3") {
   df1.write.format("parquet").bucketBy(8, "i", "j").saveAsTable("t1")
   df2.write.format("parquet").bucketBy(4, "i", "j").saveAsTable("t2")
+
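
The patched `hasScanOperation` in the diff above can be read as a small recursive walk over the physical plan. The following is a simplified sketch under stated assumptions: the plan classes are hypothetical stand-ins (the real operators in `org.apache.spark.sql.execution` carry much more state), and a single `BroadcastJoin` node stands for both `BroadcastHashJoinExec` and `BroadcastNestedLoopJoinExec`.

```scala
// Hypothetical, simplified plan model; not Spark's actual operator classes.
object StreamSideScanSketch {
  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  sealed trait Plan
  final case class Scan(bucketed: Boolean) extends Plan
  final case class Filter(child: Plan) extends Plan
  final case class Project(child: Plan) extends Plan
  final case class BroadcastJoin(left: Plan, right: Plan, buildSide: BuildSide) extends Plan
  final case class Other(children: Seq[Plan]) extends Plan

  // Mirrors the shape of the patched method: look through filters and
  // projections, follow only the stream (non-build) side of a broadcast
  // join, and accept a scan only if it reads a bucketed relation.
  def hasScanOperation(plan: Plan): Boolean = plan match {
    case Filter(child) => hasScanOperation(child)
    case Project(child) => hasScanOperation(child)
    case BroadcastJoin(left, right, buildSide) =>
      if (buildSide == BuildLeft) hasScanOperation(right) else hasScanOperation(left)
    case Scan(bucketed) => bucketed
    case _ => false
  }
}
```

With `BuildLeft`, only the right child is searched, so a bucketed scan on the right is found while the same scan on the build side is ignored. Any operator not listed stops the search, which keeps the check conservative.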

[GitHub] [spark-website] dongjoon-hyun commented on a diff in pull request #448: Updated downloads.js

2023-04-14 Thread via GitHub



dongjoon-hyun commented on code in PR #448:
URL: https://github.com/apache/spark-website/pull/448#discussion_r1167332304


##
js/downloads.js:
##
@@ -1,31 +1,26 @@
-// Script for generating Spark download links
-// No dependencies; pure javascript.
-
-releases = {};
+const releases = {};
 
 function addRelease(version, releaseDate, packages, mirrored) {
   releases[version] = {
 released: releaseDate,
-packages: packages,
-mirrored: mirrored
+packages,
+mirrored
   };
 }
 
-var sources = {pretty: "Source Code", tag: "sources"};
-var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"};
-var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
-var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"};
-var hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"};
-var hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"};
-var hadoop3p = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"};
-var hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"};
-
-// 3.2.0+
-var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources];
-// 3.3.0+
-var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
-
-addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true);
+const sources = {pretty: "Source Code", tag: "sources"};
+const hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"};
+const hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
+const hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"};
+const hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"};
+const hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"};
+const hadoop3p = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"};
+const hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"};
+
+const packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources];
+const packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
+
+addRelease("3.3.3", new Date("04/14/2023"), packagesV13, true);

Review Comment:
   Yes, the Apache Spark community hasn't released it yet.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] [spark-website] xinrong-meng commented on a diff in pull request #448: Updated downloads.js

2023-04-14 Thread via GitHub



xinrong-meng commented on code in PR #448:
URL: https://github.com/apache/spark-website/pull/448#discussion_r1167257625


##
js/downloads.js:
##
@@ -1,31 +1,26 @@
-// Script for generating Spark download links
-// No dependencies; pure javascript.
-
-releases = {};
+const releases = {};
 
 function addRelease(version, releaseDate, packages, mirrored) {
   releases[version] = {
 released: releaseDate,
-packages: packages,
-mirrored: mirrored
+packages,
+mirrored
   };
 }
 
-var sources = {pretty: "Source Code", tag: "sources"};
-var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"};
-var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
-var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"};
-var hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"};
-var hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"};
-var hadoop3p = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"};
-var hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"};
-
-// 3.2.0+
-var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources];
-// 3.3.0+
-var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
-
-addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true);
+const sources = {pretty: "Source Code", tag: "sources"};
+const hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"};
+const hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
+const hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"};
+const hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"};
+const hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"};
+const hadoop3p = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"};
+const hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"};
+
+const packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources];
+const packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
+
+addRelease("3.3.3", new Date("04/14/2023"), packagesV13, true);

Review Comment:
   I am wondering why `3.3.3` is associated with `04/14/2023`. The date should 
be the release date.




[GitHub] [spark-website] xinrong-meng commented on pull request #443: [DO NOT MERGE] 3.4.0 API Auditing

2023-04-14 Thread via GitHub



xinrong-meng commented on PR #443:
URL: https://github.com/apache/spark-website/pull/443#issuecomment-1509222840

   Closed. Thanks @dongjoon-hyun for the reminder.



[GitHub] [spark-website] xinrong-meng closed pull request #443: [DO NOT MERGE] 3.4.0 API Auditing

2023-04-14 Thread via GitHub



xinrong-meng closed pull request #443: [DO NOT MERGE] 3.4.0 API Auditing
URL: https://github.com/apache/spark-website/pull/443



svn commit: r61288 - in /dev/spark: v3.2.4-rc1-docs/ v3.4.0-rc7-docs/

2023-04-14 Thread xinrong

Author: xinrong
Date: Fri Apr 14 20:31:17 2023
New Revision: 61288

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.2.4-rc1-docs/
dev/spark/v3.4.0-rc7-docs/



[GitHub] [spark-website] xinrong-meng commented on pull request #451: Fix the download page of Spark 3.4.0

2023-04-14 Thread via GitHub



xinrong-meng commented on PR #451:
URL: https://github.com/apache/spark-website/pull/451#issuecomment-1509172623

   Merged to asf-site branch, thank you!



[spark-website] branch asf-site updated: Fix the download page of Spark 3.4.0

2023-04-14 Thread xinrong

This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 624de69568 Fix the download page of Spark 3.4.0
624de69568 is described below

commit 624de69568e5c743206a63cfc49d8647e41e1167
Author: Gengliang Wang 
AuthorDate: Fri Apr 14 13:03:59 2023 -0700

Fix the download page of Spark 3.4.0


Currently it shows 3.3.2 on top
https://user-images.githubusercontent.com/1097932/232143660-9d97a7c0-5eb0-44af-9f06-41cb6386a2dd.png

After fix:
https://user-images.githubusercontent.com/1097932/232143685-ee5b06e6-3af9-43ea-8690-209f4d8cd25f.png

Author: Gengliang Wang 

Closes #451 from gengliangwang/fixDownload.
---
 js/downloads.js  | 2 +-
 site/js/downloads.js | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/js/downloads.js b/js/downloads.js
index 915b9c8809..9781273310 100644
--- a/js/downloads.js
+++ b/js/downloads.js
@@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources]
 // 3.3.0+
 var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
 
+addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true);
 addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true);
-addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 
 function append(el, contents) {
   el.innerHTML += contents;
diff --git a/site/js/downloads.js b/site/js/downloads.js
index 915b9c8809..9781273310 100644
--- a/site/js/downloads.js
+++ b/site/js/downloads.js
@@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources]
 // 3.3.0+
 var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources];
 
+addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true);
 addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true);
-addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true);
 
 function append(el, contents) {
   el.innerHTML += contents;



[GitHub] [spark-website] xinrong-meng closed pull request #451: Fix the download page of Spark 3.4.0

2023-04-14 Thread via GitHub



xinrong-meng closed pull request #451: Fix the download page of Spark 3.4.0
URL: https://github.com/apache/spark-website/pull/451



[GitHub] [spark-website] gengliangwang opened a new pull request, #451: Fix the download page of Spark 3.4.0

2023-04-14 Thread via GitHub



gengliangwang opened a new pull request, #451:
URL: https://github.com/apache/spark-website/pull/451

   
   Currently it shows 3.3.2 on top
   https://user-images.githubusercontent.com/1097932/232143660-9d97a7c0-5eb0-44af-9f06-41cb6386a2dd.png
   
   After fix:
   https://user-images.githubusercontent.com/1097932/232143685-ee5b06e6-3af9-43ea-8690-209f4d8cd25f.png
   



[GitHub] [spark-website] xinrong-meng closed pull request #450: Fix the API docs of Spark 3.4.0

2023-04-14 Thread via GitHub



xinrong-meng closed pull request #450: Fix the API docs of Spark 3.4.0
URL: https://github.com/apache/spark-website/pull/450



[GitHub] [spark-website] xinrong-meng commented on pull request #450: Fix the API docs of Spark 3.4.0

2023-04-14 Thread via GitHub



xinrong-meng commented on PR #450:
URL: https://github.com/apache/spark-website/pull/450#issuecomment-1509115348

   Merged to asf-site branch.



[GitHub] [spark-website] xinrong-meng commented on pull request #450: Fix the API docs of Spark 3.4.0

2023-04-14 Thread via GitHub



xinrong-meng commented on PR #450:
URL: https://github.com/apache/spark-website/pull/450#issuecomment-1509111397

   Thank you Gengliang! I'll clean up the staging after the PR is in.



[GitHub] [spark-website] gengliangwang opened a new pull request, #450: Fix the API docs of Spark 3.4.0

2023-04-14 Thread via GitHub



gengliangwang opened a new pull request, #450:
URL: https://github.com/apache/spark-website/pull/450

   
   Currently the API docs of Spark 3.4.0 are wrong. This PR overwrites them 
with the docs from https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/.



svn commit: r61281 - in /dev/spark/v3.4.0-rc7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-04-14 Thread xinrong

Author: xinrong
Date: Fri Apr 14 18:58:10 2023
New Revision: 61281

Log:
Apache Spark v3.4.0-rc7 docs


[This commit notification would consist of 2789 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]


[GitHub] [spark-website] dongjoon-hyun commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



dongjoon-hyun commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508935369

   +1, late LGTM. Thank you, all!



[GitHub] [spark-website] dongjoon-hyun commented on pull request #449: Fix a typo from release notes in Spark 3.4

2023-04-14 Thread via GitHub



dongjoon-hyun commented on PR #449:
URL: https://github.com/apache/spark-website/pull/449#issuecomment-1508935110

   Thank you!



[GitHub] [spark-website] xinrong-meng commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



xinrong-meng commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508926338

   Thank you all! Thanks @HyukjinKwon for the fix.



[spark] branch master updated: [SPARK-43064][SQL] Spark SQL CLI SQL tab should only show once statement once

2023-04-14 Thread sunchao

This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4708adc9185 [SPARK-43064][SQL] Spark SQL CLI SQL tab should only show 
once statement once
4708adc9185 is described below

commit 4708adc91856d7665e54e99545ecd5249ce817f7
Author: Angerszh 
AuthorDate: Fri Apr 14 08:43:42 2023 -0700

[SPARK-43064][SQL] Spark SQL CLI SQL tab should only show once statement 
once

### What changes were proposed in this pull request?
Before
https://user-images.githubusercontent.com/46485123/230573688-42acb9f2-6fa0-48d0-bfde-c7ceeb306aef.png

After
https://user-images.githubusercontent.com/46485123/230573720-2c2a7731-d776-439c-ba6f-0dad9dc87a42.png

### Why are the changes needed?
A statement does not need to be shown twice in the SQL tab; the duplicate entry is confusing.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested (MT).

Closes #40701 from AngersZh/SPARK-43064.

Authored-by: Angerszh 
Signed-off-by: Chao Sun 
---
 .../apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala  | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
index 18a0e57a2d3..8ae65a58608 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala
@@ -29,6 +29,7 @@ import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse
 import org.apache.spark.SparkThrowable
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.plans.logical.CommandResult
 import org.apache.spark.sql.execution.{QueryExecution, SQLExecution}
 import org.apache.spark.sql.execution.HiveResult.hiveResultString
 import org.apache.spark.sql.internal.{SQLConf, VariableSubstitution}
@@ -65,8 +66,15 @@ private[hive] class SparkSQLDriver(val context: SQLContext = SparkSQLEnv.sqlCont
   }
   context.sparkContext.setJobDescription(substitutorCommand)
   val execution = context.sessionState.executePlan(context.sql(command).logicalPlan)
-  hiveResponse = SQLExecution.withNewExecutionId(execution, Some("cli")) {
-hiveResultString(execution.executedPlan)
+  // The SQL command has been executed above via `executePlan`, therefore we don't need to
+  // wrap it again with a new execution ID when getting Hive result.
+  execution.logical match {
+case _: CommandResult =>
+  hiveResponse = hiveResultString(execution.executedPlan)
+case _ =>
+  hiveResponse = SQLExecution.withNewExecutionId(execution, Some("cli")) {
+hiveResultString(execution.executedPlan)
+  }
   }
   tableSchema = getResultSetSchema(execution)
   new CommandProcessorResponse(0)
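
The control flow introduced by this patch can be sketched without Spark. The block below is an illustrative model only: all classes and helpers are hypothetical stand-ins, and it assumes, as the comment in the patch states, that a command has already been executed (and tracked) by the time its result is fetched.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of the control flow; not Spark's actual classes.
object SingleExecutionSketch {
  sealed trait LogicalPlan
  // Stands in for a plan whose command already ran eagerly.
  final case class CommandResult(rows: Seq[String]) extends LogicalPlan
  final case class Query(rows: Seq[String]) extends LogicalPlan

  // Each element models one entry in the UI's SQL tab.
  val recordedExecutions: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Records one tracked execution, then evaluates the body.
  def withNewExecutionId[T](name: String)(body: => T): T = {
    recordedExecutions += name
    body
  }

  // Models the eager step: commands run immediately and are tracked here.
  def sql(command: String, isCommand: Boolean): LogicalPlan =
    if (isCommand) withNewExecutionId(command)(CommandResult(Seq("ok")))
    else Query(Seq("row"))

  // Patched behaviour: do not wrap an already-executed command again.
  def run(command: String, isCommand: Boolean): Seq[String] =
    sql(command, isCommand) match {
      case CommandResult(rows) => rows
      case Query(rows) => withNewExecutionId(command)(rows)
    }
}
```

In this model, both a command and a plain query end up with exactly one recorded execution. Wrapping the `CommandResult` branch in `withNewExecutionId` as well would record the command twice, which corresponds to the duplicate SQL tab entry the PR describes.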



[spark] branch master updated: [SPARK-43104][BUILD] Set `shadeTestJar` of `protobuf` module to `false`

2023-04-14 Thread sunchao

This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 70094dac1eb [SPARK-43104][BUILD] Set `shadeTestJar` of `protobuf` 
module to `false`
70094dac1eb is described below

commit 70094dac1ebc2b755f76d945ea669b6f7de170dd
Author: yangjie01 
AuthorDate: Fri Apr 14 08:41:36 2023 -0700

[SPARK-43104][BUILD] Set `shadeTestJar` of `protobuf` module to `false`

### What changes were proposed in this pull request?
This PR sets `shadeTestJar` of the `protobuf` module to `false` to skip 
shading `spark-protobuf_**-tests.jar`.

### Why are the changes needed?

When `shadeTestJar` is true, `maven-shade-plugin` always tries to find 
(and sometimes download) some non-existent jars:

```
Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client-api/3.3.5/hadoop-client-api-3.3.5-tests.jar
[WARNING] Could not get tests for 
org.apache.hadoop:hadoop-client-api:jar:3.3.5:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/tukaani/xz/1.9/xz-1.9-tests.jar
[WARNING] Could not get tests for org.tukaani:xz:jar:1.9:compile
Downloading from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/slf4j/slf4j-api/2.0.7/slf4j-api-2.0.7-tests.jar
Downloaded from gcs-maven-central-mirror: 
https://maven-central.storage-download.googleapis.com/maven2/org/slf4j/slf4j-api/2.0.7/slf4j-api-2.0.7-tests.jar
 (0 B at 0 B/s)
[WARNING] Could not get tests for 
com.google.protobuf:protobuf-java:jar:3.22.0:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/curator/curator-framework/2.13.0/curator-framework-2.13.0-tests.jar
[WARNING] Could not get tests for 
org.apache.curator:curator-framework:jar:2.13.0:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/logging/log4j/log4j-slf4j2-impl/2.20.0/log4j-slf4j2-impl-2.20.0-tests.jar
[WARNING] Could not get tests for 
org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.20.0:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/jetbrains/annotations/17.0.0/annotations-17.0.0-tests.jar
[WARNING] Could not get tests for 
org.jetbrains:annotations:jar:17.0.0:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/logging/log4j/log4j-1.2-api/2.20.0/log4j-1.2-api-2.20.0-tests.jar
[WARNING] Could not get tests for 
org.apache.logging.log4j:log4j-1.2-api:jar:2.20.0:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/threeten/threeten-extra/1.7.1/threeten-extra-1.7.1-tests.jar
[WARNING] Could not get tests for 
org.threeten:threeten-extra:jar:1.7.1:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/yetus/audience-annotations/0.13.0/audience-annotations-0.13.0-tests.jar
[WARNING] Could not get tests for 
org.apache.yetus:audience-annotations:jar:0.13.0:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.14.2/jackson-databind-2.14.2-tests.jar
[WARNING] Could not get tests for 
com.fasterxml.jackson.core:jackson-databind:jar:2.14.2:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0-tests.jar
[WARNING] Could not get tests for 
com.google.code.findbugs:jsr305:jar:3.0.0:runtime
Downloading from central: 
https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.9.1/snappy-java-1.1.9.1-tests.jar
[WARNING] Could not get tests for 
org.xerial.snappy:snappy-java:jar:1.1.9.1:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1-tests.jar
[WARNING] Could not get tests for 
javax.activation:activation:jar:1.1.1:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/hive/hive-storage-api/2.8.1/hive-storage-api-2.8.1-tests.jar
[WARNING] Could not get tests for 
org.apache.hive:hive-storage-api:jar:2.8.1:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0-tests.jar
[WARNING] Could not get tests for 
org.spark-project.spark:unused:jar:1.0.0:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/org/scala-lang/scala-library/2.12.17/scala-library-2.12.17-tests.jar
[WARNING] Could not get tests for 
org.scala-lang:scala-library:jar:2.12.17:compile
Downloading from central: 
https://repo.maven.apache.org/maven2/com/github/luben/zstd-jni/1.5.4-2/zstd-jni-1.5.4-2-tests.jar
[WARNING] Could not get tests for 
com.github.luben:zstd-jni:jar:1.5.4-2:compile
Downloading from

[GitHub] [spark-website] PasunuriSrinidhi commented on pull request #448: Updated downloads.js

2023-04-14 Thread via GitHub



PasunuriSrinidhi commented on PR #448:
URL: https://github.com/apache/spark-website/pull/448#issuecomment-1508533792

   > What changed here, mostly adding const? does it do anything?
   > You need to run jekyll to update the copy in site/
   > Maybe, just don't know if this matters.
   
   I converted the normal functions to ES6 arrow functions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] [spark-website] srowen commented on pull request #448: Updated downloads.js

2023-04-14 Thread via GitHub



srowen commented on PR #448:
URL: https://github.com/apache/spark-website/pull/448#issuecomment-1508503919

   What changed here, mostly adding const? Does it do anything?
   You need to run jekyll to update the copy in site/
   Maybe, just don't know if this matters.



[GitHub] [spark-website] HyukjinKwon closed pull request #449: Fix a typo from release notes in Spark 3.4

2023-04-14 Thread via GitHub



HyukjinKwon closed pull request #449: Fix a typo from release notes in Spark 3.4
URL: https://github.com/apache/spark-website/pull/449



[spark-website] branch asf-site updated: Fix a typo from release notes in Spark 3.4

2023-04-14 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new ff55a784fa Fix a typo from release notes in Spark 3.4
ff55a784fa is described below

commit ff55a784fad231b34ae17293859f8d27a80dd10e
Author: Hyukjin Kwon 
AuthorDate: Fri Apr 14 18:42:42 2023 +0900

Fix a typo from release notes in Spark 3.4

This PR removes the placeholder text "This will become a table of contents 
(this text will be scraped)." that was mistakenly left in the release notes.

Author: Hyukjin Kwon 

Closes #449 from HyukjinKwon/minor-typo.
---
 releases/_posts/2023-04-13-spark-release-3-4-0.md | 2 --
 site/releases/spark-release-3-4-0.html            | 4 ----
 2 files changed, 6 deletions(-)

diff --git a/releases/_posts/2023-04-13-spark-release-3-4-0.md 
b/releases/_posts/2023-04-13-spark-release-3-4-0.md
index 42435b5064..840c189f0e 100644
--- a/releases/_posts/2023-04-13-spark-release-3-4-0.md
+++ b/releases/_posts/2023-04-13-spark-release-3-4-0.md
@@ -17,8 +17,6 @@ This release introduces Python client for Spark Connect, 
augments Structured Str
 
 To download Apache Spark 3.4.0, visit the 
[downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA 
for the [detailed changes](https://s.apache.org/spark-3.4.0). We have curated a 
list of high level changes here, grouped by major modules.
 
-* This will become a table of contents (this text will be scraped).
-  {:toc}
 
 ### Highlight
 
diff --git a/site/releases/spark-release-3-4-0.html 
b/site/releases/spark-release-3-4-0.html
index 93d6688464..c3191a9016 100644
--- a/site/releases/spark-release-3-4-0.html
+++ b/site/releases/spark-release-3-4-0.html
@@ -131,10 +131,6 @@
 
 To download Apache Spark 3.4.0, visit the https://spark.apache.org/downloads.html;>downloads page. You can 
consult JIRA for the https://s.apache.org/spark-3.4.0;>detailed 
changes. We have curated a list of high level changes here, grouped by 
major modules.
 
-
-  This will become a table of contents (this text will be scraped).
-
-
 Highlight
 
 



[GitHub] [spark-website] HyukjinKwon commented on pull request #449: Fix a typo from release notes in Spark 3.4

2023-04-14 Thread via GitHub



HyukjinKwon commented on PR #449:
URL: https://github.com/apache/spark-website/pull/449#issuecomment-1508240343

   Merged to asf-site branch.



[spark] branch master updated: [SPARK-43123][SQL] Internal field metadata should not be leaked to catalogs

2023-04-14 Thread wenchen

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4c938d62d79 [SPARK-43123][SQL] Internal field metadata should not be 
leaked to catalogs
4c938d62d79 is described below

commit 4c938d62d791742b9f0c6a77b66fc06a90d7c0ad
Author: Wenchen Fan 
AuthorDate: Fri Apr 14 17:08:29 2023 +0800

[SPARK-43123][SQL] Internal field metadata should not be leaked to catalogs

### What changes were proposed in this pull request?

In Spark, we have defined some internal field metadata to help query 
resolution and compilation. For example, quite a few field metadata entries 
are related to metadata columns.

However, when we create tables, this internal field metadata can be 
leaked to catalogs. This PR updates the CTAS/RTAS commands to remove the 
internal field metadata before creating tables. The CREATE/REPLACE TABLE 
commands are fine, as users can't generate this internal field metadata via 
the type string.
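
As a rough illustration of the cleanup described above, the following Python sketch strips internal keys from field metadata before a schema is handed to a catalog. It is not the code from this PR: it works on the JSON form of a struct schema (a dict with a `fields` list, each field carrying a `metadata` dict), the function name `strip_internal_metadata` is hypothetical, and the set of keys is an assumption taken from the three string constants visible in this commit's diff.

```python
from copy import deepcopy

# Keys copied from the string constants in this commit's diff. Treating
# exactly these three as "internal" is an assumption of this sketch.
INTERNAL_METADATA_KEYS = (
    "__autoGeneratedAlias",
    "__metadata_col",
    "__qualified_access_only",
)


def strip_internal_metadata(schema: dict) -> dict:
    """Return a copy of a struct schema (JSON form) without internal metadata keys."""
    cleaned = deepcopy(schema)  # never mutate the caller's schema
    for field in cleaned.get("fields", []):
        metadata = field.get("metadata") or {}
        field["metadata"] = {
            key: value
            for key, value in metadata.items()
            if key not in INTERNAL_METADATA_KEYS
        }
        # Recurse into directly nested struct types.
        field_type = field.get("type")
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            field["type"] = strip_internal_metadata(field_type)
    return cleaned
```

User-supplied metadata such as a column comment is kept; only the listed keys are dropped. Struct types nested inside arrays or maps are not visited in this sketch.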

### Why are the changes needed?

To avoid potential issues, such as mistakenly treating a data column as a 
metadata column.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new test

Closes #40776 from cloud-fan/meta.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |  4 +--
 .../apache/spark/sql/catalyst/util/package.scala   | 34 ++
 .../execution/command/createDataSourceTables.scala |  5 +--
 .../datasources/v2/WriteToDataSourceV2Exec.scala   | 42 +++---
 .../spark/sql/connector/MetadataColumnSuite.scala  | 16 +
 5 files changed, 69 insertions(+), 32 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 126b0618727..74592c15d23 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -41,7 +41,7 @@ import 
org.apache.spark.sql.catalyst.streaming.StreamingRelationV2
 import org.apache.spark.sql.catalyst.trees.AlwaysProcess
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
 import org.apache.spark.sql.catalyst.trees.TreePattern._
-import org.apache.spark.sql.catalyst.util.{toPrettySQL, CharVarcharUtils, 
StringUtils}
+import org.apache.spark.sql.catalyst.util.{toPrettySQL, AUTO_GENERATED_ALIAS, 
CharVarcharUtils, StringUtils}
 import org.apache.spark.sql.catalyst.util.ResolveDefaultColumns._
 import org.apache.spark.sql.connector.catalog.{View => _, _}
 import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
@@ -492,7 +492,7 @@ class Analyzer(override val catalogManager: CatalogManager) 
extends RuleExecutor
 case l: Literal => Alias(l, toPrettySQL(l))()
 case e =>
   val metaForAutoGeneratedAlias = new MetadataBuilder()
-.putString("__autoGeneratedAlias", "true")
+.putString(AUTO_GENERATED_ALIAS, "true")
 .build()
   Alias(e, toPrettySQL(e))(explicitMetadata = 
Some(metaForAutoGeneratedAlias))
   }
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
index e1ce45c1353..23ddb534af9 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
@@ -27,7 +27,7 @@ import com.google.common.io.ByteStreams
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.{MetadataBuilder, NumericType, StringType}
+import org.apache.spark.sql.types.{MetadataBuilder, NumericType, StringType, 
StructType}
 import org.apache.spark.unsafe.types.UTF8String
 import org.apache.spark.util.Utils
 
@@ -191,12 +191,13 @@ package object util extends Logging {
 
   val METADATA_COL_ATTR_KEY = "__metadata_col"
 
+  /**
+   * If set, this metadata column can only be accessed with qualifiers, e.g. 
`qualifiers.col` or
+   * `qualifiers.*`. If not set, metadata columns cannot be accessed via star.
+   */
+  val QUALIFIED_ACCESS_ONLY = "__qualified_access_only"
+
   implicit class MetadataColumnHelper(attr: Attribute) {
-/**
- * If set, this metadata column can only be accessed with qualifiers, e.g. 
`qualifiers.col` or
- * `qualifiers.*`. If not set, metadata columns cannot be accessed via 
star.
- */
-val QUALIFIED_ACCESS_ONLY = "__qualified_access_only"
 
 def

[GitHub] [spark-website] bjornjorgensen commented on pull request #449: Fix a typo from release notes in Spark 3.4

2023-04-14 Thread via GitHub



bjornjorgensen commented on PR #449:
URL: https://github.com/apache/spark-website/pull/449#issuecomment-1508183332

   Thank you. 



[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



HyukjinKwon commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508167653

   https://github.com/apache/spark-website/pull/449



[GitHub] [spark-website] HyukjinKwon opened a new pull request, #449: Fix a typo from release notes in Spark 3.4

2023-04-14 Thread via GitHub



HyukjinKwon opened a new pull request, #449:
URL: https://github.com/apache/spark-website/pull/449

   This PR removes the placeholder text "This will become a table of contents 
(this text will be scraped)." that was mistakenly left in the release notes.



[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



HyukjinKwon commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508152159

   uhoh, sure let me make a quick fix then.



[GitHub] [spark-website] bjornjorgensen commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



bjornjorgensen commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508150253

   @HyukjinKwon If you have a working install of Bundler, this is the line:
https://github.com/apache/spark-website/blob/74099512789a0f4099716b5e6cf0882eed6f5d13/releases/_posts/2023-04-13-spark-release-3-4-0.md?plain=1#L20



[GitHub] [spark-website] bjornjorgensen commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



bjornjorgensen commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508130455

   @HyukjinKwon I have some problems with `bundle install`:
   `
   bundle install
   Bundler 2.4.1 is running, but your lockfile was generated with 2.3.7. Installing Bundler 2.3.7 and restarting using that version.
   Fetching gem metadata from https://rubygems.org/.
   `
   The install failed!



[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



HyukjinKwon commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508127146

   @bjornjorgensen are you interested in making a PR please?



[GitHub] [spark-website] bjornjorgensen commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



bjornjorgensen commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508111219

   We must remove "This will become a table of contents (this text will be 
scraped)." from https://spark.apache.org/releases/spark-release-3-4-0.html



[GitHub] [spark-website] PasunuriSrinidhi opened a new pull request, #448: Updated downloads.js

2023-04-14 Thread via GitHub



PasunuriSrinidhi opened a new pull request, #448:
URL: https://github.com/apache/spark-website/pull/448

   This will be the more advanced version and I think that the code provided 
looks well-structured and readable.
   
   
   



[GitHub] [spark-website] HyukjinKwon closed pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



HyukjinKwon closed pull request #447: Add 3.4.0 release note and news and 
update links
URL: https://github.com/apache/spark-website/pull/447



[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



HyukjinKwon commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508034288

   Thanks all.
   
   Merged to asf-site branch. 



[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links

2023-04-14 Thread via GitHub



HyukjinKwon commented on PR #447:
URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508021891

   I fixed and force-pushed it.



[GitHub] [spark-website] yaooqinn commented on pull request #446: Add v3.4.0 docs

2023-04-14 Thread via GitHub



yaooqinn commented on PR #446:
URL: https://github.com/apache/spark-website/pull/446#issuecomment-1507967472

   late LGTM



[GitHub] [spark-website] yaooqinn commented on pull request #444: Add v3.2.4 docs

2023-04-14 Thread via GitHub



yaooqinn commented on PR #444:
URL: https://github.com/apache/spark-website/pull/444#issuecomment-1507966805

   Thank you @dongjoon-hyun, late +1


