[spark] branch master updated: [SPARK-43150][SQL] Remove workaround for PARQUET-2160
This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
new 8deb950f09c [SPARK-43150][SQL] Remove workaround for PARQUET-2160
8deb950f09c is described below

commit 8deb950f09cd4e5224ed2e727245b5386e1dafdc
Author: Cheng Pan
AuthorDate: Fri Apr 14 21:50:37 2023 -0700

[SPARK-43150][SQL] Remove workaround for PARQUET-2160

### What changes were proposed in this pull request?

Remove workaround (SPARK-41952) for [PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160)

### Why are the changes needed?

[SPARK-42926](https://issues.apache.org/jira/browse/SPARK-42926) upgraded Parquet to 1.13.0, which includes [PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160). So we no longer need [SPARK-41952](https://issues.apache.org/jira/browse/SPARK-41952).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

Closes #40802 from pan3793/SPARK-43150.

Authored-by: Cheng Pan
Signed-off-by: Chao Sun
---
 .../datasources/parquet/ParquetCodecFactory.java | 112 -
 .../parquet/SpecificParquetRecordReaderBase.java | 2 -
 2 files changed, 114 deletions(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetCodecFactory.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetCodecFactory.java
deleted file mode 100644
index 2edbdc70da2..000
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetCodecFactory.java
+++ /dev/null
@@ -1,112 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.execution.datasources.parquet;
-
-import java.io.IOException;
-import java.io.InputStream;
-import java.nio.ByteBuffer;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.io.compress.CodecPool;
-import org.apache.hadoop.io.compress.CompressionCodec;
-import org.apache.hadoop.io.compress.Decompressor;
-import org.apache.parquet.bytes.BytesInput;
-import org.apache.parquet.hadoop.CodecFactory;
-import org.apache.parquet.hadoop.codec.ZstandardCodec;
-import org.apache.parquet.hadoop.metadata.CompressionCodecName;
-
-/**
- * This class implements a codec factory that is used when reading from Parquet. It adds a
- * workaround for memory issues encountered when reading from zstd-compressed files. For
- * details, see <a href="https://issues.apache.org/jira/browse/PARQUET-2160">PARQUET-2160</a>
- *
- * TODO: Remove this workaround after upgrading Parquet which include PARQUET-2160.
- */
-public class ParquetCodecFactory extends CodecFactory {
-
-  public ParquetCodecFactory(Configuration configuration, int pageSize) {
-    super(configuration, pageSize);
-  }
-
-  /**
-   * Copied and modified from CodecFactory.HeapBytesDecompressor
-   */
-  @SuppressWarnings("deprecation")
-  class HeapBytesDecompressor extends BytesDecompressor {
-
-    private final CompressionCodec codec;
-    private final Decompressor decompressor;
-
-    HeapBytesDecompressor(CompressionCodecName codecName) {
-      this.codec = getCodec(codecName);
-      if (codec != null) {
-        decompressor = CodecPool.getDecompressor(codec);
-      } else {
-        decompressor = null;
-      }
-    }
-
-    @Override
-    public BytesInput decompress(BytesInput bytes, int uncompressedSize) throws IOException {
-      final BytesInput decompressed;
-      if (codec != null) {
-        if (decompressor != null) {
-          decompressor.reset();
-        }
-        InputStream is = codec.createInputStream(bytes.toInputStream(), decompressor);
-
-        if (codec instanceof ZstandardCodec) {
-          // We need to explicitly close the ZstdDecompressorStream here to release the resources
-          // it holds to avoid off-heap memory fragmentation issue, see PARQUET-2160.
-          // This change will load the decompressor stream into heap a little earlier, since the
-          //
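The deleted class worked around PARQUET-2160 by explicitly closing the zstd decompressor stream rather than leaving its resources to be reclaimed later, loading the decompressed data onto the heap a little earlier. The following is a minimal sketch of that general pattern, not the removed Spark code: it reads the whole decompressed payload onto the heap and closes the stream immediately. It assumes GZIP from `java.util.zip` as a stand-in for zstd (which needs a third-party library), and the names `EagerDecompress`, `decompressEagerly`, and `roundTrip` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; GZIP stands in for zstd because it ships with the JDK.
class EagerDecompress {

  // Compress a byte array with GZIP so the example has something to decompress.
  static byte[] gzip(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(raw);
    } // closing finishes the GZIP trailer
    return bos.toByteArray();
  }

  // Read the whole decompressed payload onto the heap, then close the stream right away
  // so any native (off-heap) buffers it owns are released now, not at some later point.
  static byte[] decompressEagerly(byte[] compressed, int uncompressedSize) throws IOException {
    try (InputStream is = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] out = is.readNBytes(uncompressedSize);
      if (out.length != uncompressedSize) {
        throw new IOException("expected " + uncompressedSize + " bytes, got " + out.length);
      }
      return out;
    } // try-with-resources closes the stream (and ends its Inflater) here
  }

  // Usage helper: compress, then decompress eagerly; wraps IOException for convenience.
  static String roundTrip(String text) {
    try {
      byte[] raw = text.getBytes(StandardCharsets.UTF_8);
      byte[] out = decompressEagerly(gzip(raw), raw.length);
      return new String(out, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Closing a `GZIPInputStream` ends its internal `Inflater`, which frees the native zlib memory immediately; the workaround applied the same idea to the zstd stream's off-heap resources.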
[GitHub] [spark-website] srowen commented on pull request #452: ApacheSparkMéxicoCity Meetup added
srowen commented on PR #452: URL: https://github.com/apache/spark-website/pull/452#issuecomment-1509494403 Please change the html file too, either by running Jekyll or by just editing them both here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] srowen commented on pull request #453: Including ApacheSparkMéxicoCity Meetup on community page
srowen commented on PR #453: URL: https://github.com/apache/spark-website/pull/453#issuecomment-1509494062 Duplicate
[GitHub] [spark-website] srowen closed pull request #453: Including ApacheSparkMéxicoCity Meetup on community page
srowen closed pull request #453: Including ApacheSparkMéxicoCity Meetup on community page URL: https://github.com/apache/spark-website/pull/453
[GitHub] [spark-website] xinrong-meng commented on pull request #454: Improve instructions for the release process
xinrong-meng commented on PR #454: URL: https://github.com/apache/spark-website/pull/454#issuecomment-1509491301 We should submit the form linked in the description. More instructions are given when filling out the form. I was able to fill it out independently without "further manual steps from PMC". Would you prefer to remove the last sentence? @dongjoon-hyun
[GitHub] [spark-website] xinrong-meng commented on pull request #454: Improve instructions for the release process
xinrong-meng commented on PR #454: URL: https://github.com/apache/spark-website/pull/454#issuecomment-1509490447 Would you please review it? @dongjoon-hyun @HyukjinKwon @gengliangwang @viirya @ueshin @gatorsmile
[GitHub] [spark-website] xinrong-meng opened a new pull request, #454: Improve instructions for the release process
xinrong-meng opened a new pull request, #454: URL: https://github.com/apache/spark-website/pull/454 Improve instructions for the release process, specifically the `Finalize the release` step. This includes:
- pointing out which steps have been automated
- details to pay attention to
- typo fixes
- clearer steps through naming
[GitHub] [spark-website] JuanPabloDiaz opened a new pull request, #453: Including ApacheSparkMéxicoCity Meetup on community page
JuanPabloDiaz opened a new pull request, #453: URL: https://github.com/apache/spark-website/pull/453 While browsing the site, I found that it is missing Apache Spark México City (https://www.meetup.com/es/apache-spark-mexicocity/), and I would like to add the community to the following web page: https://spark.apache.org/community.html. I changed both the .md and the .html community files. I hope this helps. Author: Juan Diaz
[GitHub] [spark-website] JuanPabloDiaz opened a new pull request, #452: ApacheSparkMéxicoCity Meetup added
JuanPabloDiaz opened a new pull request, #452: URL: https://github.com/apache/spark-website/pull/452 Including ApacheSparkMéxicoCity Meetup on community page. While browsing the site, I found that it is missing Apache Spark México City (https://www.meetup.com/es/apache-spark-mexicocity/), and I would like to add the community to the following web page: https://spark.apache.org/community.html. I changed both the .md and the .html community files. I hope this helps. Author: Juan Diaz
[spark] branch branch-3.3 updated: [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions
yumwang pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
new cc6d3f9d7f8 [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions
cc6d3f9d7f8 is described below

commit cc6d3f9d7f82887e0060a14045238f5fdc7506b9
Author: Yuming Wang
AuthorDate: Sat Apr 15 09:27:22 2023 +0800

[SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions

This PR fixes how aggregate expressions are constructed when grouping expressions are replaced: previously, an expression was skipped if it was also part of an aggregation. In the following example, the second `b` (in `b IS NULL`) should also be replaced:
[screenshot: https://user-images.githubusercontent.com/5399861/230415618-84cd6334-690e-4b0b-867b-ccc4056226a8.png]

Fix bug:
```
spark-sql (default)> SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
                   > FROM grouping
                   > GROUP BY GROUPING SETS (a, b, c);
[MISSING_AGGREGATION] The non-aggregating expression "b" is based on columns which are not participating in the GROUP BY clause.
```

No.

Unit test.

Closes #40685 from wangyum/SPARK-43050.
Authored-by: Yuming Wang
Signed-off-by: Yuming Wang
(cherry picked from commit 45b84cd37add1b9ce274273ad5e519e6bc1d8013)
Signed-off-by: Yuming Wang
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 27 --
 .../resources/sql-tests/inputs/grouping_set.sql | 4
 .../sql-tests/results/grouping_set.sql.out | 18 +++
 3 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 881f2cc2078..4a5c0b4aa88 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -605,31 +605,22 @@ class Analyzer(override val catalogManager: CatalogManager)
         aggregations: Seq[NamedExpression],
         groupByAliases: Seq[Alias],
         groupingAttrs: Seq[Expression],
-        gid: Attribute): Seq[NamedExpression] = aggregations.map { agg =>
-      // collect all the found AggregateExpression, so we can check an expression is part of
-      // any AggregateExpression or not.
-      val aggsBuffer = ArrayBuffer[Expression]()
-      // Returns whether the expression belongs to any expressions in `aggsBuffer` or not.
-      def isPartOfAggregation(e: Expression): Boolean = {
-        aggsBuffer.exists(a => a.exists(_ eq e))
-      }
-      replaceGroupingFunc(agg, groupByExprs, gid).transformDown {
-        // AggregateExpression should be computed on the unmodified value of its argument
-        // expressions, so we should not replace any references to grouping expression
-        // inside it.
-        case e if AggregateExpression.isAggregate(e) =>
-          aggsBuffer += e
-          e
-        case e if isPartOfAggregation(e) => e
+        gid: Attribute): Seq[NamedExpression] = {
+      def replaceExprs(e: Expression): Expression = e match {
+        case e if AggregateExpression.isAggregate(e) => e
         case e =>
           // Replace expression by expand output attribute.
           val index = groupByAliases.indexWhere(_.child.semanticEquals(e))
           if (index == -1) {
-            e
+            e.mapChildren(replaceExprs)
           } else {
             groupingAttrs(index)
           }
-      }.asInstanceOf[NamedExpression]
+      }
+      aggregations
+        .map(replaceGroupingFunc(_, groupByExprs, gid))
+        .map(replaceExprs)
+        .map(_.asInstanceOf[NamedExpression])
     }

     /*
diff --git a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
index 4d516bdda7b..909c36c926c 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
@@ -60,3 +60,7 @@ SELECT grouping(k1), k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v)

 -- grouping_id function
 SELECT grouping_id(k1, k2), avg(v) from (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY k1, k2 GROUPING SETS ((k2, k1), k1);
+
+SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
+FROM grouping
+GROUP BY GROUPING SETS (a, b, c);
diff --git a/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out b/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
index d89050ab6d8..0429efddafa 100644
--- a/sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out
+++
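The new `replaceExprs` walks the expression tree top-down: it leaves aggregate functions untouched, replaces any subexpression that matches a grouping expression, and otherwise recurses into the children. The old version instead skipped any expression that was reference-equal (`eq`) to a subexpression of an aggregate it had already visited, so a grouping column that also occurred inside `count(b)` could be left unreplaced. Below is a rough Java sketch of the new traversal on a toy expression tree; `Expr`, `Col`, `Fn`, `Agg`, and `GroupingRewrite` are hypothetical stand-ins rather than Catalyst classes, and record equality substitutes for `semanticEquals`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy expression tree; not Spark's Catalyst classes.
interface Expr {}

record Col(String name) implements Expr {}                  // column reference
record Fn(String name, List<Expr> args) implements Expr {}  // non-aggregate function
record Agg(String name, List<Expr> args) implements Expr {} // aggregate function, e.g. count

final class GroupingRewrite {

  // Same shape as the new replaceExprs: stop at aggregates, replace a matching
  // grouping expression, otherwise rebuild the node with rewritten children.
  static Expr replace(Expr e, List<Expr> groupByExprs, List<Expr> groupingAttrs) {
    if (e instanceof Agg) {
      return e; // aggregates must see the original, unreplaced argument values
    }
    int index = groupByExprs.indexOf(e); // record equals stands in for semanticEquals
    if (index != -1) {
      return groupingAttrs.get(index);
    }
    if (e instanceof Fn f) {
      List<Expr> newArgs = new ArrayList<>();
      for (Expr arg : f.args()) {
        newArgs.add(replace(arg, groupByExprs, groupingAttrs));
      }
      return new Fn(f.name(), newArgs);
    }
    return e; // a column that is not a grouping expression stays as-is
  }
}
```

Applied to a toy version of the query from the commit message, the `b` inside `count(b)` is kept while the `b` in `b IS NULL` is swapped for its grouping attribute, which is the behavior the fix restores.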
[spark] branch branch-3.4 updated: [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions
yumwang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
new b89b927af4f [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions
b89b927af4f is described below

commit b89b927af4fe2ae35c2818d1f2f0efd879a39565
Author: Yuming Wang
AuthorDate: Sat Apr 15 09:27:22 2023 +0800

[SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions

### What changes were proposed in this pull request?

This PR fixes how aggregate expressions are constructed when grouping expressions are replaced: previously, an expression was skipped if it was also part of an aggregation. In the following example, the second `b` (in `b IS NULL`) should also be replaced:
[screenshot: https://user-images.githubusercontent.com/5399861/230415618-84cd6334-690e-4b0b-867b-ccc4056226a8.png]

### Why are the changes needed?

Fix bug:
```
spark-sql (default)> SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
                   > FROM grouping
                   > GROUP BY GROUPING SETS (a, b, c);
[MISSING_AGGREGATION] The non-aggregating expression "b" is based on columns which are not participating in the GROUP BY clause.
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #40685 from wangyum/SPARK-43050.
Authored-by: Yuming Wang
Signed-off-by: Yuming Wang
(cherry picked from commit 45b84cd37add1b9ce274273ad5e519e6bc1d8013)
Signed-off-by: Yuming Wang

# Conflicts:
#	sql/core/src/test/resources/sql-tests/analyzer-results/grouping_set.sql.out
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 27 --
 .../resources/sql-tests/inputs/grouping_set.sql | 4
 .../sql-tests/results/grouping_set.sql.out | 18 +++
 3 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index d9c2c0ef63b..dc8904d7f74 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -608,31 +608,22 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         aggregations: Seq[NamedExpression],
         groupByAliases: Seq[Alias],
         groupingAttrs: Seq[Expression],
-        gid: Attribute): Seq[NamedExpression] = aggregations.map { agg =>
-      // collect all the found AggregateExpression, so we can check an expression is part of
-      // any AggregateExpression or not.
-      val aggsBuffer = ArrayBuffer[Expression]()
-      // Returns whether the expression belongs to any expressions in `aggsBuffer` or not.
-      def isPartOfAggregation(e: Expression): Boolean = {
-        aggsBuffer.exists(a => a.exists(_ eq e))
-      }
-      replaceGroupingFunc(agg, groupByExprs, gid).transformDown {
-        // AggregateExpression should be computed on the unmodified value of its argument
-        // expressions, so we should not replace any references to grouping expression
-        // inside it.
-        case e if AggregateExpression.isAggregate(e) =>
-          aggsBuffer += e
-          e
-        case e if isPartOfAggregation(e) => e
+        gid: Attribute): Seq[NamedExpression] = {
+      def replaceExprs(e: Expression): Expression = e match {
+        case e if AggregateExpression.isAggregate(e) => e
         case e =>
           // Replace expression by expand output attribute.
           val index = groupByAliases.indexWhere(_.child.semanticEquals(e))
           if (index == -1) {
-            e
+            e.mapChildren(replaceExprs)
           } else {
             groupingAttrs(index)
           }
-      }.asInstanceOf[NamedExpression]
+      }
+      aggregations
+        .map(replaceGroupingFunc(_, groupByExprs, gid))
+        .map(replaceExprs)
+        .map(_.asInstanceOf[NamedExpression])
     }

     /*
diff --git a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
index 4d516bdda7b..909c36c926c 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql
@@ -60,3 +60,7 @@ SELECT grouping(k1), k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v)

 -- grouping_id function
 SELECT grouping_id(k1, k2), avg(v) from (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY k1, k2 GROUPING SETS ((k2, k1), k1);
+
+SELECT CASE WHEN a IS NULL THEN count(b) WHEN b IS NULL THEN count(c) END
+FROM grouping
+GROUP BY GROUPING
[spark] branch master updated (779602806e0 -> 45b84cd37ad)
yumwang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 779602806e0 [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters` add 45b84cd37ad [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 27 -- .../analyzer-results/grouping_set.sql.out | 16 + .../resources/sql-tests/inputs/grouping_set.sql| 4 .../sql-tests/results/grouping_set.sql.out | 18 +++ 4 files changed, 47 insertions(+), 18 deletions(-)
[spark] branch master updated (59ba09a3c51 -> 779602806e0)
yumwang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 59ba09a3c51 [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0 add 779602806e0 [SPARK-43095][SQL] Avoid Once strategy's idempotence is broken for batch: `Infer Filters` No new revisions were added by this update. Summary of changes: .../plans/logical/QueryPlanConstraints.scala | 8 --- .../InferFiltersFromConstraintsSuite.scala | 28 ++ 2 files changed, 33 insertions(+), 3 deletions(-)
[spark] branch master updated: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0
yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
new 59ba09a3c51 [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0
59ba09a3c51 is described below

commit 59ba09a3c511b3f11a07138afae6dc9f15edf99d
Author: Yuming Wang
AuthorDate: Sat Apr 15 09:15:34 2023 +0800

[SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

### What changes were proposed in this pull request?

This PR upgrades Apache Parquet to 1.13.0. Apache Parquet [1.13.0 release notes](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.0/CHANGES.md?plain=1#L22-L78).

### Why are the changes needed?

1. This release includes [PARQUET-2160](https://issues.apache.org/jira/browse/PARQUET-2160). So we no longer need [SPARK-41952](https://issues.apache.org/jira/browse/SPARK-41952).
2. This release includes [Java Vector API support](https://github.com/apache/parquet-mr/blob/apache-parquet-1.13.0/README.md?plain=1#L88-L100).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test and benchmark test.
TPC-DS benchmark result:

Query | Parquet 1.13.0 (1st run) | Parquet 1.12.3 (1st run) | Parquet 1.13.0 (2nd run) | Parquet 1.12.3 (2nd run) | Parquet 1.13.0 (3rd run) | Parquet 1.12.3 (3rd run)
-- | -- | -- | -- | -- | -- | --
q1.sql | 37.819 | 37.786 | 36.322 | 37.59 | 37.772 | 36.776
q2.sql | 42.132 | 41.513 | 43.189 | 42.274 | 42.859 | 42.605
q3.sql | 5.933 | 6.1 | 6.082 | 6.071 | 6.128 | 6.094
q4.sql | 335.051 | 319.173 | 322.396 | 320.977 | 324.464 | 326.822
q5.sql | 78.41 | 76.631 | 76.841 | 76.37 | 78.257 | 76.502
q6.sql | 9.006 | 9.11 | 8.737 | 8.577 | 8.729 | 9.05
q7.sql | 12.881 | 12.731 | 12.685 | 12.662 | 12.606 | 12.675
q8.sql | 10.122 | 10.092 | 10.035 | 10.853 | 10.277 | 10.841
q9.sql | 72.562 | 71.942 | 73.649 | 73.04 | 72.899 | 72.01
q10.sql | 14.127 | 13.075 | 14.276 | 13.913 | 13.281 | 13.229
q11.sql | 111.334 | 111.612 | 110.952 | 110.776 | 111.686 | 112.27
q12.sql | 3.138 | 3.854 | 3.187 | 3.613 | 3.437 | 3.306
q13.sql | 13.131 | 12.676 | 12.516 | 12.417 | 12.739 | 12.987
q14a.sql | 217.664 | 213.632 | 214.655 | 213.333 | 217.601 | 213.341
q14b.sql | 191.553 | 182.775 | 184.35 | 187.004 | 188.313 | 189.876
q15.sql | 10.308 | 10.46 | 10.304 | 9.901 | 10.175 | 10.307
q16.sql | 81.97 | 82.059 | 82.41 | 81.263 | 83.179 | 82.042
q17.sql | 28.876 | 28.905 | 30.41 | 29.573 | 29.555 | 28.837
q18.sql | 14.183 | 13.929 | 14.11 | 14.466 | 13.969 | 14.022
q19.sql | 6.611 | 7.593 | 6.652 | 6.659 | 6.446 | 6.533
q20.sql | 3.263 | 3.701 | 3.56 | 3.503 | 3.53 | 3.627
q21.sql | 2.252 | 2.188 | 2.249 | 2.128 | 2.161 | 2.252
q22.sql | 14.809 | 14.715 | 14.324 | 14.266 | 14.567 | 14.123
q23a.sql | 554.385 | 544.75 | 546.213 | 542.194 | 553.784 | 547.388
q23b.sql | 781.236 | 768.367 | 770.584 | 776.065 | 776.502 | 776.006
q24a.sql | 196.806 | 193.989 | 197.608 | 194.416 | 194.71 | 192.817
q24b.sql | 176.56 | 183.084 | 177.486 | 177.936 | 177.776 | 177.389
q25.sql | 22.323 | 22.089 | 22.665 | 22.049 | 22.248 | 22.317
q26.sql | 8.574 | 8.356 | 8.174 | 8.753 | 8.186 | 8.302
q27.sql | 9.056 | 8.252 | 8.37 | 8.319 | 8.516 | 8.38
q28.sql | 102.185 | 102.382 | 102.344 | 103.058 | 102.024 | 102.786
q29.sql | 75.655 | 75.604 | 75.217 | 75.532 | 75.835 | 76.024
q30.sql | 12.476 | 12.966 | 13.039 | 14.108 | 12.19 | 13.143
q31.sql | 26.343 | 27.632 | 26.337 | 26.791 | 26.74 | 26.098
q32.sql | 3.251 | 3.41 | 3.378 | 3.333 | 3.371 | 3.516
q33.sql | 7.143 | 6.125 | 6.85 | 6.718 | 7.067 | 6.615
q34.sql | 8.53 | 8.656 | 8.536 | 8.866 | 8.358 | 8.589
q35.sql | 35.212 | 35.571 | 35.659 | 37.631 | 36.292 | 35.603
q36.sql | 9.264 | 9.166 | 9.748 | 9.488 | 9.45 | 9.469
q37.sql | 36.368 | 35.881 | 37.023 | 36.578 | 35.823 | 36.7
q38.sql | 74.58 | 73.472 | 72.926 | 73.823 | 71.097 | 73.329
q39a.sql | 8.596 | 7.637 | 8.036 | 7.984 | 7.849 | 7.88
q39b.sql | 7.233 | 6.641 | 6.278 | 7.06 | 6.595 | 6.691
q40.sql | 17.34 | 16.558 | 16.448 | 16.864 | 16.432 | 16.413
q41.sql | 1.223 | 1.105 | 1.103 | 1.182 | 1.232 | 1.304
q42.sql | 2.464 | 2.441 | 2.554 | 2.544 | 2.314 | 2.393
q43.sql | 7.477 | 7.396 | 7.394 | 7.764 | 7.381 | 7.534
q44.sql | 30.228 | 30.516 | 30.859 | 31.057 | 30.372 | 29.008
q45.sql | 9.93 | 10.089 | 9.874 | 10.075 | 9.802 | 9.838
q46.sql | 9.544 | 9.949 | 9.503 | 9.755 | 9.395 | 9.25
q47.sql | 27.322 | 26.952 | 26.974 | 26.83 | 27.087 | 26.991
q48.sql | 14.266 | 14.39 | 14.517 | 14.684 | 14.471 | 14.61
q49.sql | 21.279 | 21.733 | 20.286 | 20.945 | 22.388 | 21.52
q50.sql | 191.416 | 194.256 |
[spark] branch master updated: [SPARK-42597][SQL][FOLLOW-UP] Only rewrite `EqualNullSafe` when the left side is non-nullable
yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
new e01ca3c76ed [SPARK-42597][SQL][FOLLOW-UP] Only rewrite `EqualNullSafe` when the left side is non-nullable
e01ca3c76ed is described below

commit e01ca3c76ed7031819237deb6ab34d13141bda8e
Author: Yuming Wang
AuthorDate: Sat Apr 15 09:10:02 2023 +0800

[SPARK-42597][SQL][FOLLOW-UP] Only rewrite `EqualNullSafe` when the left side is non-nullable

### What changes were proposed in this pull request?

When unwrapping a cast from timestamp to date, this PR rewrites `EqualNullSafe` only if the left side is non-nullable.

### Why are the changes needed?

1. The current rewrite is incorrect for `EqualNullSafe` if the left side is nullable.
2. Even if we fix the rewrite, it will contain `If` and can't be pushed down to the data source:
```
EqualNullSafe(Cast(ts, DateType), date) if Cast(ts, DateType).nullable ->
  If(IsNull(ts), FalseLiteral, And(GreaterThanOrEqual(ts, Cast(date, TimestampType)), LessThan(ts, Cast(date + 1, TimestampType))))
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Update existing unit test.

Closes #40743 from wangyum/SPARK-42597-dev.
Lead-authored-by: Yuming Wang
Co-authored-by: Yuming Wang
Signed-off-by: Yuming Wang
---
 .../sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala | 5 -
 .../sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala | 7 +++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
index 7e5242d968d..54b1dd419fb 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
@@ -315,7 +315,10 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
           GreaterThanOrEqual(fromExp, Cast(dateAddOne, fromExp.dataType, tz, evalMode))
         case _: GreaterThanOrEqual =>
           GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, evalMode))
-        case Equality(_, _) =>
+        case _: EqualTo =>
+          And(GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, evalMode)),
+            LessThan(fromExp, Cast(dateAddOne, fromExp.dataType, tz, evalMode)))
+        case EqualNullSafe(left, _) if !left.nullable =>
           And(GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, evalMode)),
             LessThan(fromExp, Cast(dateAddOne, fromExp.dataType, tz, evalMode)))
         case _: LessThan =>
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
index 8ceb26b906b..0f0acb669c2 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
@@ -46,7 +46,7 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
   val f2: BoundReference = $"b".float.canBeNull.at(1)
   val f3: BoundReference = $"c".decimal(5, 2).canBeNull.at(2)
   val f4: BoundReference = $"d".boolean.canBeNull.at(3)
-  val f5: BoundReference = $"e".timestamp.canBeNull.at(4)
+  val f5: BoundReference = $"e".timestamp.notNull.at(4)
   val f6: BoundReference = $"f".timestampNTZ.canBeNull.at(5)

   test("unwrap casts when literal == max") {
@@ -392,9 +392,8 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
       (f5 >= castTimestamp(dateLit) && f5 < castTimestamp(dateAddOne)) ||
         (f6 >= castTimestampNTZ(dateLit) && f6 < castTimestampNTZ(dateAddOne)))
     assertEquivalent(
-      castDate(f5) <=> dateLit || castDate(f6) === dateLit,
-      (f5 >= castTimestamp(dateLit) && f5 < castTimestamp(dateAddOne)) ||
-        (f6 >= castTimestampNTZ(dateLit) && f6 < castTimestampNTZ(dateAddOne)))
+      castDate(f5) <=> dateLit || castDate(f6) <=> dateLit,
+      (f5 >= castTimestamp(dateLit) && f5 < castTimestamp(dateAddOne)) || castDate(f6) <=> dateLit)
     assertEquivalent(
       dateLit < castDate(f5) || dateLit < castDate(f6),
       castTimestamp(dateAddOne) <= f5 || castTimestampNTZ(dateAddOne) <= f6)
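Why the range rewrite is only safe for a non-nullable left side comes down to SQL three-valued logic: `cast(ts AS DATE) <=> d` is never NULL, whereas `ts >= d AND ts < d + 1` evaluates to NULL when `ts` is NULL. Inside a plain `WHERE` both drop the row, but under `NOT` (or in a projection) the results differ. The sketch below models this in Java, with a `Boolean` whose `null` plays the role of UNKNOWN; the `NullSafeRewrite` class and its methods are hypothetical, and the range form assumes a non-null date literal `d`.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical model of SQL three-valued logic: Boolean.TRUE, Boolean.FALSE, null = UNKNOWN.
final class NullSafeRewrite {

  // SQL AND: FALSE dominates, then UNKNOWN, otherwise TRUE.
  static Boolean and(Boolean x, Boolean y) {
    if (Boolean.FALSE.equals(x) || Boolean.FALSE.equals(y)) return false;
    if (x == null || y == null) return null;
    return true;
  }

  // SQL NOT: UNKNOWN stays UNKNOWN.
  static Boolean not(Boolean x) {
    return x == null ? null : !x;
  }

  // cast(ts AS DATE) <=> d: never UNKNOWN; NULL only "equals" NULL.
  static Boolean nullSafeEqualsDate(LocalDateTime ts, LocalDate d) {
    if (ts == null || d == null) return ts == null && d == null;
    return ts.toLocalDate().equals(d);
  }

  // ts >= d AND ts < d + 1 day: the range form used for EqualTo (d assumed non-null).
  static Boolean rangeRewrite(LocalDateTime ts, LocalDate d) {
    Boolean ge = ts == null ? null : !ts.isBefore(d.atStartOfDay());
    Boolean lt = ts == null ? null : ts.isBefore(d.plusDays(1).atStartOfDay());
    return and(ge, lt);
  }
}
```

For non-null timestamps the two forms agree; for a NULL timestamp, `NOT (cast(ts AS DATE) <=> d)` is TRUE while `NOT` of the range form is UNKNOWN, which is why the rewrite is now restricted to a non-nullable left side.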
[spark] branch master updated: [SPARK-43107][SQL] Coalesce buckets in join applied on broadcast join stream side
yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
new 157de2f6292 [SPARK-43107][SQL] Coalesce buckets in join applied on broadcast join stream side
157de2f6292 is described below

commit 157de2f6292353dc1f55e523f36b2d5d07e93389
Author: Yuming Wang
AuthorDate: Sat Apr 15 09:07:01 2023 +0800

[SPARK-43107][SQL] Coalesce buckets in join applied on broadcast join stream side

### What changes were proposed in this pull request?

This PR extends bucket coalescing in joins so that it also applies when a join input reaches a bucketed scan through the stream side of a broadcast join.

### Why are the changes needed?

Avoid a shuffle to improve query performance.

Before | After
-- | --
[screenshot: https://user-images.githubusercontent.com/5399861/231473104-4d9bbe4e-9cfe-4473-ba17-75d09f2eb1b9.png] | [screenshot: https://user-images.githubusercontent.com/5399861/231470141-bd15031f-facb-4c59-84f2-e60f054257d6.png]

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #40756 from wangyum/SPARK-43107.
Authored-by: Yuming Wang Signed-off-by: Yuming Wang --- .../bucketing/CoalesceBucketsInJoin.scala | 14 +++- .../spark/sql/sources/BucketedReadSuite.scala | 26 +- 2 files changed, 33 insertions(+), 7 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala index 7a31e343766..bd79aca8e64 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight} import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning} import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SparkPlan} -import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, ShuffledJoin, SortMergeJoinExec} +import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, BroadcastNestedLoopJoinExec, ShuffledHashJoinExec, ShuffledJoin, SortMergeJoinExec} /** * This rule coalesces one side of the `SortMergeJoin` and `ShuffledHashJoin` @@ -42,7 +42,7 @@ object CoalesceBucketsInJoin extends Rule[SparkPlan] { plan: SparkPlan, numCoalescedBuckets: Int): SparkPlan = { plan transformUp { - case f: FileSourceScanExec => + case f: FileSourceScanExec if f.relation.bucketSpec.nonEmpty => f.copy(optionalNumCoalescedBuckets = Some(numCoalescedBuckets)) } } @@ -115,7 +115,11 @@ object ExtractJoinWithBuckets { private def hasScanOperation(plan: SparkPlan): Boolean = plan match { case f: FilterExec => hasScanOperation(f.child) case p: ProjectExec => hasScanOperation(p.child) -case _: FileSourceScanExec => true +case j: BroadcastHashJoinExec => + if (j.buildSide == BuildLeft) hasScanOperation(j.right) else hasScanOperation(j.left) 
+case j: BroadcastNestedLoopJoinExec => + if (j.buildSide == BuildLeft) hasScanOperation(j.right) else hasScanOperation(j.left) +case f: FileSourceScanExec => f.relation.bucketSpec.nonEmpty case _ => false } @@ -142,9 +146,7 @@ object ExtractJoinWithBuckets { } private def isApplicable(j: ShuffledJoin): Boolean = { -(j.isInstanceOf[SortMergeJoinExec] || - j.isInstanceOf[ShuffledHashJoinExec]) && - hasScanOperation(j.left) && +hasScanOperation(j.left) && hasScanOperation(j.right) && satisfiesOutputPartitioning(j.leftKeys, j.left.outputPartitioning) && satisfiesOutputPartitioning(j.rightKeys, j.right.outputPartitioning) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala index a18c681e0fe..0a3824e0298 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala @@ -1011,9 +1011,10 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils with Adapti } test("bucket coalescing is applied when join expressions match with partitioning expressions") { -withTable("t1", "t2") { +withTable("t1", "t2", "t3") { df1.write.format("parquet").bucketBy(8, "i", "j").saveAsTable("t1") df2.write.format("parquet").bucketBy(4, "i", "j").saveAsTable("t2") +
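The core of the `hasScanOperation` change is a recursive walk with two new behaviors. It now passes through the stream (non-broadcast) side of `BroadcastHashJoinExec` and `BroadcastNestedLoopJoinExec`, and it only accepts a `FileSourceScanExec` whose relation actually has a bucket spec. The Java sketch below mirrors that control flow using hypothetical, simplified node classes (`ScanNode`, `FilterNode`, `ProjectNode`, `BroadcastJoinNode`, `ExchangeNode`). It illustrates the matching logic only and is not Spark's physical-plan API.

```java
// Hypothetical, simplified physical-plan nodes (not Spark classes).
abstract class PlanNode {}

final class ScanNode extends PlanNode {
    final boolean bucketed; // stands in for relation.bucketSpec.nonEmpty
    ScanNode(boolean bucketed) { this.bucketed = bucketed; }
}

final class FilterNode extends PlanNode {
    final PlanNode child;
    FilterNode(PlanNode child) { this.child = child; }
}

final class ProjectNode extends PlanNode {
    final PlanNode child;
    ProjectNode(PlanNode child) { this.child = child; }
}

final class BroadcastJoinNode extends PlanNode {
    final PlanNode left;
    final PlanNode right;
    final boolean buildLeft; // true: the left side is broadcast and the right side streams
    BroadcastJoinNode(PlanNode left, PlanNode right, boolean buildLeft) {
        this.left = left;
        this.right = right;
        this.buildLeft = buildLeft;
    }
}

final class ExchangeNode extends PlanNode {
    final PlanNode child;
    ExchangeNode(PlanNode child) { this.child = child; }
}

final class BucketScanCheck {
    static boolean hasScanOperation(PlanNode plan) {
        if (plan instanceof FilterNode) {
            return hasScanOperation(((FilterNode) plan).child);
        }
        if (plan instanceof ProjectNode) {
            return hasScanOperation(((ProjectNode) plan).child);
        }
        if (plan instanceof BroadcastJoinNode) {
            BroadcastJoinNode j = (BroadcastJoinNode) plan;
            // Follow only the stream side, which is where the bucketed scan must be.
            return hasScanOperation(j.buildLeft ? j.right : j.left);
        }
        if (plan instanceof ScanNode) {
            // Only a scan over a bucketed relation qualifies.
            return ((ScanNode) plan).bucketed;
        }
        // Anything else (for example a shuffle exchange) stops the search.
        return false;
    }
}
```

With this shape, a broadcast join whose broadcast side is a small, non-bucketed table no longer blocks bucket coalescing, as long as its stream side reaches a bucketed scan.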
[GitHub] [spark-website] dongjoon-hyun commented on a diff in pull request #448: Updated downloads.js
dongjoon-hyun commented on code in PR #448: URL: https://github.com/apache/spark-website/pull/448#discussion_r1167332304 ## js/downloads.js: ## @@ -1,31 +1,26 @@ -// Script for generating Spark download links -// No dependencies; pure javascript. - -releases = {}; +const releases = {}; function addRelease(version, releaseDate, packages, mirrored) { releases[version] = { released: releaseDate, -packages: packages, -mirrored: mirrored +packages, +mirrored }; } -var sources = {pretty: "Source Code", tag: "sources"}; -var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; -var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; -var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; -var hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"}; -var hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"}; -var hadoop3p = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"}; -var hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"}; - -// 3.2.0+ -var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources]; -// 3.3.0+ -var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; - -addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true); +const sources = {pretty: "Source Code", tag: "sources"}; +const hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; +const hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; +const hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; +const hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"}; +const hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"}; +const hadoop3p = 
{pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"}; +const hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"}; + +const packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources]; +const packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; + +addRelease("3.3.3", new Date("04/14/2023"), packagesV13, true); Review Comment: Yes, the Apache Spark community hasn't released it yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] xinrong-meng commented on a diff in pull request #448: Updated downloads.js
xinrong-meng commented on code in PR #448: URL: https://github.com/apache/spark-website/pull/448#discussion_r1167257625 ## js/downloads.js: ## @@ -1,31 +1,26 @@ -// Script for generating Spark download links -// No dependencies; pure javascript. - -releases = {}; +const releases = {}; function addRelease(version, releaseDate, packages, mirrored) { releases[version] = { released: releaseDate, -packages: packages, -mirrored: mirrored +packages, +mirrored }; } -var sources = {pretty: "Source Code", tag: "sources"}; -var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; -var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; -var hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; -var hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"}; -var hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"}; -var hadoop3p = {pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"}; -var hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"}; - -// 3.2.0+ -var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources]; -// 3.3.0+ -var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; - -addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true); +const sources = {pretty: "Source Code", tag: "sources"}; +const hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; +const hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; +const hadoop3p3 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; +const hadoop3p3scala213 = {pretty: "Pre-built for Apache Hadoop 3.2 and later (Scala 2.13)", tag: "hadoop3.2-scala2.13"}; +const hadoop2p = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2"}; +const hadoop3p = 
{pretty: "Pre-built for Apache Hadoop 3.3 and later", tag: "hadoop3"}; +const hadoop3pscala213 = {pretty: "Pre-built for Apache Hadoop 3.3 and later (Scala 2.13)", tag: "hadoop3-scala2.13"}; + +const packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources]; +const packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; + +addRelease("3.3.3", new Date("04/14/2023"), packagesV13, true); Review Comment: I am wondering why `3.3.3` is associated with `04/14/2023`. The date should be the release date.
[GitHub] [spark-website] xinrong-meng commented on pull request #443: [DO NOT MERGE] 3.4.0 API Auditing
xinrong-meng commented on PR #443: URL: https://github.com/apache/spark-website/pull/443#issuecomment-1509222840 Closed. Thanks @dongjoon-hyun for the reminder.
[GitHub] [spark-website] xinrong-meng closed pull request #443: [DO NOT MERGE] 3.4.0 API Auditing
xinrong-meng closed pull request #443: [DO NOT MERGE] 3.4.0 API Auditing URL: https://github.com/apache/spark-website/pull/443
svn commit: r61288 - in /dev/spark: v3.2.4-rc1-docs/ v3.4.0-rc7-docs/
Author: xinrong Date: Fri Apr 14 20:31:17 2023 New Revision: 61288 Log: Removing RC artifacts. Removed: dev/spark/v3.2.4-rc1-docs/ dev/spark/v3.4.0-rc7-docs/
[GitHub] [spark-website] xinrong-meng commented on pull request #451: Fix the download page of Spark 3.4.0
xinrong-meng commented on PR #451: URL: https://github.com/apache/spark-website/pull/451#issuecomment-1509172623 Merged to asf-site branch, thank you!
[spark-website] branch asf-site updated: Fix the download page of Spark 3.4.0
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 624de69568 Fix the download page of Spark 3.4.0 624de69568 is described below commit 624de69568e5c743206a63cfc49d8647e41e1167 Author: Gengliang Wang AuthorDate: Fri Apr 14 13:03:59 2023 -0700 Fix the download page of Spark 3.4.0 Currently the page shows 3.3.2 at the top: https://user-images.githubusercontent.com/1097932/232143660-9d97a7c0-5eb0-44af-9f06-41cb6386a2dd.png After fix: https://user-images.githubusercontent.com/1097932/232143685-ee5b06e6-3af9-43ea-8690-209f4d8cd25f.png Author: Gengliang Wang Closes #451 from gengliangwang/fixDownload. --- js/downloads.js | 2 +- site/js/downloads.js | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/js/downloads.js b/js/downloads.js index 915b9c8809..9781273310 100644 --- a/js/downloads.js +++ b/js/downloads.js @@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources] // 3.3.0+ var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; +addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true); addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true); -addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); function append(el, contents) { el.innerHTML += contents; diff --git a/site/js/downloads.js b/site/js/downloads.js index 915b9c8809..9781273310 100644 --- a/site/js/downloads.js +++ b/site/js/downloads.js @@ -25,9 +25,9 @@ var packagesV12 = [hadoop3p3, hadoop3p3scala213, hadoop2p7, hadoopFree, sources] // 3.3.0+ var packagesV13 = [hadoop3p, hadoop3pscala213, hadoop2p, hadoopFree, sources]; +addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); addRelease("3.3.2", new Date("02/17/2023"), packagesV13, true); 
addRelease("3.2.4", new Date("04/13/2023"), packagesV12, true); -addRelease("3.4.0", new Date("04/13/2023"), packagesV13, true); function append(el, contents) { el.innerHTML += contents;
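The fix works by reordering the `addRelease` calls. The sketch assumes the download page lists releases in the order they were added. JavaScript objects iterate non-integer string keys such as `"3.4.0"` in insertion order, so under that assumption the first call decides which version appears at the top. A Java `LinkedHashMap` iterates in insertion order in the same way, so the sketch below uses it to model the before/after behavior. `ReleaseRegistry` and its methods are hypothetical names and are not part of downloads.js.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the `releases` object in downloads.js.
final class ReleaseRegistry {
    // LinkedHashMap iterates in insertion order, like string keys of a JS object.
    private final Map<String, String> releases = new LinkedHashMap<>();

    void addRelease(String version, String releaseDate) {
        releases.put(version, releaseDate);
    }

    // The version that would be listed first, i.e. shown at the top.
    String topVersion() {
        return releases.keySet().iterator().next();
    }

    public static void main(String[] args) {
        ReleaseRegistry before = new ReleaseRegistry();
        before.addRelease("3.3.2", "02/17/2023");
        before.addRelease("3.2.4", "04/13/2023");
        before.addRelease("3.4.0", "04/13/2023");
        System.out.println(before.topVersion()); // 3.3.2

        ReleaseRegistry after = new ReleaseRegistry();
        after.addRelease("3.4.0", "04/13/2023");
        after.addRelease("3.3.2", "02/17/2023");
        after.addRelease("3.2.4", "04/13/2023");
        System.out.println(after.topVersion()); // 3.4.0
    }
}
```

Because the display order depends only on call order, each new release has to be added above the older ones, as this commit does for 3.4.0.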
[GitHub] [spark-website] xinrong-meng closed pull request #451: Fix the download page of Spark 3.4.0
xinrong-meng closed pull request #451: Fix the download page of Spark 3.4.0 URL: https://github.com/apache/spark-website/pull/451
[GitHub] [spark-website] gengliangwang opened a new pull request, #451: Fix the download page of Spark 3.4.0
gengliangwang opened a new pull request, #451: URL: https://github.com/apache/spark-website/pull/451 Currently the page shows 3.3.2 at the top: https://user-images.githubusercontent.com/1097932/232143660-9d97a7c0-5eb0-44af-9f06-41cb6386a2dd.png After fix: https://user-images.githubusercontent.com/1097932/232143685-ee5b06e6-3af9-43ea-8690-209f4d8cd25f.png
[GitHub] [spark-website] xinrong-meng closed pull request #450: Fix the API docs of Spark 3.4.0
xinrong-meng closed pull request #450: Fix the API docs of Spark 3.4.0 URL: https://github.com/apache/spark-website/pull/450
[GitHub] [spark-website] xinrong-meng commented on pull request #450: Fix the API docs of Spark 3.4.0
xinrong-meng commented on PR #450: URL: https://github.com/apache/spark-website/pull/450#issuecomment-1509115348 Merged to asf-site branch.
[GitHub] [spark-website] xinrong-meng commented on pull request #450: Fix the API docs of Spark 3.4.0
xinrong-meng commented on PR #450: URL: https://github.com/apache/spark-website/pull/450#issuecomment-1509111397 Thank you Gengliang! I'll clean up the staging after the PR is in.
[GitHub] [spark-website] gengliangwang opened a new pull request, #450: Fix the API docs of Spark 3.4.0
gengliangwang opened a new pull request, #450: URL: https://github.com/apache/spark-website/pull/450 Currently the API docs of Spark 3.4.0 are wrong. This PR overwrites them with the docs from https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/
svn commit: r61281 - in /dev/spark/v3.4.0-rc7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: xinrong Date: Fri Apr 14 18:58:10 2023 New Revision: 61281 Log: Apache Spark v3.4.0-rc7 docs [This commit notification would consist of 2789 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[GitHub] [spark-website] dongjoon-hyun commented on pull request #447: Add 3.4.0 release note and news and update links
dongjoon-hyun commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508935369 +1, late LGTM. Thank you, all!
[GitHub] [spark-website] dongjoon-hyun commented on pull request #449: Fix a typo from release notes in Spark 3.4
dongjoon-hyun commented on PR #449: URL: https://github.com/apache/spark-website/pull/449#issuecomment-1508935110 Thank you!
[GitHub] [spark-website] xinrong-meng commented on pull request #447: Add 3.4.0 release note and news and update links
xinrong-meng commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508926338 Thank you all! Thanks @HyukjinKwon for the fix.
[spark] branch master updated: [SPARK-43064][SQL] Spark SQL CLI SQL tab should only show once statement once
This is an automated email from the ASF dual-hosted git repository. sunchao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4708adc9185 [SPARK-43064][SQL] Spark SQL CLI SQL tab should only show once statement once 4708adc9185 is described below commit 4708adc91856d7665e54e99545ecd5249ce817f7 Author: Angerszh AuthorDate: Fri Apr 14 08:43:42 2023 -0700 [SPARK-43064][SQL] Spark SQL CLI SQL tab should only show once statement once ### What changes were proposed in this pull request? Before: https://user-images.githubusercontent.com/46485123/230573688-42acb9f2-6fa0-48d0-bfde-c7ceeb306aef.png After: https://user-images.githubusercontent.com/46485123/230573720-2c2a7731-d776-439c-ba6f-0dad9dc87a42.png ### Why are the changes needed? There is no need to show the same statement twice in the SQL tab; doing so is confusing. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually tested (MT). Closes #40701 from AngersZh/SPARK-43064.
Authored-by: Angerszh Signed-off-by: Chao Sun --- .../apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala index 18a0e57a2d3..8ae65a58608 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala @@ -29,6 +29,7 @@ import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse import org.apache.spark.SparkThrowable import org.apache.spark.internal.Logging import org.apache.spark.sql.SQLContext +import org.apache.spark.sql.catalyst.plans.logical.CommandResult import org.apache.spark.sql.execution.{QueryExecution, SQLExecution} import org.apache.spark.sql.execution.HiveResult.hiveResultString import org.apache.spark.sql.internal.{SQLConf, VariableSubstitution} @@ -65,8 +66,15 @@ private[hive] class SparkSQLDriver(val context: SQLContext = SparkSQLEnv.sqlCont } context.sparkContext.setJobDescription(substitutorCommand) val execution = context.sessionState.executePlan(context.sql(command).logicalPlan) - hiveResponse = SQLExecution.withNewExecutionId(execution, Some("cli")) { -hiveResultString(execution.executedPlan) + // The SQL command has been executed above via `executePlan`, therefore we don't need to + // wrap it again with a new execution ID when getting Hive result. 
+ execution.logical match { +case _: CommandResult => + hiveResponse = hiveResultString(execution.executedPlan) +case _ => + hiveResponse = SQLExecution.withNewExecutionId(execution, Some("cli")) { +hiveResultString(execution.executedPlan) + } } tableSchema = getResultSetSchema(execution) new CommandProcessorResponse(0)
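The comment in the patch explains the design. According to that comment, a command has already run by the time `executePlan` returns, so wrapping its result in another `SQLExecution.withNewExecutionId` would register the same statement a second time in the SQL tab. The Java sketch below models only that bookkeeping. The nested classes, the `sqlTabEntries` list, and the assumption that a command records exactly one execution while it is planned are all illustrative, not Spark's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class CliExecutionSketch {
    // Hypothetical stand-ins for logical plans.
    interface Plan {}
    static final class CommandResult implements Plan {}
    static final class QueryPlan implements Plan {}

    // Each entry stands in for one execution listed in the SQL tab.
    final List<String> sqlTabEntries = new ArrayList<>();

    // Assumption: commands run eagerly while the plan is built, recording one execution.
    Plan executePlan(String sql, boolean isCommand) {
        if (isCommand) {
            sqlTabEntries.add(sql);
            return new CommandResult();
        }
        return new QueryPlan();
    }

    // Mirrors the patched match: only non-command plans get a new execution ID.
    String hiveResponse(String sql, Plan plan) {
        if (!(plan instanceof CommandResult)) {
            sqlTabEntries.add(sql); // stands in for SQLExecution.withNewExecutionId
        }
        return "result of " + sql;
    }

    // Runs one statement and returns how many SQL tab entries it produced.
    int run(String sql, boolean isCommand) {
        int before = sqlTabEntries.size();
        hiveResponse(sql, executePlan(sql, isCommand));
        return sqlTabEntries.size() - before;
    }

    public static void main(String[] args) {
        CliExecutionSketch cli = new CliExecutionSketch();
        System.out.println(cli.run("CREATE TABLE t (i INT)", true)); // 1
        System.out.println(cli.run("SELECT * FROM t", false));      // 1
    }
}
```

Without the `instanceof CommandResult` check, the command case would produce two entries, which matches the duplicated statement in the "Before" screenshot.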
[spark] branch master updated: [SPARK-43104][BUILD] Set `shadeTestJar` of `protobuf` module to `false`
This is an automated email from the ASF dual-hosted git repository. sunchao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 70094dac1eb [SPARK-43104][BUILD] Set `shadeTestJar` of `protobuf` module to `false` 70094dac1eb is described below commit 70094dac1ebc2b755f76d945ea669b6f7de170dd Author: yangjie01 AuthorDate: Fri Apr 14 08:41:36 2023 -0700 [SPARK-43104][BUILD] Set `shadeTestJar` of `protobuf` module to `false` ### What changes were proposed in this pull request? This PR sets `shadeTestJar` of the `protobuf` module to `false` so that the build skips shading `spark-protobuf_**-tests.jar`. ### Why are the changes needed? When `shadeTestJar` is true, `maven-shade-plugin` always tries to find (and sometimes downloads) some non-existent jars: ``` Downloading from central: https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-client-api/3.3.5/hadoop-client-api-3.3.5-tests.jar [WARNING] Could not get tests for org.apache.hadoop:hadoop-client-api:jar:3.3.5:compile Downloading from central: https://repo.maven.apache.org/maven2/org/tukaani/xz/1.9/xz-1.9-tests.jar [WARNING] Could not get tests for org.tukaani:xz:jar:1.9:compile Downloading from gcs-maven-central-mirror: https://maven-central.storage-download.googleapis.com/maven2/org/slf4j/slf4j-api/2.0.7/slf4j-api-2.0.7-tests.jar Downloaded from gcs-maven-central-mirror: https://maven-central.storage-download.googleapis.com/maven2/org/slf4j/slf4j-api/2.0.7/slf4j-api-2.0.7-tests.jar (0 B at 0 B/s) [WARNING] Could not get tests for com.google.protobuf:protobuf-java:jar:3.22.0:compile Downloading from central: https://repo.maven.apache.org/maven2/org/apache/curator/curator-framework/2.13.0/curator-framework-2.13.0-tests.jar [WARNING] Could not get tests for org.apache.curator:curator-framework:jar:2.13.0:compile Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/logging/log4j/log4j-slf4j2-impl/2.20.0/log4j-slf4j2-impl-2.20.0-tests.jar [WARNING] Could not get tests for org.apache.logging.log4j:log4j-slf4j2-impl:jar:2.20.0:compile Downloading from central: https://repo.maven.apache.org/maven2/org/jetbrains/annotations/17.0.0/annotations-17.0.0-tests.jar [WARNING] Could not get tests for org.jetbrains:annotations:jar:17.0.0:compile Downloading from central: https://repo.maven.apache.org/maven2/org/apache/logging/log4j/log4j-1.2-api/2.20.0/log4j-1.2-api-2.20.0-tests.jar [WARNING] Could not get tests for org.apache.logging.log4j:log4j-1.2-api:jar:2.20.0:compile Downloading from central: https://repo.maven.apache.org/maven2/org/threeten/threeten-extra/1.7.1/threeten-extra-1.7.1-tests.jar [WARNING] Could not get tests for org.threeten:threeten-extra:jar:1.7.1:compile Downloading from central: https://repo.maven.apache.org/maven2/org/apache/yetus/audience-annotations/0.13.0/audience-annotations-0.13.0-tests.jar [WARNING] Could not get tests for org.apache.yetus:audience-annotations:jar:0.13.0:compile Downloading from central: https://repo.maven.apache.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.14.2/jackson-databind-2.14.2-tests.jar [WARNING] Could not get tests for com.fasterxml.jackson.core:jackson-databind:jar:2.14.2:compile Downloading from central: https://repo.maven.apache.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0-tests.jar [WARNING] Could not get tests for com.google.code.findbugs:jsr305:jar:3.0.0:runtime Downloading from central: https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.9.1/snappy-java-1.1.9.1-tests.jar [WARNING] Could not get tests for org.xerial.snappy:snappy-java:jar:1.1.9.1:compile Downloading from central: https://repo.maven.apache.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1-tests.jar [WARNING] Could not get tests for javax.activation:activation:jar:1.1.1:compile Downloading from central: 
https://repo.maven.apache.org/maven2/org/apache/hive/hive-storage-api/2.8.1/hive-storage-api-2.8.1-tests.jar [WARNING] Could not get tests for org.apache.hive:hive-storage-api:jar:2.8.1:compile Downloading from central: https://repo.maven.apache.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0-tests.jar [WARNING] Could not get tests for org.spark-project.spark:unused:jar:1.0.0:compile Downloading from central: https://repo.maven.apache.org/maven2/org/scala-lang/scala-library/2.12.17/scala-library-2.12.17-tests.jar [WARNING] Could not get tests for org.scala-lang:scala-library:jar:2.12.17:compile Downloading from central: https://repo.maven.apache.org/maven2/com/github/luben/zstd-jni/1.5.4-2/zstd-jni-1.5.4-2-tests.jar [WARNING] Could not get tests for com.github.luben:zstd-jni:jar:1.5.4-2:compile Downloading from
[GitHub] [spark-website] PasunuriSrinidhi commented on pull request #448: Updated downloads.js
PasunuriSrinidhi commented on PR #448: URL: https://github.com/apache/spark-website/pull/448#issuecomment-1508533792 > What changed here, mostly adding const? does it do anything? You need to run jekyll to update the copy in site/ Maybe, just don't know if this matters. I converted a normal function to an ES6 arrow function.
[GitHub] [spark-website] srowen commented on pull request #448: Updated downloads.js
srowen commented on PR #448: URL: https://github.com/apache/spark-website/pull/448#issuecomment-1508503919 What changed here, mostly adding const? does it do anything? You need to run jekyll to update the copy in site/ Maybe, just don't know if this matters.
[GitHub] [spark-website] HyukjinKwon closed pull request #449: Fix a typo from release notes in Spark 3.4
HyukjinKwon closed pull request #449: Fix a typo from release notes in Spark 3.4 URL: https://github.com/apache/spark-website/pull/449
[spark-website] branch asf-site updated: Fix a typo from release notes in Spark 3.4
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new ff55a784fa Fix a typo from release notes in Spark 3.4 ff55a784fa is described below commit ff55a784fad231b34ae17293859f8d27a80dd10e Author: Hyukjin Kwon AuthorDate: Fri Apr 14 18:42:42 2023 +0900 Fix a typo from release notes in Spark 3.4 This PR removes the placeholder line "This will become a table of contents (this text will be scraped)." that was mistakenly left in the release notes. Author: Hyukjin Kwon Closes #449 from HyukjinKwon/minor-typo. --- releases/_posts/2023-04-13-spark-release-3-4-0.md | 2 -- site/releases/spark-release-3-4-0.html| 4 2 files changed, 6 deletions(-) diff --git a/releases/_posts/2023-04-13-spark-release-3-4-0.md b/releases/_posts/2023-04-13-spark-release-3-4-0.md index 42435b5064..840c189f0e 100644 --- a/releases/_posts/2023-04-13-spark-release-3-4-0.md +++ b/releases/_posts/2023-04-13-spark-release-3-4-0.md @@ -17,8 +17,6 @@ This release introduces Python client for Spark Connect, augments Structured Str To download Apache Spark 3.4.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://s.apache.org/spark-3.4.0). We have curated a list of high level changes here, grouped by major modules. -* This will become a table of contents (this text will be scraped). - {:toc} ### Highlight diff --git a/site/releases/spark-release-3-4-0.html b/site/releases/spark-release-3-4-0.html index 93d6688464..c3191a9016 100644 --- a/site/releases/spark-release-3-4-0.html +++ b/site/releases/spark-release-3-4-0.html @@ -131,10 +131,6 @@ To download Apache Spark 3.4.0, visit the https://spark.apache.org/downloads.html;>downloads page. You can consult JIRA for the https://s.apache.org/spark-3.4.0;>detailed changes.
We have curated a list of high level changes here, grouped by major modules. - - This will become a table of contents (this text will be scraped). - - Highlight - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
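In the release-notes source, the placeholder bullet sits directly above kramdown's `{:toc}` marker, which is meant to replace that list with a generated table of contents; on the published page the sentence survived instead. The following is a minimal sketch of a pre-publish guard that would have flagged it, assuming each generated page is available as a string. `PlaceholderCheck` and `findLeftovers` are hypothetical names and are not part of the spark-website tooling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-publish guard; not part of the spark-website build.
final class PlaceholderCheck {
    // Placeholder sentence quoted in the commit message above.
    static final String PLACEHOLDER =
            "This will become a table of contents (this text will be scraped).";

    /** Returns the 1-based line numbers of the page that still contain the placeholder. */
    static List<Integer> findLeftovers(String page) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = page.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(PLACEHOLDER)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

A build step could, for example, fail whenever `findLeftovers` returns a non-empty list for any generated page under `site/`.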
[GitHub] [spark-website] HyukjinKwon commented on pull request #449: Fix a typo from release notes in Spark 3.4
HyukjinKwon commented on PR #449: URL: https://github.com/apache/spark-website/pull/449#issuecomment-1508240343 Merged to asf-site branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[spark] branch master updated: [SPARK-43123][SQL] Internal field metadata should not be leaked to catalogs
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
new 4c938d62d79 [SPARK-43123][SQL] Internal field metadata should not be leaked to catalogs
4c938d62d79 is described below

commit 4c938d62d791742b9f0c6a77b66fc06a90d7c0ad
Author: Wenchen Fan
AuthorDate: Fri Apr 14 17:08:29 2023 +0800

    [SPARK-43123][SQL] Internal field metadata should not be leaked to catalogs

    ### What changes were proposed in this pull request?

    In Spark, we have defined some internal field metadata to help query resolution and compilation. For example, there are quite some field metadata that are related to metadata columns. However, when we create tables, these internal field metadata can be leaked. This PR updates CTAS/RTAS commands to remove these internal field metadata before creating tables. CREATE/REPLACE TABLE command is fine as users can't generate these internal field metadata via the type string.

    ### Why are the changes needed?

    to avoid potential issues, like mistakenly treating a data column as metadata column

    ### Does this PR introduce _any_ user-facing change?

    no

    ### How was this patch tested?

    new test

    Closes #40776 from cloud-fan/meta.

    Authored-by: Wenchen Fan
    Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  4 +--
 .../apache/spark/sql/catalyst/util/package.scala   | 34 ++
 .../execution/command/createDataSourceTables.scala |  5 +--
 .../datasources/v2/WriteToDataSourceV2Exec.scala   | 42 +++---
 .../spark/sql/connector/MetadataColumnSuite.scala  | 16 +
 5 files changed, 69 insertions(+), 32 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 126b0618727..74592c15d23 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -41,7 +41,7 @@ import org.apache.spark.sql.catalyst.streaming.StreamingRelationV2
 import org.apache.spark.sql.catalyst.trees.AlwaysProcess
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
 import org.apache.spark.sql.catalyst.trees.TreePattern._
-import org.apache.spark.sql.catalyst.util.{toPrettySQL, CharVarcharUtils, StringUtils}
+import org.apache.spark.sql.catalyst.util.{toPrettySQL, AUTO_GENERATED_ALIAS, CharVarcharUtils, StringUtils}
 import org.apache.spark.sql.catalyst.util.ResolveDefaultColumns._
 import org.apache.spark.sql.connector.catalog.{View => _, _}
 import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
@@ -492,7 +492,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
 case l: Literal => Alias(l, toPrettySQL(l))()
 case e =>
 val metaForAutoGeneratedAlias = new MetadataBuilder()
-.putString("__autoGeneratedAlias", "true")
+.putString(AUTO_GENERATED_ALIAS, "true")
 .build()
 Alias(e, toPrettySQL(e))(explicitMetadata = Some(metaForAutoGeneratedAlias))
 }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
index e1ce45c1353..23ddb534af9 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
@@ -27,7 +27,7 @@ import com.google.common.io.ByteStreams
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.{MetadataBuilder, NumericType, StringType}
+import org.apache.spark.sql.types.{MetadataBuilder, NumericType, StringType, StructType}
 import org.apache.spark.unsafe.types.UTF8String
 import org.apache.spark.util.Utils
 
@@ -191,12 +191,13 @@ package object util extends Logging {
 
 val METADATA_COL_ATTR_KEY = "__metadata_col"
 
+ /**
+ * If set, this metadata column can only be accessed with qualifiers, e.g. `qualifiers.col` or
+ * `qualifiers.*`. If not set, metadata columns cannot be accessed via star.
+ */
+ val QUALIFIED_ACCESS_ONLY = "__qualified_access_only"
+
 implicit class MetadataColumnHelper(attr: Attribute) {
-/**
- * If set, this metadata column can only be accessed with qualifiers, e.g. `qualifiers.col` or
- * `qualifiers.*`. If not set, metadata columns cannot be accessed via star.
- */
-val QUALIFIED_ACCESS_ONLY = "__qualified_access_only"
 def
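The commit message describes CTAS/RTAS dropping Spark-internal keys from each column's metadata before the schema reaches the catalog. The following is a minimal sketch of that idea, modeled with plain Java maps (column name to metadata map) rather than Spark's `StructType`/`Metadata` classes. `InternalMetadataStripper` is a hypothetical name, and the key set covers only the three internal keys visible in the diff above (`__autoGeneratedAlias`, `__metadata_col`, `__qualified_access_only`), so the actual list Spark removes may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of SPARK-43123; not Spark's implementation.
final class InternalMetadataStripper {
    // Internal keys visible in the diff; Spark's real set may be larger.
    static final Set<String> INTERNAL_KEYS =
            Set.of("__autoGeneratedAlias", "__metadata_col", "__qualified_access_only");

    /** Returns a copy of the schema (column name -> metadata) with internal keys removed. */
    static Map<String, Map<String, String>> strip(Map<String, Map<String, String>> schema) {
        Map<String, Map<String, String>> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> column : schema.entrySet()) {
            // Copy first so the caller's metadata is left untouched.
            Map<String, String> meta = new LinkedHashMap<>(column.getValue());
            meta.keySet().removeAll(INTERNAL_KEYS);
            cleaned.put(column.getKey(), meta);
        }
        return cleaned;
    }
}
```

In this sketch, copying each map instead of mutating it means only the schema handed to the catalog is cleaned; user-supplied keys such as a column comment pass through unchanged.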
[GitHub] [spark-website] bjornjorgensen commented on pull request #449: Fix a typo from release notes in Spark 3.4
bjornjorgensen commented on PR #449: URL: https://github.com/apache/spark-website/pull/449#issuecomment-1508183332 Thank you.
[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links
HyukjinKwon commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508167653 https://github.com/apache/spark-website/pull/449
[GitHub] [spark-website] HyukjinKwon opened a new pull request, #449: Fix a typo from release notes in Spark 3.4
HyukjinKwon opened a new pull request, #449: URL: https://github.com/apache/spark-website/pull/449 This PR fixes "This will become a table of contents (this text will be scraped)." in the release notes that is a mistake.
[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links
HyukjinKwon commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508152159 uhoh, sure let me make a quick fix then.
[GitHub] [spark-website] bjornjorgensen commented on pull request #447: Add 3.4.0 release note and news and update links
bjornjorgensen commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508150253 @HyukjinKwon If you have a working install of Bundler this is the line https://github.com/apache/spark-website/blob/74099512789a0f4099716b5e6cf0882eed6f5d13/releases/_posts/2023-04-13-spark-release-3-4-0.md?plain=1#L20
[GitHub] [spark-website] bjornjorgensen commented on pull request #447: Add 3.4.0 release note and news and update links
bjornjorgensen commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508130455 @HyukjinKwon I have some problems with ` bundle install Bundler 2.4.1 is running, but your lockfile was generated with 2.3.7. Installing Bundler 2.3.7 and restarting using that version. Fetching gem metadata from https://rubygems.org/. ` install failed!
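The log shows Bundler 2.4.1 noticing that `Gemfile.lock` was written by 2.3.7 and switching versions on its own, after which fetching gems still failed, so the version mismatch may not be the real cause. If it is worth ruling out, note that the pinned version is recorded under the `BUNDLED WITH` section at the end of `Gemfile.lock`. Running `gem install bundler -v 2.3.7` and then `bundle _2.3.7_ install` selects that version explicitly. The following is a minimal sketch that reads the pinned version from the lockfile text. `LockfileBundler` is a hypothetical helper and is not part of Bundler or the spark-website repository.

```java
import java.util.Optional;

// Hypothetical helper; not part of Bundler or spark-website.
final class LockfileBundler {
    /** Returns the version listed on the line after "BUNDLED WITH" in a Gemfile.lock, if present. */
    static Optional<String> bundledWith(String lockfile) {
        String[] lines = lockfile.split("\r?\n");
        for (int i = 0; i < lines.length - 1; i++) {
            if (lines[i].trim().equals("BUNDLED WITH")) {
                String version = lines[i + 1].trim();
                return version.isEmpty() ? Optional.empty() : Optional.of(version);
            }
        }
        return Optional.empty();
    }
}
```

Comparing this value with the output of `bundle --version` shows whether the two differ before anything is reinstalled.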
[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links
HyukjinKwon commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508127146 @bjornjorgensen are you interested in making a PR please?
[GitHub] [spark-website] bjornjorgensen commented on pull request #447: Add 3.4.0 release note and news and update links
bjornjorgensen commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508111219 We must remove "This will become a table of contents (this text will be scraped)." from https://spark.apache.org/releases/spark-release-3-4-0.html
[GitHub] [spark-website] PasunuriSrinidhi opened a new pull request, #448: Updated downloads.js
PasunuriSrinidhi opened a new pull request, #448: URL: https://github.com/apache/spark-website/pull/448 This will be the more advanced version and I think that the code provided looks well-structured and readable.
[GitHub] [spark-website] HyukjinKwon closed pull request #447: Add 3.4.0 release note and news and update links
HyukjinKwon closed pull request #447: Add 3.4.0 release note and news and update links URL: https://github.com/apache/spark-website/pull/447
[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links
HyukjinKwon commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508034288 Thanks all. Merged to asf-site branch.
[GitHub] [spark-website] HyukjinKwon commented on pull request #447: Add 3.4.0 release note and news and update links
HyukjinKwon commented on PR #447: URL: https://github.com/apache/spark-website/pull/447#issuecomment-1508021891 I fixed and force-pushed it.
[GitHub] [spark-website] yaooqinn commented on pull request #446: Add v3.4.0 docs
yaooqinn commented on PR #446: URL: https://github.com/apache/spark-website/pull/446#issuecomment-1507967472 late LGTM
[GitHub] [spark-website] yaooqinn commented on pull request #444: Add v3.2.4 docs
yaooqinn commented on PR #444: URL: https://github.com/apache/spark-website/pull/444#issuecomment-1507966805 Thank you @dongjoon-hyun, late +1