[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
SparkQA commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727806593 **[Test build #131140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131140/testReport)** for PR 29497 at commit [`645d81b`](https://github.com/apache/spark/commit/645d81bb4622c32119adab7c21c18ea3cce14fdb). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] xuanyuanking commented on a change in pull request #30347: [SPARK-33209][SS] Refactor unit test of stream-stream join in UnsupportedOperationsSuite
xuanyuanking commented on a change in pull request #30347: URL: https://github.com/apache/spark/pull/30347#discussion_r523951738 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala ## @@ -414,209 +412,135 @@ class UnsupportedOperationsSuite extends SparkFunSuite { batchStreamSupported = false, streamBatchSupported = false) - // Left outer joins: *-stream not allowed + // Left outer, left semi, left anti join: *-stream not allowed + Seq((LeftOuter, "LeftOuter join"), (LeftSemi, "LeftSemi join"), (LeftAnti, "Left anti join")) Review comment: Super nit: let's keep the naming style consistent here, either `LeftOuter join` ... `LeftAnti join` or `left outer join` ... `left anti join`.
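The table-driven style in this diff, with the consistent lower-case naming suggested in the review, could be sketched roughly as follows. This is a self-contained illustration only: `JoinType` and its case objects are local stand-ins rather than the Catalyst classes, and `testNames` is a hypothetical helper that is not part of `UnsupportedOperationsSuite`.

```scala
object JoinNamingSketch {
  // Local stand-ins for Catalyst's join types (assumption, not the real classes).
  sealed trait JoinType
  case object LeftOuter extends JoinType
  case object LeftSemi extends JoinType
  case object LeftAnti extends JoinType

  // One (type, display name) pair per case, all in the same naming style.
  val leftJoins: Seq[(JoinType, String)] = Seq(
    (LeftOuter, "left outer join"),
    (LeftSemi, "left semi join"),
    (LeftAnti, "left anti join"))

  // Hypothetical helper: derive one test name per join type and input
  // combination instead of writing each test case out by hand.
  def testNames: Seq[String] =
    for {
      (_, name) <- leftJoins
      (left, right) <- Seq(("batch", "stream"), ("stream", "stream"))
    } yield s"$name: $left-$right not allowed"
}
```

Keeping the display names in one `Seq` means a naming inconsistency such as `"Left anti join"` next to `"LeftSemi join"` is visible at a glance.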
[GitHub] [spark] cloud-fan commented on a change in pull request #30380: [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column
cloud-fan commented on a change in pull request #30380: URL: https://github.com/apache/spark/pull/30380#discussion_r523951682 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ## @@ -729,6 +729,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 { def unapply(expr: Expression): Option[Attribute] = { expr match { case attr: Attribute => Some(attr) + case Cast(IntegralType(), StringType, _) => None Review comment: Good catch! I'm wondering whether we should be more conservative here. How about ``` case Cast(child @ IntegralType(), dt: IntegralType, _) => if Cast.canUpCast... ```
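One way to read the more conservative suggestion: only look through a cast when both sides are integral and the cast is a lossless widening, and return `None` for everything else. The sketch below models that idea with a tiny local expression tree. All types here, and the width-based `canUpCast`, are simplified assumptions for illustration; they are not the Catalyst `Cast`, `IntegralType`, or `Cast.canUpCast` definitions, and the elided part of the suggestion is not reconstructed.

```scala
object CastSketch {
  // Simplified stand-in types (assumption, not Catalyst's type hierarchy).
  sealed trait DataType
  sealed trait IntegralType extends DataType { def width: Int }
  case object IntegerType extends IntegralType { val width = 4 }
  case object LongType extends IntegralType { val width = 8 }
  case object StringType extends DataType

  sealed trait Expression { def dataType: DataType }
  case class Attribute(name: String, dataType: DataType) extends Expression
  case class Cast(child: Expression, dataType: DataType) extends Expression

  // Assumed rule: an integral cast is safe only if it does not narrow.
  def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case (f: IntegralType, t: IntegralType) => f.width <= t.width
    case _ => from == to
  }

  // Extract the partition attribute, looking through safe integral casts only.
  def extractAttribute(expr: Expression): Option[Attribute] = expr match {
    case attr: Attribute => Some(attr)
    case Cast(child, dt: IntegralType)
        if child.dataType.isInstanceOf[IntegralType] && canUpCast(child.dataType, dt) =>
      extractAttribute(child)
    case _ => None // e.g. integral-to-string casts are not pushed down
  }
}
```

With this shape, the integral-to-string case from the diff falls into the default branch instead of needing its own `case`.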
[GitHub] [spark] MaxGekk commented on a change in pull request #30377: [SPARK-33453][SQL][TESTS] Unify v1 and v2 SHOW PARTITIONS tests
MaxGekk commented on a change in pull request #30377: URL: https://github.com/apache/spark/pull/30377#discussion_r523950223 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.v1 + +import org.apache.spark.sql.{AnalysisException, Row, SaveMode} +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.connector.catalog.CatalogManager +import org.apache.spark.sql.execution.command +import org.apache.spark.sql.test.SharedSparkSession + +trait ShowPartitionsSuiteBase extends command.ShowPartitionsSuiteBase { + override def version: String = "V1" + override def catalog: String = CatalogManager.SESSION_CATALOG_NAME + override def defaultNamespace: Seq[String] = Seq("default") + override def defaultUsing: String = "USING parquet" + + protected def createDateTable(table: String): Unit = { +sql(s""" + |CREATE TABLE $table (price int, qty int, year int, month int) + |$defaultUsing + |partitioned by (year, month)""".stripMargin) + } + + protected def fillDateTable(table: String): Unit = { +sql(s"INSERT INTO $table PARTITION(year = 2015, month = 1) SELECT 1, 1") +sql(s"INSERT INTO $table PARTITION(year = 2015, month = 2) SELECT 2, 2") +sql(s"INSERT INTO $table PARTITION(year = 2016, month = 2) SELECT 3, 3") +sql(s"INSERT INTO $table PARTITION(year = 2016, month = 3) SELECT 3, 3") + } + + protected def createWideTable(table: String): Unit = { +sql(s""" + |CREATE TABLE $table ( + | price int, qty int, + | year int, month int, hour int, minute int, sec int, extra int) + |$defaultUsing + |PARTITIONED BY (year, month, hour, minute, sec, extra)""".stripMargin) + } + + protected def fillWideTable(table: String): Unit = { +sql(s""" + |INSERT INTO $table + |PARTITION(year = 2016, month = 3, hour = 10, minute = 10, sec = 10, extra = 1) SELECT 3, 3 + """.stripMargin) +sql(s""" + |INSERT INTO $table + |PARTITION(year = 2016, month = 4, hour = 10, minute = 10, sec = 10, extra = 1) SELECT 3, 3 + """.stripMargin) + } + + test("show everything") { +val table = "dateTable" +withTable(table) { + createDateTable(table) + fillDateTable(table) + checkAnswer( +sql(s"show 
partitions $table"), +Row("year=2015/month=1") :: + Row("year=2015/month=2") :: + Row("year=2016/month=2") :: + Row("year=2016/month=3") :: Nil) + + checkAnswer( +sql(s"show partitions default.$table"), +Row("year=2015/month=1") :: + Row("year=2015/month=2") :: + Row("year=2016/month=2") :: + Row("year=2016/month=3") :: Nil) +} + } + + test("filter by partitions") { +val table = "dateTable" +withTable(table) { + createDateTable(table) + fillDateTable(table) + checkAnswer( +sql(s"show partitions default.$table PARTITION(year=2015)"), +Row("year=2015/month=1") :: + Row("year=2015/month=2") :: Nil) + checkAnswer( +sql(s"show partitions default.$table PARTITION(year=2015, month=1)"), +Row("year=2015/month=1") :: Nil) + checkAnswer( +sql(s"show partitions default.$table PARTITION(month=2)"), +Row("year=2015/month=2") :: + Row("year=2016/month=2") :: Nil) +} + } + + test("show everything more than 5 part keys") { +val table = "wideTable" +withTable(table) { + createWideTable(table) + fillWideTable(table) + checkAnswer( +sql(s"show partitions $table"), +Row("year=2016/month=3/hour=10/minute=10/sec=10/extra=1") :: + Row("year=2016/month=4/hour=10/minute=10/sec=10/extra=1") :: Nil) +} + } + + test("non-partitioning columns") { +val table = "dateTable" +withTable(table) { + createDateTable(table) + fillDateTable(table) + val errMsg = intercept[AnalysisException] { +sql(s"SHOW PARTITIONS $table PARTITION(abcd=2015, xyz=1)") + }.getMessage +
[GitHub] [spark] SparkQA commented on pull request #30384: [SPARK-33456][SQL][TEST][FOLLOWUP] Fix SUBEXPRESSION_ELIMINATION_ENABLED config name
SparkQA commented on pull request #30384: URL: https://github.com/apache/spark/pull/30384#issuecomment-727800909 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35740/
[GitHub] [spark] SparkQA commented on pull request #30300: [SPARK-33399][SQL] Normalize output partitioning and sortorder with respect to aliases to avoid unneeded exchange/sort nodes
SparkQA commented on pull request #30300: URL: https://github.com/apache/spark/pull/30300#issuecomment-727799605 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35739/
[GitHub] [spark] cloud-fan commented on a change in pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
cloud-fan commented on a change in pull request #29497: URL: https://github.com/apache/spark/pull/29497#discussion_r523946524 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/QueryCompilationErrors.scala ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.errors + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, GroupingID} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.util.toPrettySQL +import org.apache.spark.sql.connector.catalog.TableChange +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, DataType, StructType} + +/** + * Object for grouping all error messages in catalyst. + * Currently it includes all AnalysisExcpetions created and thrown directly in + * org.apache.spark.sql.catalyst.analysis.Analyzer. 
+ */ +object QueryCompilationErrors { + def groupingIDMismatchError(groupingID: GroupingID, groupByExprs: Seq[Expression]): Throwable = { +new AnalysisException( + s"Columns of grouping_id (${groupingID.groupByExprs.mkString(",")}) " + +s"does not match grouping columns (${groupByExprs.mkString(",")})") + } + + def groupingColInvalidError(groupingCol: Expression, groupByExprs: Seq[Expression]): Throwable = { +new AnalysisException( + s"Column of grouping ($groupingCol) can't be found " + +s"in grouping columns ${groupByExprs.mkString(",")}") + } + + def groupingSizeTooLargeError(sizeLimit: Int): Throwable = { +new AnalysisException( + s"Grouping sets size cannot be greater than $sizeLimit") + } + + def unorderablePivotColError(pivotCol: Expression): Throwable = { +new AnalysisException( + s"Invalid pivot column '$pivotCol'. Pivot columns must be comparable." +) + } + + def nonliteralPivotValError(pivotVal: Expression): Throwable = { Review comment: `nonLiteral...` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
cloud-fan commented on a change in pull request #29497: URL: https://github.com/apache/spark/pull/29497#discussion_r523946122 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/QueryCompilationErrors.scala ## @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.errors + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, GroupingID} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.util.toPrettySQL +import org.apache.spark.sql.connector.catalog.TableChange +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{AbstractDataType, DataType, StructType} + +/** + * Object for grouping all error messages in catalyst. Review comment: `Object for grouping all error messages of the query compilation.` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
luluorta commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r523940307 ## File path: sql/core/src/test/resources/sql-tests/inputs/datetime.sql ## @@ -149,7 +149,7 @@ select to_timestamp('2019-10-06 A', '-MM-dd G'); select to_timestamp('22 05 2020 Friday', 'dd MM EE'); select to_timestamp('22 05 2020 Friday', 'dd MM E'); select unix_timestamp('22 05 2020 Friday', 'dd MM E'); -select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/M/')); -select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/M/')); -select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/M/')); -select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/M/')); +select from_json('{"ts":"26/October/2015"}', 'ts Timestamp', map('timestampFormat', 'dd/M/')); Review comment: I opened a new PR for this issue [https://github.com/apache/spark/pull/30357](https://github.com/apache/spark/pull/30357)
[GitHub] [spark] cloud-fan commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
cloud-fan commented on a change in pull request #30299: URL: https://github.com/apache/spark/pull/30299#discussion_r523944573 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala ## @@ -172,7 +172,7 @@ case class HiveTableScanExec( prunePartitions(hivePartitions) } } else { - if (sparkSession.sessionState.conf.metastorePartitionPruning && Review comment: Let's keep this unchanged for now. We may override `def conf` in `SparkPlan` later so that it always gets the conf from the captured Spark session.
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-727796003 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35738/
[GitHub] [spark] SparkQA commented on pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf
SparkQA commented on pull request #30299: URL: https://github.com/apache/spark/pull/30299#issuecomment-727791622 **[Test build #131139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131139/testReport)** for PR 30299 at commit [`bf1e56a`](https://github.com/apache/spark/commit/bf1e56a87d662b9657e5f46380628b06b9c9e359).
[GitHub] [spark] SparkQA commented on pull request #30364: [SPARK-33140][SQL][FOLLOW-UP] Revert code that not use passed-in SparkSession to get SQLConf.
SparkQA commented on pull request #30364: URL: https://github.com/apache/spark/pull/30364#issuecomment-727788257 **[Test build #131138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131138/testReport)** for PR 30364 at commit [`5cd23d7`](https://github.com/apache/spark/commit/5cd23d7836b17a400fc224afd691c79418352afb).
[GitHub] [spark] cloud-fan closed pull request #30383: [SPARK-33458][SQL] Hive partition pruning support Contains, StartsWith and EndsWith predicate
cloud-fan closed pull request #30383: URL: https://github.com/apache/spark/pull/30383
[GitHub] [spark] cloud-fan commented on pull request #30383: [SPARK-33458][SQL] Hive partition pruning support Contains, StartsWith and EndsWith predicate
cloud-fan commented on pull request #30383: URL: https://github.com/apache/spark/pull/30383#issuecomment-727786769 thanks, merging to master!
[GitHub] [spark] SparkQA commented on pull request #30384: [SPARK-33456][SQL][TEST][FOLLOWUP] Fix SUBEXPRESSION_ELIMINATION_ENABLED config name
SparkQA commented on pull request #30384: URL: https://github.com/apache/spark/pull/30384#issuecomment-727784817 **[Test build #131137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131137/testReport)** for PR 30384 at commit [`6527137`](https://github.com/apache/spark/commit/65271378ed6b813f28f0a5753d8f9c61b86f0bb1).
[GitHub] [spark] cloud-fan closed pull request #30358: [SPARK-33394][SQL][TESTS] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`
cloud-fan closed pull request #30358: URL: https://github.com/apache/spark/pull/30358
[GitHub] [spark] cloud-fan commented on pull request #30358: [SPARK-33394][SQL][TESTS] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`
cloud-fan commented on pull request #30358: URL: https://github.com/apache/spark/pull/30358#issuecomment-727782490 thanks, merging to master!
[GitHub] [spark] viirya commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination
viirya commented on a change in pull request #30381: URL: https://github.com/apache/spark/pull/30381#discussion_r523932172 ## File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql ## @@ -0,0 +1,37 @@ +-- Test for subexpression elimination. + +--SET spark.sql.optimizer.enableJsonExpressionOptimization=false + +--CONFIG_DIM1 spark.sql.codegen.wholeStage=true +--CONFIG_DIM1 spark.sql.codegen.wholeStage=false + +--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY +--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN + +--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true +--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=false Review comment: Fix in #30384.
[GitHub] [spark] viirya opened a new pull request #30384: [SPARK-33456][SQL][TEST][FOLLOWUP] Fix SUBEXPRESSION_ELIMINATION_ENABLED config name
viirya opened a new pull request #30384: URL: https://github.com/apache/spark/pull/30384 ### What changes were proposed in this pull request? Fix the wrong config name in `subexp-elimination.sql`. ### Why are the changes needed? `CONFIG_DIM` should use the config's key. ### Does this PR introduce _any_ user-facing change? No, dev only. ### How was this patch tested? Unit test.
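The distinction behind this fix is between the name of the Scala constant and the string key it holds; a `--CONFIG_DIM` line in a SQL test file is plain text, so it needs the key. A minimal sketch, assuming the key is `spark.sql.subexpressionElimination.enabled` and using a hypothetical `ConfigEntry` stand-in and `configDimLine` helper (neither is Spark's actual API):

```scala
object ConfigKeySketch {
  // Hypothetical minimal stand-in for a config entry: a constant holding a key.
  final case class ConfigEntry(key: String, default: String)

  // The constant's name is SUBEXPRESSION_ELIMINATION_ENABLED; the key is the string
  // (assumed here to be the key used by Spark's SQLConf).
  val SUBEXPRESSION_ELIMINATION_ENABLED: ConfigEntry =
    ConfigEntry("spark.sql.subexpressionElimination.enabled", "true")

  // Hypothetical helper: render a CONFIG_DIM directive from the entry's key.
  def configDimLine(dim: Int, entry: ConfigEntry, value: String): String =
    s"--CONFIG_DIM$dim ${entry.key}=$value"
}
```

Under that assumption, `configDimLine(3, SUBEXPRESSION_ELIMINATION_ENABLED, "true")` produces a directive with the dotted key rather than the constant's name.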
[GitHub] [spark] cloud-fan commented on a change in pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row
cloud-fan commented on a change in pull request #30368: URL: https://github.com/apache/spark/pull/30368#discussion_r523931544 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1452,11 +1452,27 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { } /** - * Combines two adjacent [[Limit]] operators into one, merging the - * expressions into one single expression. + * 1. Eliminate [[Limit]] operators if it's child max row <= limit. + * 2. Combines two adjacent [[Limit]] operators into one, merging the + *expressions into one single expression. */ -object CombineLimits extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { +object EliminateLimits extends Rule[LogicalPlan] { + private def canEliminate(limitExpr: Expression, child: LogicalPlan): Boolean = { +// We skip such case that Sort is after Limit since +// SparkStrategies will convert them to TakeOrderedAndProjectExec +val skipEliminate = child match { + case Sort(_, true, _) => true Review comment: Does `TakeOrderedAndProjectExec` really help if we need to get and sort all the output anyway?
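The two behaviours described in the doc comment (drop a limit whose child can never exceed it, and merge adjacent limits) can be sketched on a toy plan tree. This is an illustration under simplifying assumptions: `Plan`, `Relation`, `UnboundedScan`, and `Limit` are local stand-ins with literal limits, not Catalyst's `LogicalPlan` nodes, and the `Sort` special case being questioned in the review is deliberately left out.

```scala
object EliminateLimitsSketch {
  // Toy plan nodes (assumption); maxRows is None when no bound is known.
  sealed trait Plan { def maxRows: Option[Long] }
  case class Relation(rows: Long) extends Plan {
    def maxRows: Option[Long] = Some(rows)
  }
  case class UnboundedScan() extends Plan {
    def maxRows: Option[Long] = None
  }
  case class Limit(n: Long, child: Plan) extends Plan {
    def maxRows: Option[Long] = Some(child.maxRows.fold(n)(math.min(n, _)))
  }

  def apply(plan: Plan): Plan = plan match {
    case Limit(n, child) =>
      apply(child) match {
        // 1. The child can never produce more than n rows: the limit is a no-op.
        case c if c.maxRows.exists(_ <= n) => c
        // 2. Two adjacent limits collapse into the smaller one.
        case Limit(m, grandChild) => Limit(math.min(n, m), grandChild)
        case c => Limit(n, c)
      }
    case other => other
  }
}
```

For example, a `Limit(10, ...)` over a four-row `Relation` disappears entirely, while `Limit(3, Limit(5, UnboundedScan()))` becomes a single `Limit(3, ...)`.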
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-727779872 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131135/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-727774687 **[Test build #131135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131135/testReport)** for PR 27735 at commit [`ba93111`](https://github.com/apache/spark/commit/ba93111b8ccfc7958e4facf57280a7c980beed84).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-727779863 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-727779863
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-727779713 **[Test build #131135 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131135/testReport)** for PR 27735 at commit [`ba93111`](https://github.com/apache/spark/commit/ba93111b8ccfc7958e4facf57280a7c980beed84). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination
viirya commented on a change in pull request #30381:
URL: https://github.com/apache/spark/pull/30381#discussion_r523929393
## File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql ##
@@ -0,0 +1,37 @@
+-- Test for subexpression elimination.
+
+--SET spark.sql.optimizer.enableJsonExpressionOptimization=false
+
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false
+
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN
+
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=false
Review comment: Oops, let me fix it.
[GitHub] [spark] SparkQA commented on pull request #30300: [SPARK-33399][SQL] Normalize output partitioning and sortorder with respect to aliases to avoid unneeded exchange/sort nodes
SparkQA commented on pull request #30300: URL: https://github.com/apache/spark/pull/30300#issuecomment-727778057 **[Test build #131136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131136/testReport)** for PR 30300 at commit [`16e1db2`](https://github.com/apache/spark/commit/16e1db202f3522ebe607894b93811bb54bdd9a0c).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination
HyukjinKwon commented on a change in pull request #30381:
URL: https://github.com/apache/spark/pull/30381#discussion_r523929112
## File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql ##
@@ -0,0 +1,37 @@
+-- Test for subexpression elimination.
+
+--SET spark.sql.optimizer.enableJsonExpressionOptimization=false
+
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false
+
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN
+
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=false
Review comment: Wait, shouldn't it be the actual config names?
[GitHub] [spark] cloud-fan commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination
cloud-fan commented on a change in pull request #30381:
URL: https://github.com/apache/spark/pull/30381#discussion_r523929310
## File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql ##
@@ -0,0 +1,37 @@
+-- Test for subexpression elimination.
+
+--SET spark.sql.optimizer.enableJsonExpressionOptimization=false
+
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false
+
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN
+
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true
Review comment: is `SUBEXPRESSION_ELIMINATION_ENABLED` a config name?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727774065 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35737/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727774055 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
AmplabJenkins removed a comment on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727774994
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
AmplabJenkins removed a comment on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727775302
[GitHub] [spark] MaxGekk commented on a change in pull request #30358: [SPARK-33394][SQL][TESTS] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`
MaxGekk commented on a change in pull request #30358:
URL: https://github.com/apache/spark/pull/30358#discussion_r523928527
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTableCatalog.scala ##
@@ -181,9 +181,21 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp
   override def dropNamespace(namespace: Array[String]): Boolean = {
     listNamespaces(namespace).foreach(dropNamespace)
-    listTables(namespace).foreach(dropTable)
Review comment: @HyukjinKwon You are right, this is test only PR. I changed PR's title and description.
[GitHub] [spark] AmplabJenkins commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
AmplabJenkins commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727775302
[GitHub] [spark] AmplabJenkins commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
AmplabJenkins commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727774994
[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
SparkQA commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727775287 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35736/
[GitHub] [spark] SparkQA commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
SparkQA commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727774961 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35735/
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-727774687 **[Test build #131135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131135/testReport)** for PR 27735 at commit [`ba93111`](https://github.com/apache/spark/commit/ba93111b8ccfc7958e4facf57280a7c980beed84).
[GitHub] [spark] viirya commented on pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination
viirya commented on pull request #30381: URL: https://github.com/apache/spark/pull/30381#issuecomment-727774217 Thanks @HyukjinKwon @dongjoon-hyun
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727774014 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35737/
[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727774055
[GitHub] [spark] HyukjinKwon closed pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination
HyukjinKwon closed pull request #30381: URL: https://github.com/apache/spark/pull/30381
[GitHub] [spark] HyukjinKwon commented on pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination
HyukjinKwon commented on pull request #30381: URL: https://github.com/apache/spark/pull/30381#issuecomment-727773690 Merged to master.
[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
MichaelChirico commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727768929 @HyukjinKwon test added, please have a look
[GitHub] [spark] cloud-fan commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader at final stage on write command
cloud-fan commented on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-727768565
I'm reading the classdoc of `OptimizeLocalShuffleReader`, and I do feel the design is a bit hacky. We add local shuffle reader if
1. the shuffle is the root node of a query stage.
2. the shuffle is BHJ build side.
The reason for condition 1 is it will never introduce shuffle. This is true, but this may change the final output partitioning which may be bad for cases like write command.
I like the idea from @maryannxue which is more general: 1) move LSR rule into postStageCreationRules; and 2) make the LSR rule match an Exchange first (so condition 1 becomes: the shuffle is a direct child of an exchange). By doing this we can skip LSR rule in the last stage, as the last stage's root node is not exchange.
I'm not very sure why the current approach can add LSR to BHJ probe side. This seems like an accident to me as it's not mentioned in the classdoc.
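The proposed form of condition 1 (match an Exchange first, then rewrite only a shuffle that is its direct child) can be sketched with a toy plan tree. Everything here is an assumption made for illustration: `Node`, `ShuffleStage`, `LocalReader`, `Exchange`, `Write` and `LsrSketch` are hypothetical names, not Spark's physical operators, and condition 2 (the BHJ build side) is not modelled.

```scala
// Toy physical plan nodes (hypothetical; NOT Spark's classes).
sealed trait Node
final case class ShuffleStage(id: Int) extends Node           // a materialized shuffle query stage
final case class LocalReader(stage: ShuffleStage) extends Node // stage read through a local shuffle reader
final case class Exchange(child: Node) extends Node            // root of every non-final stage
final case class Write(child: Node) extends Node               // e.g. a write command at the final stage

object LsrSketch {
  // Rewrite a shuffle only when it sits directly under an Exchange.
  def rewrite(plan: Node): Node = plan match {
    case Exchange(stage: ShuffleStage) => Exchange(LocalReader(stage))
    case Exchange(child)               => Exchange(rewrite(child))
    case Write(child)                  => Write(rewrite(child))
    case other                         => other
  }
}
```

Because the root of the final stage in this model is a `Write` rather than an `Exchange`, a shuffle directly under it is left alone and its output partitioning is preserved, which is the behaviour the comment argues for.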
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row
AmplabJenkins removed a comment on pull request #30368: URL: https://github.com/apache/spark/pull/30368#issuecomment-727767077
[GitHub] [spark] AmplabJenkins commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row
AmplabJenkins commented on pull request #30368: URL: https://github.com/apache/spark/pull/30368#issuecomment-727767077
[GitHub] [spark] SparkQA removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row
SparkQA removed a comment on pull request #30368: URL: https://github.com/apache/spark/pull/30368#issuecomment-727680480 **[Test build #131118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131118/testReport)** for PR 30368 at commit [`7b4e4d6`](https://github.com/apache/spark/commit/7b4e4d6613b14704063883114116143cb97c3c74).
[GitHub] [spark] SparkQA commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row
SparkQA commented on pull request #30368: URL: https://github.com/apache/spark/pull/30368#issuecomment-727766024 **[Test build #131118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131118/testReport)** for PR 30368 at commit [`7b4e4d6`](https://github.com/apache/spark/commit/7b4e4d6613b14704063883114116143cb97c3c74). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727764812
[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727750806 **[Test build #131134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131134/testReport)** for PR 28386 at commit [`3359fe3`](https://github.com/apache/spark/commit/3359fe3985ff03d4bec328bef3ce3a9e6be48cae).
[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727764812
[GitHub] [spark] SparkQA commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
SparkQA commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727764668 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35735/
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727764685 **[Test build #131134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131134/testReport)** for PR 28386 at commit [`3359fe3`](https://github.com/apache/spark/commit/3359fe3985ff03d4bec328bef3ce3a9e6be48cae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727763827 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35737/
[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
SparkQA commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727762739 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35736/
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727758955 Yay! Thank you, @viirya and @HyukjinKwon !
[GitHub] [spark] HyukjinKwon closed pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
HyukjinKwon closed pull request #30378: URL: https://github.com/apache/spark/pull/30378
[GitHub] [spark] HyukjinKwon commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
HyukjinKwon commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727757148 Merged to master.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727751358
[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727751358
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727751348 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35733/
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727750806 **[Test build #131134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131134/testReport)** for PR 28386 at commit [`3359fe3`](https://github.com/apache/spark/commit/3359fe3985ff03d4bec328bef3ce3a9e6be48cae).
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727750403 Could you approve once more please?
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727750221 Now, the new job is green. ![Screen Shot 2020-11-15 at 9 46 58 PM](https://user-images.githubusercontent.com/9700541/99217092-1a97e900-278c-11eb-9155-1734d0aec251.png)
[GitHub] [spark] Ngone51 commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv
Ngone51 commented on pull request #30164: URL: https://github.com/apache/spark/pull/30164#issuecomment-727749010 LGTM if all tests pass.
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727749128 Sorry for rebasing after your approval, @HyukjinKwon, @wangyum, @viirya. The rebase was unavoidable in order to bring in the latest master branch. (I didn't notice that GitHub Action doesn't use the latest master.) Also, I switched to Java 8 per @HyukjinKwon's advice.
[GitHub] [spark] Ngone51 commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external
Ngone51 commented on a change in pull request #30164: URL: https://github.com/apache/spark/pull/30164#discussion_r523909743
## File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala ##
@@ -657,6 +688,43 @@ class BlockManagerMasterEndpoint(
     }
   }

+  private def getShufflePushMergerLocations(
+      numMergersNeeded: Int,
+      hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+    val blockManagersWithExecutors = blockManagerIdByExecutor.groupBy(_._2.host)
+      .mapValues(_.head).values.map(_._2).toSet
+    val filteredBlockManagersWithExecutors = blockManagersWithExecutors
+      .filterNot(x => hostsToFilter.contains(x.host))
+    val filteredMergersWithExecutors = filteredBlockManagersWithExecutors.map(
+      x => BlockManagerId(x.executorId, x.host, StorageUtils.externalShuffleServicePort(conf)))
+
+    // Enough mergers are available as part of active executors list
+    if (filteredMergersWithExecutors.size >= numMergersNeeded) {
+      filteredMergersWithExecutors.toSeq
+    } else {
+      // Delta mergers added from inactive mergers list to the active mergers list
+      val filteredMergersWithExecutorsHosts = filteredMergersWithExecutors.map(_.host)
+      val filteredMergersWithoutExecutors = shuffleMergerLocations.values
+        .filterNot(x => hostsToFilter.contains(x.host))
+        .filterNot(x => filteredMergersWithExecutorsHosts.contains(x.host))
+      val randomFilteredMergersLocations =
+        if (filteredMergersWithoutExecutors.size >
+          numMergersNeeded - filteredMergersWithExecutors.size) {
+          Utils.randomize(filteredMergersWithoutExecutors)
+        } else {
+          filteredMergersWithoutExecutors
+        }
+      filteredMergersWithExecutors.toSeq ++ randomFilteredMergersLocations
+        .take(numMergersNeeded - filteredMergersWithExecutors.size)
Review comment: We can only perform `take()` when `randomize()` is performed.
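One way to read the review comment above: `take()` only matters on the branch where the candidate list was shuffled, because on the other branch there are no more candidates than the shortfall. The following is a minimal standalone sketch of that shape, not the code that landed in the PR. `MergerLoc` is a hypothetical stand-in for `BlockManagerId`, `MergerSelection.select` is an illustrative name, and `scala.util.Random.shuffle` is used in place of Spark's `Utils.randomize`.

```scala
import scala.util.Random

// Hypothetical stand-in for Spark's BlockManagerId; only the fields used here.
final case class MergerLoc(executorId: String, host: String, port: Int)

object MergerSelection {
  /**
   * Returns all active locations when they already cover `numNeeded`;
   * otherwise tops them up with inactive locations on other hosts.
   */
  def select(
      active: Seq[MergerLoc],
      inactive: Seq[MergerLoc],
      numNeeded: Int,
      rng: Random = new Random()): Seq[MergerLoc] = {
    if (active.size >= numNeeded) {
      active
    } else {
      val delta = numNeeded - active.size
      val activeHosts = active.map(_.host).toSet
      // Skip inactive mergers that share a host with an active one.
      val candidates = inactive.filterNot(m => activeHosts.contains(m.host))
      val extra =
        if (candidates.size > delta) {
          // More candidates than needed: shuffle, then truncate.
          rng.shuffle(candidates).take(delta)
        } else {
          // Not more than the shortfall: use them all, no take() required.
          candidates
        }
      active ++ extra
    }
  }
}
```

Keeping `take()` inside the shuffled branch makes the intent explicit; calling it on both branches would give the same result, since truncating a list that is already short enough is a no-op.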
[GitHub] [spark] HyukjinKwon commented on a change in pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
HyukjinKwon commented on a change in pull request #30378: URL: https://github.com/apache/spark/pull/30378#discussion_r523909079
## File path: .github/workflows/build_and_test.yml ##
@@ -14,6 +14,28 @@ on:
         required: true

 jobs:
+  # This is on the top to give the most visibility in case of failures
+  hadoop-2:
+    name: Hadoop 2 build
+    runs-on: ubuntu-20.04
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Cache Coursier local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.cache/coursier
+        key: hadoop-2-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        restore-keys: |
+          hadoop-2-coursier-
+    - name: Install Java 11
+      uses: actions/setup-java@v1
+      with:
+        java-version: 11
Review comment: Sure, I guess it's fine.
[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
SparkQA commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727745405 **[Test build #131133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131133/testReport)** for PR 29497 at commit [`f391721`](https://github.com/apache/spark/commit/f39172149a266072128f5adf8308cad55efe2d45).
[GitHub] [spark] SparkQA commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
SparkQA commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727745360 **[Test build #131132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131132/testReport)** for PR 30378 at commit [`f26fc30`](https://github.com/apache/spark/commit/f26fc30f6c068b7381741505cb19369c720c49f3).
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727743297 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35733/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
AmplabJenkins removed a comment on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727741699 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131131/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
SparkQA removed a comment on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727740247 **[Test build #131131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131131/testReport)** for PR 29497 at commit [`032c916`](https://github.com/apache/spark/commit/032c9160b3cbaefa2be2a3ba7c4c57530a3e4d6c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
AmplabJenkins removed a comment on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727741364
[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
SparkQA commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727741674 **[Test build #131131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131131/testReport)** for PR 29497 at commit [`032c916`](https://github.com/apache/spark/commit/032c9160b3cbaefa2be2a3ba7c4c57530a3e4d6c).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
AmplabJenkins commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727741693
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
AmplabJenkins removed a comment on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727741355 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
AmplabJenkins commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727741355
[GitHub] [spark] wangyum commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
wangyum commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727740477 Thank you @dongjoon-hyun.
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727740466 It's weird. The new code is not consumed at the GitHub Action re-trigger. I'll rebase this to the master~
```
[error] /home/runner/work/spark/spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:320:52: type mismatch;
[error]  found   : Long
[error]  required: Int
[error]         Resource.newInstance(resourcesWithDefaults.totalMemMiB, resourcesWithDefaults.cores)
[error]                                                    ^
[error] one error found
```
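The compiler error above is a plain narrowing mismatch: a `Long` memory value is passed where the resolved overload expects an `Int`. As a hedged illustration of that class of error only (not the fix applied in Spark, which this thread does not show), the sketch below narrows explicitly and fails loudly on overflow. `legacyNewInstance` and `toLegacy` are hypothetical names standing in for an `Int`-only API; `Math.toIntExact` is the standard Java method.

```scala
object MemoryNarrowing {
  // Hypothetical stand-in for an API that only accepts an Int memory size (MiB).
  def legacyNewInstance(memoryMiB: Int, vCores: Int): (Int, Int) =
    (memoryMiB, vCores)

  // Narrow Long to Int explicitly; Math.toIntExact throws
  // ArithmeticException instead of silently truncating on overflow.
  def toLegacy(totalMemMiB: Long, cores: Int): (Int, Int) =
    legacyNewInstance(Math.toIntExact(totalMemMiB), cores)
}
```

Passing `totalMemMiB` directly to `legacyNewInstance` would reproduce the same "found: Long, required: Int" message, since Scala does not narrow numeric types implicitly.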
[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
SparkQA commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-727740247 **[Test build #131131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131131/testReport)** for PR 29497 at commit [`032c916`](https://github.com/apache/spark/commit/032c9160b3cbaefa2be2a3ba7c4c57530a3e4d6c).
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun edited a comment on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727739673 Thank you, @wangyum. I'm looking at both this PR and the vanilla master branch, too. The previous failure might have been caused by some other unknown reason rather than the API issue.
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727739673 Thank you, @wangyum . I'm looking at both this PR and vanilla master branch, too.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727739056 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35732/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727739053 Merged build finished. Test FAILed.
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun edited a comment on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727738689 BTW, @HyukjinKwon . Do you still want to use Java 8 in this PR? `Scala 2.13 build` job is also using Java 11.
[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727739036 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35732/
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun edited a comment on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727738689 BTW, @HyukjinKwon . Do you still want to use Java 8 in this PR? For Scala 2.13 build, I also use Java 11.
[GitHub] [spark] Ngone51 commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external
Ngone51 commented on a change in pull request #30164: URL: https://github.com/apache/spark/pull/30164#discussion_r523903398
## File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala ##
@@ -657,6 +688,38 @@ class BlockManagerMasterEndpoint(
     }
   }

+  private def getShufflePushMergerLocations(
+      numMergersNeeded: Int,
+      hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+    val blockManagersWithExecutors = blockManagerIdByExecutor.groupBy(_._2.host)
+      .mapValues(_.head).values.map(_._2).toSet
+    val filteredBlockManagersWithExecutors = blockManagersWithExecutors
+      .filterNot(x => hostsToFilter.contains(x.host))
+    val filteredMergersWithExecutors = filteredBlockManagersWithExecutors.map(
+      x => BlockManagerId(x.executorId, x.host, StorageUtils.externalShuffleServicePort(conf)))
+
+    // Enough mergers are available as part of active executors list
+    if (filteredMergersWithExecutors.size >= numMergersNeeded) {
+      filteredMergersWithExecutors.toSeq
+    } else {
+      // Delta mergers added from inactive mergers list to the active mergers list
+      val filteredMergersWithExecutorsHosts = filteredMergersWithExecutors.map(_.host)
+      // Pick random hosts instead of preferring the top of the list
+      val randomizedShuffleMergerLocations = Utils.randomize(shuffleMergerLocations.values.toSeq)
Review comment: @Victsm You're right. @venkata91 fixed it in the right way.
[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins commented on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727739053
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727738689 BTW, @HyukjinKwon . Do you still want to use Java 8 in this PR?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on a change in pull request #30378: URL: https://github.com/apache/spark/pull/30378#discussion_r523902143
## File path: .github/workflows/build_and_test.yml ##
@@ -14,6 +14,28 @@ on:
         required: true

 jobs:
+  # This is on the top to give the most visibility in case of failures
+  hadoop-2:
+    name: Hadoop 2 build
+    runs-on: ubuntu-20.04
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Cache Coursier local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.cache/coursier
+        key: hadoop-2-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        restore-keys: |
+          hadoop-2-coursier-
+    - name: Install Java 11
+      uses: actions/setup-java@v1
+      with:
+        java-version: 11
Review comment: @HyukjinKwon . It will be okay~ This is only for compilation test for Hadoop 2.7 API instead of running real tests. After we make `branch-3.1`, Jenkins jobs will be created for all release profiles, too.
[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2
dongjoon-hyun commented on pull request #30378: URL: https://github.com/apache/spark/pull/30378#issuecomment-727737781 It seems that there is some delay in GitHub Action. I retriggered GitHub Action because the previous run failed with a compilation error again.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
AmplabJenkins removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727737091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131130/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate
SparkQA removed a comment on pull request #28386: URL: https://github.com/apache/spark/pull/28386#issuecomment-727728851 **[Test build #131130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131130/testReport)** for PR 28386 at commit [`7833d65`](https://github.com/apache/spark/commit/7833d65f0078b441a92249b2cf3a7e00c70a9e32).