[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


SparkQA commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727806593


   **[Test build #131140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131140/testReport)** for PR 29497 at commit [`645d81b`](https://github.com/apache/spark/commit/645d81bb4622c32119adab7c21c18ea3cce14fdb).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on a change in pull request #30347: [SPARK-33209][SS] Refactor unit test of stream-stream join in UnsupportedOperationsSuite

2020-11-15 Thread GitBox


xuanyuanking commented on a change in pull request #30347:
URL: https://github.com/apache/spark/pull/30347#discussion_r523951738



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##
@@ -414,209 +412,135 @@ class UnsupportedOperationsSuite extends SparkFunSuite {
 batchStreamSupported = false,
 streamBatchSupported = false)
 
-  // Left outer joins: *-stream not allowed
+  // Left outer, left semi, left anti join: *-stream not allowed
+  Seq((LeftOuter, "LeftOuter join"), (LeftSemi, "LeftSemi join"), (LeftAnti, "Left anti join"))

Review comment:
   Super nit: let's keep the naming style consistent here, either `LeftOuter join` ... `LeftAnti join` or `left outer join` ... `left anti join`.








[GitHub] [spark] cloud-fan commented on a change in pull request #30380: [SPARK-27421][SQL] Fix filter for int column and value class java.lang.String when pruning partition column

2020-11-15 Thread GitBox


cloud-fan commented on a change in pull request #30380:
URL: https://github.com/apache/spark/pull/30380#discussion_r523951682



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##
@@ -729,6 +729,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
   def unapply(expr: Expression): Option[Attribute] = {
 expr match {
   case attr: Attribute => Some(attr)
+  case Cast(IntegralType(), StringType, _) => None

Review comment:
   Good catch! I'm wondering whether we should be more conservative here. How about
   ```
   case Cast(child @ IntegralType(), dt: IntegralType, _) => if Cast.canUpCast...
   ```
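Why the int-to-string cast is the risky case: string comparison is lexicographic, so `CAST(p AS STRING) > '9'` selects different partitions than `p > 9`. The standalone Scala sketch below (plain `Int` values, no Spark APIs) shows the mismatch, plus a hypothetical `canUnwrap` helper that only accepts widening casts between integral types, in the spirit of the suggestion above. The width table is an illustrative assumption, not Spark's `Cast.canUpCast`.

```scala
// Toy model: partition values of an int column, filtered either numerically
// or after casting each value to a string (lexicographic comparison).
object CastPruningSketch {
  val partitions: Seq[Int] = Seq(1, 9, 10, 20)

  // Numeric semantics: p > bound
  def numericGreaterThan(bound: Int): Seq[Int] =
    partitions.filter(_ > bound)

  // String semantics: CAST(p AS STRING) > bound
  def stringGreaterThan(bound: String): Seq[Int] =
    partitions.filter(_.toString.compareTo(bound) > 0)

  // Hypothetical byte widths of integral types; an assumption for
  // illustration, not Spark's Cast.canUpCast implementation.
  private val integralWidth: Map[String, Int] =
    Map("byte" -> 1, "short" -> 2, "int" -> 4, "long" -> 8)

  // Unwrap only widening casts between integral types; anything else
  // (e.g. int -> string) stays as-is and is not pushed to the metastore.
  def canUnwrap(from: String, to: String): Boolean =
    (integralWidth.get(from), integralWidth.get(to)) match {
      case (Some(f), Some(t)) => f <= t
      case _                  => false
    }
}
```

With partitions 1, 9, 10 and 20, the numeric filter keeps 10 and 20 while the string filter keeps nothing, which is why pruning must not treat the two predicates as equivalent.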








[GitHub] [spark] MaxGekk commented on a change in pull request #30377: [SPARK-33453][SQL][TESTS] Unify v1 and v2 SHOW PARTITIONS tests

2020-11-15 Thread GitBox


MaxGekk commented on a change in pull request #30377:
URL: https://github.com/apache/spark/pull/30377#discussion_r523950223



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala
##
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.v1
+
+import org.apache.spark.sql.{AnalysisException, Row, SaveMode}
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
+import org.apache.spark.sql.connector.catalog.CatalogManager
+import org.apache.spark.sql.execution.command
+import org.apache.spark.sql.test.SharedSparkSession
+
+trait ShowPartitionsSuiteBase extends command.ShowPartitionsSuiteBase {
+  override def version: String = "V1"
+  override def catalog: String = CatalogManager.SESSION_CATALOG_NAME
+  override def defaultNamespace: Seq[String] = Seq("default")
+  override def defaultUsing: String = "USING parquet"
+
+  protected def createDateTable(table: String): Unit = {
+    sql(s"""
+      |CREATE TABLE $table (price int, qty int, year int, month int)
+      |$defaultUsing
+      |partitioned by (year, month)""".stripMargin)
+  }
+
+  protected def fillDateTable(table: String): Unit = {
+    sql(s"INSERT INTO $table PARTITION(year = 2015, month = 1) SELECT 1, 1")
+    sql(s"INSERT INTO $table PARTITION(year = 2015, month = 2) SELECT 2, 2")
+    sql(s"INSERT INTO $table PARTITION(year = 2016, month = 2) SELECT 3, 3")
+    sql(s"INSERT INTO $table PARTITION(year = 2016, month = 3) SELECT 3, 3")
+  }
+
+  protected def createWideTable(table: String): Unit = {
+    sql(s"""
+      |CREATE TABLE $table (
+      |  price int, qty int,
+      |  year int, month int, hour int, minute int, sec int, extra int)
+      |$defaultUsing
+      |PARTITIONED BY (year, month, hour, minute, sec, extra)""".stripMargin)
+  }
+
+  protected def fillWideTable(table: String): Unit = {
+    sql(s"""
+      |INSERT INTO $table
+      |PARTITION(year = 2016, month = 3, hour = 10, minute = 10, sec = 10, extra = 1) SELECT 3, 3
+      """.stripMargin)
+    sql(s"""
+      |INSERT INTO $table
+      |PARTITION(year = 2016, month = 4, hour = 10, minute = 10, sec = 10, extra = 1) SELECT 3, 3
+      """.stripMargin)
+  }
+
+  test("show everything") {
+    val table = "dateTable"
+    withTable(table) {
+      createDateTable(table)
+      fillDateTable(table)
+      checkAnswer(
+        sql(s"show partitions $table"),
+        Row("year=2015/month=1") ::
+          Row("year=2015/month=2") ::
+          Row("year=2016/month=2") ::
+          Row("year=2016/month=3") :: Nil)
+
+      checkAnswer(
+        sql(s"show partitions default.$table"),
+        Row("year=2015/month=1") ::
+          Row("year=2015/month=2") ::
+          Row("year=2016/month=2") ::
+          Row("year=2016/month=3") :: Nil)
+    }
+  }
+
+  test("filter by partitions") {
+    val table = "dateTable"
+    withTable(table) {
+      createDateTable(table)
+      fillDateTable(table)
+      checkAnswer(
+        sql(s"show partitions default.$table PARTITION(year=2015)"),
+        Row("year=2015/month=1") ::
+          Row("year=2015/month=2") :: Nil)
+      checkAnswer(
+        sql(s"show partitions default.$table PARTITION(year=2015, month=1)"),
+        Row("year=2015/month=1") :: Nil)
+      checkAnswer(
+        sql(s"show partitions default.$table PARTITION(month=2)"),
+        Row("year=2015/month=2") ::
+          Row("year=2016/month=2") :: Nil)
+    }
+  }
+
+  test("show everything more than 5 part keys") {
+    val table = "wideTable"
+    withTable(table) {
+      createWideTable(table)
+      fillWideTable(table)
+      checkAnswer(
+        sql(s"show partitions $table"),
+        Row("year=2016/month=3/hour=10/minute=10/sec=10/extra=1") ::
+          Row("year=2016/month=4/hour=10/minute=10/sec=10/extra=1") :: Nil)
+    }
+  }
+
+  test("non-partitioning columns") {
+    val table = "dateTable"
+    withTable(table) {
+      createDateTable(table)
+      fillDateTable(table)
+      val errMsg = intercept[AnalysisException] {
+        sql(s"SHOW PARTITIONS $table PARTITION(abcd=2015, xyz=1)")
+      }.getMessage
+  
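The `filter by partitions` test expects a partial spec such as `PARTITION(month=2)` to match every partition whose listed keys agree, leaving unlisted keys unconstrained, and the `non-partitioning columns` test expects unknown keys to be rejected. A toy Scala model of that expected behavior over `key=value/key=value` strings; the helper names are hypothetical, this is not Spark's implementation, and the error message is the sketch's own.

```scala
object PartitionSpecSketch {
  // Parse "year=2015/month=1" into Map("year" -> "2015", "month" -> "1").
  def parse(partition: String): Map[String, String] =
    partition.split("/").map { kv =>
      val idx = kv.indexOf('=')
      kv.substring(0, idx) -> kv.substring(idx + 1)
    }.toMap

  // Keep partitions whose values agree with every key in the (possibly
  // partial) spec; reject spec keys that are not partition columns.
  def filter(
      partitionColumns: Set[String],
      partitions: Seq[String],
      spec: Map[String, String]): Seq[String] = {
    val unknown = spec.keySet.diff(partitionColumns)
    require(unknown.isEmpty, s"not partition columns: ${unknown.mkString(", ")}")
    partitions.filter { p =>
      val values = parse(p)
      spec.forall { case (k, v) => values.get(k).contains(v) }
    }
  }
}
```

With the four `dateTable` partitions, `Map("month" -> "2")` keeps `year=2015/month=2` and `year=2016/month=2`, matching the test's expected rows.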

[GitHub] [spark] SparkQA commented on pull request #30384: [SPARK-33456][SQL][TEST][FOLLOWUP] Fix SUBEXPRESSION_ELIMINATION_ENABLED config name

2020-11-15 Thread GitBox


SparkQA commented on pull request #30384:
URL: https://github.com/apache/spark/pull/30384#issuecomment-727800909


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35740/
   






[GitHub] [spark] SparkQA commented on pull request #30300: [SPARK-33399][SQL] Normalize output partitioning and sortorder with respect to aliases to avoid unneeded exchange/sort nodes

2020-11-15 Thread GitBox


SparkQA commented on pull request #30300:
URL: https://github.com/apache/spark/pull/30300#issuecomment-727799605


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35739/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


cloud-fan commented on a change in pull request #29497:
URL: https://github.com/apache/spark/pull/29497#discussion_r523946524



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/QueryCompilationErrors.scala
##
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.errors
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Expression, GroupingID}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.toPrettySQL
+import org.apache.spark.sql.connector.catalog.TableChange
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{AbstractDataType, DataType, StructType}
+
+/**
+ * Object for grouping all error messages in catalyst.
+ * Currently it includes all AnalysisExceptions created and thrown directly in
+ * org.apache.spark.sql.catalyst.analysis.Analyzer.
+ */
+object QueryCompilationErrors {
+  def groupingIDMismatchError(groupingID: GroupingID, groupByExprs: Seq[Expression]): Throwable = {
+    new AnalysisException(
+      s"Columns of grouping_id (${groupingID.groupByExprs.mkString(",")}) " +
+        s"does not match grouping columns (${groupByExprs.mkString(",")})")
+  }
+
+  def groupingColInvalidError(groupingCol: Expression, groupByExprs: Seq[Expression]): Throwable = {
+    new AnalysisException(
+      s"Column of grouping ($groupingCol) can't be found " +
+        s"in grouping columns ${groupByExprs.mkString(",")}")
+  }
+
+  def groupingSizeTooLargeError(sizeLimit: Int): Throwable = {
+    new AnalysisException(
+      s"Grouping sets size cannot be greater than $sizeLimit")
+  }
+
+  def unorderablePivotColError(pivotCol: Expression): Throwable = {
+    new AnalysisException(
+      s"Invalid pivot column '$pivotCol'. Pivot columns must be comparable."
+    )
+  }
+
+  def nonliteralPivotValError(pivotVal: Expression): Throwable = {

Review comment:
   `nonLiteral...`
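The object centralizes error construction: each method builds and returns the exception, and the analyzer call site does the `throw`. A minimal self-contained Scala sketch of that pattern, assuming a hypothetical `SketchAnalysisException` stand-in so it compiles without Spark on the classpath; the two messages are copied from the diff, everything else is illustrative.

```scala
// Hypothetical stand-in for Spark's AnalysisException, so this compiles
// without Spark on the classpath.
final class SketchAnalysisException(message: String) extends Exception(message)

// Sketch of the "one object for all compilation errors" pattern: each method
// builds and returns the exception; the call site decides to throw it.
object SketchCompilationErrors {
  def groupingSizeTooLargeError(sizeLimit: Int): Throwable =
    new SketchAnalysisException(
      s"Grouping sets size cannot be greater than $sizeLimit")

  def unorderablePivotColError(pivotCol: String): Throwable =
    new SketchAnalysisException(
      s"Invalid pivot column '$pivotCol'. Pivot columns must be comparable.")
}

// Hypothetical analyzer check that throws the centrally defined error.
object SketchAnalyzer {
  def checkGroupingSets(numSets: Int, sizeLimit: Int): Unit =
    if (numSets > sizeLimit) {
      throw SketchCompilationErrors.groupingSizeTooLargeError(sizeLimit)
    }
}
```

One likely reason for returning a `Throwable` instead of throwing inside the helper: the `throw` stays visible at the call site, where it has type `Nothing`, so the compiler still knows that branch does not return.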








[GitHub] [spark] cloud-fan commented on a change in pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


cloud-fan commented on a change in pull request #29497:
URL: https://github.com/apache/spark/pull/29497#discussion_r523946122



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/QueryCompilationErrors.scala
##
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.errors
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Expression, GroupingID}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util.toPrettySQL
+import org.apache.spark.sql.connector.catalog.TableChange
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{AbstractDataType, DataType, StructType}
+
+/**
+ * Object for grouping all error messages in catalyst.

Review comment:
   `Object for grouping all error messages of the query compilation.`








[GitHub] [spark] luluorta commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf

2020-11-15 Thread GitBox


luluorta commented on a change in pull request #30299:
URL: https://github.com/apache/spark/pull/30299#discussion_r523940307



##
File path: sql/core/src/test/resources/sql-tests/inputs/datetime.sql
##
@@ -149,7 +149,7 @@ select to_timestamp('2019-10-06 A', '-MM-dd G');
 select to_timestamp('22 05 2020 Friday', 'dd MM  EE');
 select to_timestamp('22 05 2020 Friday', 'dd MM  E');
 select unix_timestamp('22 05 2020 Friday', 'dd MM  E');
-select from_json('{"time":"26/October/2015"}', 'time Timestamp', map('timestampFormat', 'dd/M/'));
-select from_json('{"date":"26/October/2015"}', 'date Date', map('dateFormat', 'dd/M/'));
-select from_csv('26/October/2015', 'time Timestamp', map('timestampFormat', 'dd/M/'));
-select from_csv('26/October/2015', 'date Date', map('dateFormat', 'dd/M/'));
+select from_json('{"ts":"26/October/2015"}', 'ts Timestamp', map('timestampFormat', 'dd/M/'));

Review comment:
   I opened a new PR for this issue: [https://github.com/apache/spark/pull/30357](https://github.com/apache/spark/pull/30357)








[GitHub] [spark] cloud-fan commented on a change in pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf

2020-11-15 Thread GitBox


cloud-fan commented on a change in pull request #30299:
URL: https://github.com/apache/spark/pull/30299#discussion_r523944573



##
File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
##
@@ -172,7 +172,7 @@ case class HiveTableScanExec(
 prunePartitions(hivePartitions)
   }
 } else {
-  if (sparkSession.sessionState.conf.metastorePartitionPruning &&

Review comment:
   Let's keep this unchanged for now. We may later override `def conf` in `SparkPlan` so that it always gets the conf from the captured Spark session.








[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-15 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-727796003


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35738/
   






[GitHub] [spark] SparkQA commented on pull request #30299: [SPARK-33389][SQL] Make internal classes of SparkSession always using active SQLConf

2020-11-15 Thread GitBox


SparkQA commented on pull request #30299:
URL: https://github.com/apache/spark/pull/30299#issuecomment-727791622


   **[Test build #131139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131139/testReport)** for PR 30299 at commit [`bf1e56a`](https://github.com/apache/spark/commit/bf1e56a87d662b9657e5f46380628b06b9c9e359).






[GitHub] [spark] SparkQA commented on pull request #30364: [SPARK-33140][SQL][FOLLOW-UP] Revert code that not use passed-in SparkSession to get SQLConf.

2020-11-15 Thread GitBox


SparkQA commented on pull request #30364:
URL: https://github.com/apache/spark/pull/30364#issuecomment-727788257


   **[Test build #131138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131138/testReport)** for PR 30364 at commit [`5cd23d7`](https://github.com/apache/spark/commit/5cd23d7836b17a400fc224afd691c79418352afb).






[GitHub] [spark] cloud-fan closed pull request #30383: [SPARK-33458][SQL] Hive partition pruning support Contains, StartsWith and EndsWith predicate

2020-11-15 Thread GitBox


cloud-fan closed pull request #30383:
URL: https://github.com/apache/spark/pull/30383


   






[GitHub] [spark] cloud-fan commented on pull request #30383: [SPARK-33458][SQL] Hive partition pruning support Contains, StartsWith and EndsWith predicate

2020-11-15 Thread GitBox


cloud-fan commented on pull request #30383:
URL: https://github.com/apache/spark/pull/30383#issuecomment-727786769


   thanks, merging to master!






[GitHub] [spark] SparkQA commented on pull request #30384: [SPARK-33456][SQL][TEST][FOLLOWUP] Fix SUBEXPRESSION_ELIMINATION_ENABLED config name

2020-11-15 Thread GitBox


SparkQA commented on pull request #30384:
URL: https://github.com/apache/spark/pull/30384#issuecomment-727784817


   **[Test build #131137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131137/testReport)** for PR 30384 at commit [`6527137`](https://github.com/apache/spark/commit/65271378ed6b813f28f0a5753d8f9c61b86f0bb1).






[GitHub] [spark] cloud-fan closed pull request #30358: [SPARK-33394][SQL][TESTS] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`

2020-11-15 Thread GitBox


cloud-fan closed pull request #30358:
URL: https://github.com/apache/spark/pull/30358


   






[GitHub] [spark] cloud-fan commented on pull request #30358: [SPARK-33394][SQL][TESTS] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`

2020-11-15 Thread GitBox


cloud-fan commented on pull request #30358:
URL: https://github.com/apache/spark/pull/30358#issuecomment-727782490


   thanks, merging to master!






[GitHub] [spark] viirya commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination

2020-11-15 Thread GitBox


viirya commented on a change in pull request #30381:
URL: https://github.com/apache/spark/pull/30381#discussion_r523932172



##
File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql
##
@@ -0,0 +1,37 @@
+-- Test for subexpression elimination.
+
+--SET spark.sql.optimizer.enableJsonExpressionOptimization=false
+
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false
+
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN
+
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=false

Review comment:
   Fixed in #30384.








[GitHub] [spark] viirya opened a new pull request #30384: [SPARK-33456][SQL][TEST][FOLLOWUP] Fix SUBEXPRESSION_ELIMINATION_ENABLED config name

2020-11-15 Thread GitBox


viirya opened a new pull request #30384:
URL: https://github.com/apache/spark/pull/30384


   
   
   ### What changes were proposed in this pull request?
   
   
   To fix the wrong config name in `subexp-elimination.sql`.
   
   ### Why are the changes needed?
   
   
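   `CONFIG_DIM` lines must use the config key (`spark.sql.subexpressionElimination.enabled`), not the constant name `SUBEXPRESSION_ELIMINATION_ENABLED`.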
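Each `--CONFIG_DIMn` group lists alternatives for one dimension, and the SQL test runner tries every combination across dimensions (2 × 2 × 2 = 8 config sets for `subexp-elimination.sql`). A toy Scala re-implementation of that expansion, assuming the `--CONFIG_DIMn key=value` line shape shown in the diff; it is not `SQLQueryTestSuite`'s code. A wrong key such as `SUBEXPRESSION_ELIMINATION_ENABLED` still parses fine here, which is why the mistake is easy to miss: the runs happen, but that dimension most likely sets a config nobody reads.

```scala
object ConfigDimSketch {
  // Matches lines like "--CONFIG_DIM1 spark.sql.codegen.wholeStage=true".
  private val DimLine = """--CONFIG_DIM(\d+)\s+([^=\s]+)=(\S+)""".r

  // Group "--CONFIG_DIMn key=value" lines by dimension n, in ascending order.
  def parse(lines: Seq[String]): Seq[Seq[(String, String)]] =
    lines
      .collect { case DimLine(dim, k, v) => dim.toInt -> (k -> v) }
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (_, entries) => entries.map(_._2) }

  // Cartesian product: each run picks one alternative from every dimension.
  def combinations(dims: Seq[Seq[(String, String)]]): Seq[Seq[(String, String)]] =
    dims.foldLeft(Seq(Seq.empty[(String, String)])) { (acc, dim) =>
      for (prefix <- acc; choice <- dim) yield prefix :+ choice
    }
}
```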
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No, dev only.
   
   ### How was this patch tested?
   
   
   Unit test.






[GitHub] [spark] cloud-fan commented on a change in pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-15 Thread GitBox


cloud-fan commented on a change in pull request #30368:
URL: https://github.com/apache/spark/pull/30368#discussion_r523931544



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##
@@ -1452,11 +1452,27 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
 }
 
 /**
- * Combines two adjacent [[Limit]] operators into one, merging the
- * expressions into one single expression.
+ * 1. Eliminate [[Limit]] operators if its child's max row count <= limit.
+ * 2. Combines two adjacent [[Limit]] operators into one, merging the
+ *expressions into one single expression.
  */
-object CombineLimits extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+object EliminateLimits extends Rule[LogicalPlan] {
+  private def canEliminate(limitExpr: Expression, child: LogicalPlan): Boolean = {
+    // We skip such case that Sort is after Limit since
+    // SparkStrategies will convert them to TakeOrderedAndProjectExec
+    val skipEliminate = child match {
+      case Sort(_, true, _) => true

Review comment:
   does `TakeOrderedAndProjectExec` really help if we need to get and sort all the output?
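The reworked rule does two things: it drops a `Limit` whose child is already known to return at most that many rows, and it merges adjacent limits by keeping the smaller bound. A toy Scala sketch of both steps on a miniature plan ADT; `Scan`, `Sort`, `Limit` and `maxRows` here are hypothetical stand-ins, not Catalyst classes. Like the diff, it leaves a `Limit` over a global `Sort` in place, which is the case the review question is about.

```scala
// Miniature logical plan; hypothetical stand-ins for Catalyst nodes.
sealed trait Plan { def maxRows: Option[Long] }

final case class Scan(name: String, maxRows: Option[Long]) extends Plan

final case class Sort(global: Boolean, child: Plan) extends Plan {
  def maxRows: Option[Long] = child.maxRows
}

final case class Limit(n: Long, child: Plan) extends Plan {
  // A limit caps the row count at n (or lower if the child is smaller).
  def maxRows: Option[Long] = Some(child.maxRows.fold(n)(math.min(n, _)))
}

object EliminateLimitsSketch {
  // Keep a Limit over a global Sort, as the diff does, so the planner can
  // still fuse the pair into a top-K operator.
  private def canEliminate(n: Long, child: Plan): Boolean = child match {
    case Sort(true, _) => false
    case _             => child.maxRows.exists(_ <= n)
  }

  // Bottom-up rewrite of the plan.
  def apply(plan: Plan): Plan = plan match {
    case Limit(n, child) =>
      apply(child) match {
        // 1. The child already yields at most n rows: drop the Limit.
        case c if canEliminate(n, c) => c
        // 2. Adjacent limits: keep a single Limit with the smaller bound.
        case Limit(m, grandChild) => Limit(math.min(n, m), grandChild)
        case c                    => Limit(n, c)
      }
    case Sort(global, child) => Sort(global, apply(child))
    case s: Scan             => s
  }
}
```

For example, `Limit(10, Limit(5, scan))` collapses to `Limit(5, scan)` through step 1, because the inner limit already bounds the output at 5 rows, while `Limit(5, Limit(10, scan))` collapses to `Limit(5, scan)` through step 2.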








[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-727779872


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131135/






[GitHub] [spark] SparkQA removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-15 Thread GitBox


SparkQA removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-727774687


   **[Test build #131135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131135/testReport)** for PR 27735 at commit [`ba93111`](https://github.com/apache/spark/commit/ba93111b8ccfc7958e4facf57280a7c980beed84).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-727779863


   Merged build finished. Test PASSed.






[GitHub] [spark] AmplabJenkins commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-727779863










[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-15 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-727779713


   **[Test build #131135 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131135/testReport)** for PR 27735 at commit [`ba93111`](https://github.com/apache/spark/commit/ba93111b8ccfc7958e4facf57280a7c980beed84).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] viirya commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination

2020-11-15 Thread GitBox


viirya commented on a change in pull request #30381:
URL: https://github.com/apache/spark/pull/30381#discussion_r523929393



##
File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql
##
@@ -0,0 +1,37 @@
+-- Test for subexpression elimination.
+
+--SET spark.sql.optimizer.enableJsonExpressionOptimization=false
+
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false
+
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN
+
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=false

Review comment:
   Oops, let me fix it.








[GitHub] [spark] SparkQA commented on pull request #30300: [SPARK-33399][SQL] Normalize output partitioning and sortorder with respect to aliases to avoid unneeded exchange/sort nodes

2020-11-15 Thread GitBox


SparkQA commented on pull request #30300:
URL: https://github.com/apache/spark/pull/30300#issuecomment-727778057


   **[Test build #131136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131136/testReport)** for PR 30300 at commit [`16e1db2`](https://github.com/apache/spark/commit/16e1db202f3522ebe607894b93811bb54bdd9a0c).






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination

2020-11-15 Thread GitBox


HyukjinKwon commented on a change in pull request #30381:
URL: https://github.com/apache/spark/pull/30381#discussion_r523929112



##
File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql
##
@@ -0,0 +1,37 @@
+-- Test for subexpression elimination.
+
+--SET spark.sql.optimizer.enableJsonExpressionOptimization=false
+
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false
+
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN
+
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=false

Review comment:
   Wait, shouldn't these be the actual config names?








[GitHub] [spark] cloud-fan commented on a change in pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination

2020-11-15 Thread GitBox


cloud-fan commented on a change in pull request #30381:
URL: https://github.com/apache/spark/pull/30381#discussion_r523929310



##
File path: sql/core/src/test/resources/sql-tests/inputs/subexp-elimination.sql
##
@@ -0,0 +1,37 @@
+-- Test for subexpression elimination.
+
+--SET spark.sql.optimizer.enableJsonExpressionOptimization=false
+
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
+--CONFIG_DIM1 spark.sql.codegen.wholeStage=false
+
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY
+--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN
+
+--CONFIG_DIM3 SUBEXPRESSION_ELIMINATION_ENABLED=true

Review comment:
   is `SUBEXPRESSION_ELIMINATION_ENABLED` a config name?
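As the reviewers point out, `SUBEXPRESSION_ELIMINATION_ENABLED` is the name of the Scala constant in `SQLConf`; the config key it defines is `spark.sql.subexpressionElimination.enabled`. The sketch below shows how `--CONFIG_DIMn key=value` lines could be expanded into test runs, assuming the harness executes one run per combination of one entry from each dimension (2 × 2 × 2 = 8 runs for this file). `ConfigDims`, `fileHeader` and `runs` are hypothetical names; this is not the actual `SQLQueryTestSuite` code.

```scala
// Hypothetical helper: group "--CONFIG_DIMn key=value" lines by dimension and
// expand them into every combination (one entry taken from each dimension).
object ConfigDims {
  private val DimLine = """--CONFIG_DIM(\d+)\s+([^=\s]+)=(\S+)""".r

  def parse(lines: Seq[String]): Map[String, Seq[(String, String)]] =
    lines
      .collect { case DimLine(dim, key, value) => (dim, key -> value) }
      .groupBy(_._1)
      .map { case (dim, entries) => dim -> entries.map(_._2) }

  def combinations(dims: Map[String, Seq[(String, String)]]): Seq[Map[String, String]] =
    dims.toSeq.sortBy(_._1).foldLeft(Seq(Map.empty[String, String])) {
      case (acc, (_, options)) =>
        for (partial <- acc; opt <- options) yield partial + opt
    }
}

// Dimension lines with the constant name replaced by the real key; the --SET
// line is not a dimension, so parse ignores it.
val fileHeader = Seq(
  "--SET spark.sql.optimizer.enableJsonExpressionOptimization=false",
  "--CONFIG_DIM1 spark.sql.codegen.wholeStage=true",
  "--CONFIG_DIM1 spark.sql.codegen.wholeStage=false",
  "--CONFIG_DIM2 spark.sql.codegen.factoryMode=CODEGEN_ONLY",
  "--CONFIG_DIM2 spark.sql.codegen.factoryMode=NO_CODEGEN",
  "--CONFIG_DIM3 spark.sql.subexpressionElimination.enabled=true",
  "--CONFIG_DIM3 spark.sql.subexpressionElimination.enabled=false")

val runs = ConfigDims.combinations(ConfigDims.parse(fileHeader))
```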








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727774065


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35737/






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727774055


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727774994










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727775302










[GitHub] [spark] MaxGekk commented on a change in pull request #30358: [SPARK-33394][SQL][TESTS] Throw `NoSuchNamespaceException` for not existing namespace in `InMemoryTableCatalog.listTables()`

2020-11-15 Thread GitBox


MaxGekk commented on a change in pull request #30358:
URL: https://github.com/apache/spark/pull/30358#discussion_r523928527



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTableCatalog.scala
##
@@ -181,9 +181,21 @@ class InMemoryTableCatalog extends BasicInMemoryTableCatalog with SupportsNamesp
 
   override def dropNamespace(namespace: Array[String]): Boolean = {
 listNamespaces(namespace).foreach(dropNamespace)
-listTables(namespace).foreach(dropTable)

Review comment:
   @HyukjinKwon You are right, this is a test-only PR. I changed the PR's title and description.
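The behaviour named in the PR title (listing the tables of a namespace that does not exist fails instead of returning an empty result) can be illustrated with a toy catalog. Everything here is hypothetical: `ToyCatalog` and `NamespaceNotFound` stand in for `InMemoryTableCatalog` and `NoSuchNamespaceException` and do not reproduce their real signatures. Because `listTables` throws for a missing namespace, `dropNamespace` in this sketch clears its own table map directly instead of going through `listTables`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's NoSuchNamespaceException.
final class NamespaceNotFound(ns: Seq[String])
    extends RuntimeException(s"Namespace '${ns.mkString(".")}' not found")

final class ToyCatalog {
  private val namespaces = mutable.Set.empty[Seq[String]]
  private val tables = mutable.Map.empty[Seq[String], mutable.Set[String]]

  def createNamespace(ns: Seq[String]): Unit = {
    namespaces += ns
    if (!tables.contains(ns)) tables(ns) = mutable.Set.empty[String]
  }

  def createTable(ns: Seq[String], name: String): Unit = {
    if (!namespaces(ns)) throw new NamespaceNotFound(ns)
    tables(ns) += name
  }

  // Direct children of ns (copied to a List so callers may mutate the set).
  def listNamespaces(ns: Seq[String]): Seq[Seq[String]] =
    namespaces.toList.filter(c => c.length == ns.length + 1 && c.startsWith(ns))

  // Fail loudly for an unknown namespace instead of returning an empty list.
  def listTables(ns: Seq[String]): Seq[String] =
    if (namespaces(ns)) tables(ns).toList.sorted else throw new NamespaceNotFound(ns)

  // Recursively drop child namespaces, then this namespace and its tables.
  def dropNamespace(ns: Seq[String]): Boolean =
    if (!namespaces(ns)) false
    else {
      listNamespaces(ns).foreach(dropNamespace)
      tables.remove(ns)
      namespaces -= ns
      true
    }
}
```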








[GitHub] [spark] AmplabJenkins commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727775302










[GitHub] [spark] AmplabJenkins commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727774994










[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


SparkQA commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727775287


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35736/
   






[GitHub] [spark] SparkQA commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


SparkQA commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727774961


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35735/
   






[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.

2020-11-15 Thread GitBox


SparkQA commented on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-727774687


   **[Test build #131135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131135/testReport)** for PR 27735 at commit [`ba93111`](https://github.com/apache/spark/commit/ba93111b8ccfc7958e4facf57280a7c980beed84).






[GitHub] [spark] viirya commented on pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination

2020-11-15 Thread GitBox


viirya commented on pull request #30381:
URL: https://github.com/apache/spark/pull/30381#issuecomment-727774217


   Thanks @HyukjinKwon @dongjoon-hyun 






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727774014


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35737/
   






[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727774055










[GitHub] [spark] HyukjinKwon closed pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination

2020-11-15 Thread GitBox


HyukjinKwon closed pull request #30381:
URL: https://github.com/apache/spark/pull/30381


   






[GitHub] [spark] HyukjinKwon commented on pull request #30381: [SPARK-33456][SQL][TEST] Add end-to-end test for subexpression elimination

2020-11-15 Thread GitBox


HyukjinKwon commented on pull request #30381:
URL: https://github.com/apache/spark/pull/30381#issuecomment-727773690


   Merged to master.






[GitHub] [spark] MichaelChirico commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


MichaelChirico commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727768929


   @HyukjinKwon test added, please have a look






[GitHub] [spark] cloud-fan commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader at final stage on write command

2020-11-15 Thread GitBox


cloud-fan commented on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-727768565


   I'm reading the classdoc of `OptimizeLocalShuffleReader`, and I do feel the design is a bit hacky. We add a local shuffle reader if:
   1. the shuffle is the root node of a query stage;
   2. the shuffle is on the BHJ build side.
   
   The reason for condition 1 is that it will never introduce a shuffle. This is true, but it may change the final output partitioning, which may be bad for cases like write commands.
   
   I like the idea from @maryannxue, which is more general: 1) move the LSR rule into `postStageCreationRules`; and 2) make the LSR rule match an Exchange first (so condition 1 becomes: the shuffle is a direct child of an exchange). By doing this, we can skip the LSR rule in the last stage, as the last stage's root node is not an exchange.
   
   I'm not very sure why the current approach can add LSR to BHJ probe side. 
This seems like an accident to me as it's not mentioned in the classdoc.
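The proposal above (apply the local shuffle reader only where a shuffle read is the direct child of an exchange, so the final stage is left alone) can be modelled on a toy plan. `Node`, `ShuffleRead`, `Exchange`, `Join`, `Write` and `LocalReaderSketch` are hypothetical names, not Spark's AQE classes, and the BHJ condition is not modelled at all; this only shows the shape of the "match an Exchange first" check.

```scala
// Toy physical plan (hypothetical names, not Spark's AQE classes).
sealed trait Node
final case class ShuffleRead(id: Int, local: Boolean = false) extends Node
final case class Exchange(child: Node) extends Node
final case class Join(left: Node, right: Node) extends Node
final case class Write(child: Node) extends Node

object LocalReaderSketch {
  def apply(plan: Node): Node = plan match {
    // Only a shuffle read feeding straight into an exchange becomes local:
    // the exchange above re-partitions, so the reader's partitioning is moot.
    case Exchange(ShuffleRead(id, _)) => Exchange(ShuffleRead(id, local = true))
    case Exchange(child)              => Exchange(apply(child))
    case Join(l, r)                   => Join(apply(l), apply(r))
    case Write(child)                 => Write(apply(child))
    case other                        => other
  }
}
```

A final stage rooted at a write, such as `Write(ShuffleRead(2))`, keeps its original shuffle read, so its output partitioning is unchanged.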






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727767077










[GitHub] [spark] AmplabJenkins commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727767077










[GitHub] [spark] SparkQA removed a comment on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-15 Thread GitBox


SparkQA removed a comment on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727680480


   **[Test build #131118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131118/testReport)** for PR 30368 at commit [`7b4e4d6`](https://github.com/apache/spark/commit/7b4e4d6613b14704063883114116143cb97c3c74).






[GitHub] [spark] SparkQA commented on pull request #30368: [SPARK-33442][SQL] Change Combine Limit to Eliminate limit using max row

2020-11-15 Thread GitBox


SparkQA commented on pull request #30368:
URL: https://github.com/apache/spark/pull/30368#issuecomment-727766024


   **[Test build #131118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131118/testReport)** for PR 30368 at commit [`7b4e4d6`](https://github.com/apache/spark/commit/7b4e4d6613b14704063883114116143cb97c3c74).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727764812










[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727750806


   **[Test build #131134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131134/testReport)** for PR 28386 at commit [`3359fe3`](https://github.com/apache/spark/commit/3359fe3985ff03d4bec328bef3ce3a9e6be48cae).






[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727764812










[GitHub] [spark] SparkQA commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


SparkQA commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727764668


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35735/
   






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727764685


   **[Test build #131134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131134/testReport)** for PR 28386 at commit [`3359fe3`](https://github.com/apache/spark/commit/3359fe3985ff03d4bec328bef3ce3a9e6be48cae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727763827


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35737/
   






[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


SparkQA commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727762739


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35736/
   






[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727758955


   Yay! Thank you, @viirya and @HyukjinKwon!




[GitHub] [spark] HyukjinKwon closed pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


HyukjinKwon closed pull request #30378:
URL: https://github.com/apache/spark/pull/30378


   




[GitHub] [spark] HyukjinKwon commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


HyukjinKwon commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727757148


   Merged to master.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727751358








[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727751358








[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727751348


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35733/
   




[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727750806


   **[Test build #131134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131134/testReport)** for PR 28386 at commit [`3359fe3`](https://github.com/apache/spark/commit/3359fe3985ff03d4bec328bef3ce3a9e6be48cae).




[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727750403


   Could you approve once more please?




[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727750221


   Now, the new job is green.
   ![Screen Shot 2020-11-15 at 9 46 58 PM](https://user-images.githubusercontent.com/9700541/99217092-1a97e900-278c-11eb-9155-1734d0aec251.png)
   




[GitHub] [spark] Ngone51 commented on pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external shuffle serv

2020-11-15 Thread GitBox


Ngone51 commented on pull request #30164:
URL: https://github.com/apache/spark/pull/30164#issuecomment-727749010


   LGTM if all tests pass.




[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727749128


   Sorry for rebasing after your approval, @HyukjinKwon , @wangyum , @viirya . It was necessary to bring in the latest master branch. (I hadn't noticed that GitHub Actions doesn't use the latest master.)
   
   Also, I switched to Java 8 following @HyukjinKwon 's advice.




[GitHub] [spark] Ngone51 commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external

2020-11-15 Thread GitBox


Ngone51 commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523909743



##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -657,6 +688,43 @@ class BlockManagerMasterEndpoint(
 }
   }
 
+  private def getShufflePushMergerLocations(
+      numMergersNeeded: Int,
+      hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+    val blockManagersWithExecutors = blockManagerIdByExecutor.groupBy(_._2.host)
+      .mapValues(_.head).values.map(_._2).toSet
+    val filteredBlockManagersWithExecutors = blockManagersWithExecutors
+      .filterNot(x => hostsToFilter.contains(x.host))
+    val filteredMergersWithExecutors = filteredBlockManagersWithExecutors.map(
+      x => BlockManagerId(x.executorId, x.host, StorageUtils.externalShuffleServicePort(conf)))
+
+    // Enough mergers are available as part of active executors list
+    if (filteredMergersWithExecutors.size >= numMergersNeeded) {
+      filteredMergersWithExecutors.toSeq
+    } else {
+      // Delta mergers added from inactive mergers list to the active mergers list
+      val filteredMergersWithExecutorsHosts = filteredMergersWithExecutors.map(_.host)
+      val filteredMergersWithoutExecutors = shuffleMergerLocations.values
+        .filterNot(x => hostsToFilter.contains(x.host))
+        .filterNot(x => filteredMergersWithExecutorsHosts.contains(x.host))
+      val randomFilteredMergersLocations =
+        if (filteredMergersWithoutExecutors.size >
+          numMergersNeeded - filteredMergersWithExecutors.size) {
+          Utils.randomize(filteredMergersWithoutExecutors)
+        } else {
+          filteredMergersWithoutExecutors
+        }
+      filteredMergersWithExecutors.toSeq ++ randomFilteredMergersLocations
+        .take(numMergersNeeded - filteredMergersWithExecutors.size)

Review comment:
   We only need to perform `take()` in the branch where `randomize()` is performed.
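   The review point can be illustrated with a simplified, standalone Scala sketch of the selection logic in the hunk. `MergerLocation`, `MergerSelection` and `selectMergers` are hypothetical names for illustration only (not Spark APIs), and `scala.util.Random.shuffle` stands in for Spark's internal `Utils.randomize`. When the filtered inactive candidates already fit into the remaining slots they are appended as-is, so `take()` only happens in the randomized branch.

```scala
import scala.util.Random

// Hypothetical stand-in for BlockManagerId: only the fields the selection logic needs.
final case class MergerLocation(executorId: String, host: String, port: Int)

object MergerSelection {
  // Prefer merger hosts that currently run executors; top up from previously
  // known (inactive) merger hosts only if there are not enough active ones.
  def selectMergers(
      active: Seq[MergerLocation],
      inactive: Seq[MergerLocation],
      numMergersNeeded: Int,
      hostsToFilter: Set[String],
      rng: Random = new Random()): Seq[MergerLocation] = {
    val activeFiltered = active.filterNot(m => hostsToFilter.contains(m.host))
    if (activeFiltered.size >= numMergersNeeded) {
      // Same as the hunk: all filtered active mergers are returned.
      activeFiltered
    } else {
      val activeHosts = activeFiltered.map(_.host).toSet
      val candidates = inactive
        .filterNot(m => hostsToFilter.contains(m.host))
        .filterNot(m => activeHosts.contains(m.host))
      val remaining = numMergersNeeded - activeFiltered.size
      val extra =
        if (candidates.size > remaining) {
          // More candidates than slots: shuffle, then keep only what is needed.
          rng.shuffle(candidates).take(remaining)
        } else {
          // Candidates already fit into the remaining slots, so no take() is needed.
          candidates
        }
      activeFiltered ++ extra
    }
  }
}
```

   Shuffling before `take()` avoids always favouring whichever inactive hosts happen to come first in the collection.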






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


HyukjinKwon commented on a change in pull request #30378:
URL: https://github.com/apache/spark/pull/30378#discussion_r523909079



##
File path: .github/workflows/build_and_test.yml
##
@@ -14,6 +14,28 @@ on:
 required: true
 
 jobs:
+  # This is on the top to give the most visibility in case of failures
+  hadoop-2:
+name: Hadoop 2 build
+runs-on: ubuntu-20.04
+steps:
+- name: Checkout Spark repository
+  uses: actions/checkout@v2
+- name: Cache Coursier local repository
+  uses: actions/cache@v2
+  with:
+path: ~/.cache/coursier
+key: hadoop-2-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+restore-keys: |
+  hadoop-2-coursier-
+- name: Install Java 11
+  uses: actions/setup-java@v1
+  with:
+java-version: 11

Review comment:
   Sure, I guess it's fine.






[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


SparkQA commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727745405


   **[Test build #131133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131133/testReport)** for PR 29497 at commit [`f391721`](https://github.com/apache/spark/commit/f39172149a266072128f5adf8308cad55efe2d45).




[GitHub] [spark] SparkQA commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


SparkQA commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727745360


   **[Test build #131132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131132/testReport)** for PR 30378 at commit [`f26fc30`](https://github.com/apache/spark/commit/f26fc30f6c068b7381741505cb19369c720c49f3).




[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727743297


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35733/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727741699


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131131/
   Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


SparkQA removed a comment on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727740247


   **[Test build #131131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131131/testReport)** for PR 29497 at commit [`032c916`](https://github.com/apache/spark/commit/032c9160b3cbaefa2be2a3ba7c4c57530a3e4d6c).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727741364








[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


SparkQA commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727741674


   **[Test build #131131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131131/testReport)** for PR 29497 at commit [`032c916`](https://github.com/apache/spark/commit/032c9160b3cbaefa2be2a3ba7c4c57530a3e4d6c).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727741693








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727741355


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727741355








[GitHub] [spark] wangyum commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


wangyum commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727740477


   Thank you @dongjoon-hyun.




[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727740466


   It's weird. The re-triggered GitHub Action run did not pick up the new code. I'll rebase this onto master.
   ```
   [error] /home/runner/work/spark/spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:320:52: type mismatch;
   [error]  found   : Long
   [error]  required: Int
   [error] Resource.newInstance(resourcesWithDefaults.totalMemMiB, resourcesWithDefaults.cores)
   [error]^
   [error] one error found
   ```
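   The log shows `resourcesWithDefaults.totalMemMiB` being a `Long`, while the `Resource.newInstance` overload resolved in the Hadoop 2 build expects an `Int` for memory; Scala never narrows `Long` to `Int` implicitly. The sketch below only reproduces that shape with a hypothetical stand-in (`Hadoop2StyleResource`, `ResourceCompat` are illustrative names, not YARN or Spark classes) and shows one checked way to bridge it with `Math.toIntExact`. It is not necessarily how Spark's master resolved the error.

```scala
// Hypothetical stand-in for a Hadoop 2-style factory whose memory parameter is an Int.
// This is NOT the real org.apache.hadoop.yarn.api.records.Resource class.
object Hadoop2StyleResource {
  final case class Res(memoryMiB: Int, vCores: Int)
  def newInstance(memoryMiB: Int, vCores: Int): Res = Res(memoryMiB, vCores)
}

object ResourceCompat {
  // Narrow a Long memory amount to Int, throwing ArithmeticException on
  // overflow instead of silently wrapping around as `.toInt` would.
  def memoryAsInt(totalMemMiB: Long): Int = Math.toIntExact(totalMemMiB)

  // Passing totalMemMiB directly would fail with "type mismatch; found: Long, required: Int".
  def build(totalMemMiB: Long, cores: Int): Hadoop2StyleResource.Res =
    Hadoop2StyleResource.newInstance(memoryAsInt(totalMemMiB), cores)
}
```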




[GitHub] [spark] SparkQA commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-11-15 Thread GitBox


SparkQA commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-727740247


   **[Test build #131131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131131/testReport)** for PR 29497 at commit [`032c916`](https://github.com/apache/spark/commit/032c9160b3cbaefa2be2a3ba7c4c57530a3e4d6c).




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun edited a comment on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727739673


   Thank you, @wangyum . I'm looking at both this PR and the vanilla master branch, too.
   
   The previous failure may have been caused by some other, unknown problem rather than the API issue.




[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727739673


   Thank you, @wangyum . I'm looking at both this PR and the vanilla master branch, too.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727739056


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35732/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727739053


   Merged build finished. Test FAILed.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun edited a comment on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727738689


   BTW, @HyukjinKwon . Do you still want to use Java 8 in this PR?
   
   The `Scala 2.13 build` job also uses Java 11.




[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727739036


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35732/
   




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun edited a comment on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727738689


   BTW, @HyukjinKwon . Do you still want to use Java 8 in this PR?
   
   For the Scala 2.13 build, I also use Java 11.




[GitHub] [spark] Ngone51 commented on a change in pull request #30164: [SPARK-32919][SHUFFLE][test-maven][test-hadoop2.7] Driver side changes for coordinating push based shuffle by selecting external

2020-11-15 Thread GitBox


Ngone51 commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r523903398



##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -657,6 +688,38 @@ class BlockManagerMasterEndpoint(
 }
   }
 
+  private def getShufflePushMergerLocations(
+      numMergersNeeded: Int,
+      hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+    val blockManagersWithExecutors = blockManagerIdByExecutor.groupBy(_._2.host)
+      .mapValues(_.head).values.map(_._2).toSet
+    val filteredBlockManagersWithExecutors = blockManagersWithExecutors
+      .filterNot(x => hostsToFilter.contains(x.host))
+    val filteredMergersWithExecutors = filteredBlockManagersWithExecutors.map(
+      x => BlockManagerId(x.executorId, x.host, StorageUtils.externalShuffleServicePort(conf)))
+
+    // Enough mergers are available as part of active executors list
+    if (filteredMergersWithExecutors.size >= numMergersNeeded) {
+      filteredMergersWithExecutors.toSeq
+    } else {
+      // Delta mergers added from inactive mergers list to the active mergers list
+      val filteredMergersWithExecutorsHosts = filteredMergersWithExecutors.map(_.host)
+      // Pick random hosts instead of preferring the top of the list
+      val randomizedShuffleMergerLocations = Utils.randomize(shuffleMergerLocations.values.toSeq)

Review comment:
   @Victsm You're right. @venkata91 fixed it in the right way.








[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727739053










[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727738689


   BTW, @HyukjinKwon, do you still want to use Java 8 in this PR?






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on a change in pull request #30378:
URL: https://github.com/apache/spark/pull/30378#discussion_r523902143



##
File path: .github/workflows/build_and_test.yml
##
@@ -14,6 +14,28 @@ on:
         required: true
 
 jobs:
+  # This is on the top to give the most visibility in case of failures
+  hadoop-2:
+    name: Hadoop 2 build
+    runs-on: ubuntu-20.04
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Cache Coursier local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.cache/coursier
+        key: hadoop-2-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        restore-keys: |
+          hadoop-2-coursier-
+    - name: Install Java 11
+      uses: actions/setup-java@v1
+      with:
+        java-version: 11

Review comment:
   @HyukjinKwon, it will be okay. This job is only a compilation test against the Hadoop 2.7 API; it does not run the real tests.
   
   After we create `branch-3.1`, Jenkins jobs will be created for all release profiles, too.









[GitHub] [spark] dongjoon-hyun commented on pull request #30378: [SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2

2020-11-15 Thread GitBox


dongjoon-hyun commented on pull request #30378:
URL: https://github.com/apache/spark/pull/30378#issuecomment-727737781


   It seems that GitHub Actions is running with some delay. I retriggered GitHub Actions because the previous run failed with a compilation error again.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727737091


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131130/






[GitHub] [spark] SparkQA removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] Fix strategy for handling ... names in mutate

2020-11-15 Thread GitBox


SparkQA removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-727728851


   **[Test build #131130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131130/testReport)** for PR 28386 at commit [`7833d65`](https://github.com/apache/spark/commit/7833d65f0078b441a92249b2cf3a7e00c70a9e32).





