[GitHub] [spark] HyukjinKwon commented on pull request #29853: [SPARK-SQL][SPARK-32977] Fix JavaDoc on Default Save Mode

2020-09-23 Thread GitBox


HyukjinKwon commented on pull request #29853:
URL: https://github.com/apache/spark/pull/29853#issuecomment-698079990


   Yeah, that's a known flaky test. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


viirya commented on a change in pull request #29828:
URL: https://github.com/apache/spark/pull/29828#discussion_r494011718



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JsonSuite.scala
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+class JsonSuite extends PlanTest with ExpressionEvalHelper {
+
+  object Optimizer extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("Json optimization", FixedPoint(10), OptimizeJsonExprs) :: Nil
+  }
+
+  val schema = StructType.fromDDL("a int, b int")
+
+  private val structAtt = 'struct.struct(schema).notNull
+
+  private val testRelation = LocalRelation(structAtt)
+
+  test("SPARK-32948: optimize from_json + to_json") {
+    val options = Map.empty[String, String]
+
+    val query1 = testRelation
+      .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct"))
+    val optimized1 = Optimizer.execute(query1.analyze)
+
+    val expected = testRelation.select('struct.as("struct")).analyze
+    comparePlans(optimized1, expected)
+
+    val query2 = testRelation
+      .select(
+        JsonToStructs(schema, options,
+          StructsToJson(options,
+            JsonToStructs(schema, options,
+              StructsToJson(options, 'struct)))).as("struct"))
+    val optimized2 = Optimizer.execute(query2.analyze)
+
+    comparePlans(optimized2, expected)
+  }
+
+  test("SPARK-32948: not optimize from_json + to_json if schema is different") {
+    val options = Map.empty[String, String]
+    val schema = StructType.fromDDL("a int")
+
+    val query = testRelation
+      .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct"))
+    val optimized = Optimizer.execute(query.analyze)
+
+    val expected = testRelation.select(
+      JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  test("SPARK-32948: not optimize from_json + to_json if option is not empty") {

Review comment:
   ok.
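   The tests above exercise one rewrite: `from_json(to_json(x))` collapses to `x`. The sketch below models that rewrite with a toy expression tree. It is a minimal sketch under some assumptions. `Expr`, `Attr`, `ToJson` and `FromJson` are hypothetical stand-ins, not Spark's `JsonToStructs`/`StructsToJson`, and schemas are compared as plain strings. Following the test names, the pair collapses only when the schema round-trips and no options are set.

```scala
// Toy model of the from_json(to_json(x)) collapse. These types are hypothetical
// stand-ins for Catalyst expressions; schemas are compared as plain strings.
sealed trait Expr
final case class Attr(name: String, schema: String) extends Expr
final case class ToJson(options: Map[String, String], child: Expr) extends Expr
final case class FromJson(schema: String, options: Map[String, String], child: Expr) extends Expr

object CollapseJson {
  // Struct schema produced by an expression; ToJson yields a string, not a struct.
  private def schemaOf(e: Expr): Option[String] = e match {
    case Attr(_, s)        => Some(s)
    case FromJson(s, _, _) => Some(s)
    case ToJson(_, _)      => None
  }

  // Rewrites from_json(to_json(child)) to child when the schema round-trips and
  // neither side has options; otherwise recurses into the children.
  def apply(e: Expr): Expr = e match {
    case FromJson(schema, outerOpts, ToJson(innerOpts, child))
        if outerOpts.isEmpty && innerOpts.isEmpty && schemaOf(child).contains(schema) =>
      apply(child)
    case FromJson(s, o, c) => FromJson(s, o, apply(c))
    case ToJson(o, c)      => ToJson(o, apply(c))
    case a: Attr           => a
  }
}
```

   Collapsing an inner pair never changes the schema seen by an outer pair, so a single recursive pass handles the doubly nested case, just as `query2` expects.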








[GitHub] [spark] dongjoon-hyun closed pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation

2020-09-23 Thread GitBox


dongjoon-hyun closed pull request #29859:
URL: https://github.com/apache/spark/pull/29859


   






[GitHub] [spark] dongjoon-hyun commented on pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation

2020-09-23 Thread GitBox


dongjoon-hyun commented on pull request #29859:
URL: https://github.com/apache/spark/pull/29859#issuecomment-698084404


   Scala 2.13 GA job passed. Thank you, @viirya and @HyukjinKwon .
   Merged to master.
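   For context, this is the general kind of fix "Add `.toSeq` for Scala 2.13 compilation" refers to. In Scala 2.13, `Seq` means `immutable.Seq`, so a mutable collection no longer conforms to it without an explicit conversion. A minimal sketch: the `describe` helper and the values are hypothetical, not the code changed in #29859.

```scala
import scala.collection.mutable.ArrayBuffer

object ToSeqExample {
  // On Scala 2.13 this parameter is scala.collection.immutable.Seq.
  def describe(names: Seq[String]): String = names.mkString(",")

  def build(): String = {
    val buf = ArrayBuffer("pvc-1", "pvc-2")
    // describe(buf) compiles on 2.12 but not on 2.13, because ArrayBuffer is mutable.
    // buf.toSeq compiles on both; on 2.13 it copies into an immutable Seq.
    describe(buf.toSeq)
  }
}
```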






[GitHub] [spark] LuciferYang commented on pull request #29861: [SPARK-32971][K8S][FOLLOWUP] Fix k8s core module compile in Scala 2.13

2020-09-23 Thread GitBox


LuciferYang commented on pull request #29861:
URL: https://github.com/apache/spark/pull/29861#issuecomment-698100268


   cc @dongjoon-hyun The change in SPARK-32971 is blocking the GitHub Actions Scala 2.13 build.






[GitHub] [spark] SparkQA commented on pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API

2020-09-23 Thread GitBox


SparkQA commented on pull request #29756:
URL: https://github.com/apache/spark/pull/29756#issuecomment-698100892


   **[Test build #129062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129062/testReport)** for PR 29756 at commit [`97761d2`](https://github.com/apache/spark/commit/97761d23c723561781afe322a44d6af8757fc8a6).






[GitHub] [spark] xuanyuanking commented on a change in pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API

2020-09-23 Thread GitBox


xuanyuanking commented on a change in pull request #29756:
URL: https://github.com/apache/spark/pull/29756#discussion_r494027847



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
##
@@ -260,19 +264,47 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan]
     })
   }
 
+  private def getStreamingRelation(
+      table: CatalogTable,
+      extraOptions: CaseInsensitiveStringMap): StreamingRelation = {
+    val dsOptions = DataSourceUtils.generateDatasourceOptions(extraOptions, table)
+    val dataSource = DataSource(
+      sparkSession,
+      className = table.provider.get,
+      userSpecifiedSchema = Some(table.schema),
+      options = dsOptions)
+    StreamingRelation(dataSource)
+  }
+
+
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case i @ InsertIntoStatement(UnresolvedCatalogRelation(tableMeta, options), _, _, _, _)
+    case i @ InsertIntoStatement(UnresolvedCatalogRelation(tableMeta, options, false), _, _, _, _)
         if DDLUtils.isDatasourceTable(tableMeta) =>
       i.copy(table = readDataSourceTable(tableMeta, options))
 
-    case i @ InsertIntoStatement(UnresolvedCatalogRelation(tableMeta, _), _, _, _, _) =>
+    case i @ InsertIntoStatement(UnresolvedCatalogRelation(tableMeta, _, false), _, _, _, _) =>
       i.copy(table = DDLUtils.readHiveTable(tableMeta))
 
-    case UnresolvedCatalogRelation(tableMeta, options) if DDLUtils.isDatasourceTable(tableMeta) =>
+    case UnresolvedCatalogRelation(tableMeta, options, false)
+        if DDLUtils.isDatasourceTable(tableMeta) =>
       readDataSourceTable(tableMeta, options)
 
-    case UnresolvedCatalogRelation(tableMeta, _) =>
+    case UnresolvedCatalogRelation(tableMeta, _, false) =>
       DDLUtils.readHiveTable(tableMeta)
+
+    case UnresolvedCatalogRelation(tableMeta, extraOptions, true) =>
+      getStreamingRelation(tableMeta, extraOptions)
+
+    case s @ StreamingRelationV2(
+        _, _, table, extraOptions, _, _, _, Some(UnresolvedCatalogRelation(tableMeta, _, true))) =>
+      import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
+      if (table.isInstanceOf[SupportsRead]
+          && table.supportsAny(MICRO_BATCH_READ, CONTINUOUS_READ)) {
+        s.copy(v1Relation = None)

Review comment:
   Yes, done in 97761d2
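   In the diff above, the new third field of `UnresolvedCatalogRelation` works as an `isStreaming` flag. It routes resolution either to the existing batch paths or to a new `StreamingRelation`. The sketch below shows that dispatch. It is a minimal sketch under some assumptions: `TableMeta`, `Unresolved`, `BatchRead`, `HiveRead` and `StreamRead` are hypothetical stand-ins, and only the flag and the datasource/Hive split are modeled.

```scala
// Hypothetical stand-ins for CatalogTable and the plans the rule produces.
final case class TableMeta(name: String, provider: Option[String], isDatasource: Boolean)

sealed trait Plan
final case class Unresolved(table: TableMeta, options: Map[String, String], isStreaming: Boolean) extends Plan
final case class BatchRead(table: String, options: Map[String, String]) extends Plan
final case class HiveRead(table: String) extends Plan
final case class StreamRead(table: String, provider: String, options: Map[String, String]) extends Plan

object ResolveTables {
  def apply(plan: Plan): Plan = plan match {
    // Batch read of a datasource table keeps the caller's options.
    case Unresolved(t, opts, false) if t.isDatasource => BatchRead(t.name, opts)
    // Batch read of a Hive table.
    case Unresolved(t, _, false) => HiveRead(t.name)
    // Streaming read; like `table.provider.get` in the diff, this assumes a provider is set.
    case Unresolved(t, opts, true) => StreamRead(t.name, t.provider.get, opts)
    // Already-resolved plans pass through unchanged.
    case other => other
  }
}
```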

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TableCapabilityCheck.scala
##
@@ -43,7 +43,8 @@ object TableCapabilityCheck extends (LogicalPlan => Unit) {
       case r: DataSourceV2Relation if !r.table.supports(BATCH_READ) =>
         failAnalysis(s"Table ${r.table.name()} does not support batch scan.")
 
-      case r: StreamingRelationV2 if !r.table.supportsAny(MICRO_BATCH_READ, CONTINUOUS_READ) =>
+      case r: StreamingRelationV2
+          if !r.table.supportsAny(MICRO_BATCH_READ, CONTINUOUS_READ) && r.v1Relation.isEmpty =>

Review comment:
   Committed by mistake; reverted in 97761d2.

##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala
##
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming.test
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType, V2TableWithV1Fallback}
+import org.apache.spark.sql.connector.{FakeV2Provider, InMemoryTableCatalog}
+import org.apache.spark.sql.connector.catalog.{Identifier, SupportsRead, Table, TableCapability}
+import org.apache.spark.sql.connector.expressions.Transform
+import org.apache.spark.sql.connector.read.ScanBuilder
+import org.apache.spark.sql.execution.streaming.{MemoryStream, MemoryStreamScanBuilder, StreamingRelation}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.StreamTest
+import 

[GitHub] [spark] xuanyuanking commented on a change in pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API

2020-09-23 Thread GitBox


xuanyuanking commented on a change in pull request #29756:
URL: https://github.com/apache/spark/pull/29756#discussion_r494027786



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
##
@@ -731,3 +732,11 @@ case class HiveTableRelation(
     s"$nodeName $metadataStr"
   }
 }
+
+/**
+ * A V2 table with V1 fallback support. This is used to fallback to V1 table when the V2 one
+ * doesn't implement specific capabilities but V1 already has.
+ */
+trait V2TableWithV1Fallback extends Table {

Review comment:
   Copy, moved in 97761d2
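   The trait's doc comment describes a fallback rule: use the V2 table when it has the needed capability, and otherwise fall back to V1. The sketch below shows that decision for streaming reads. It is a minimal sketch, and every name in it is hypothetical: `Capability`, `V2Tbl` and `chooseStreamingSource` illustrate the idea and are not the `V2TableWithV1Fallback` API.

```scala
sealed trait Capability
object Capability {
  case object BatchRead extends Capability
  case object MicroBatchRead extends Capability
  case object ContinuousRead extends Capability
}

// Hypothetical V2 table description that may carry a V1 fallback provider name.
final case class V2Tbl(name: String, capabilities: Set[Capability], v1Fallback: Option[String])

object FallbackResolver {
  import Capability._

  private val streamingCaps: Set[Capability] = Set(MicroBatchRead, ContinuousRead)

  // Prefer the V2 table when it can stream; otherwise fall back to V1 if available.
  def chooseStreamingSource(t: V2Tbl): Either[String, String] =
    if (t.capabilities.exists(streamingCaps.contains)) Right(s"v2:${t.name}")
    else t.v1Fallback match {
      case Some(v1) => Right(s"v1:$v1")
      case None     => Left(s"${t.name} supports neither streaming reads nor a V1 fallback")
    }
}
```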








[GitHub] [spark] xuanyuanking commented on a change in pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API

2020-09-23 Thread GitBox


xuanyuanking commented on a change in pull request #29756:
URL: https://github.com/apache/spark/pull/29756#discussion_r494027734



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1020,16 +1039,35 @@ class Analyzer(
     // 3) If a v1 table is found, create a v1 relation. Otherwise, create a v2 relation.
     private def lookupRelation(
         identifier: Seq[String],
-        options: CaseInsensitiveStringMap): Option[LogicalPlan] = {
+        options: CaseInsensitiveStringMap,
+        isStreaming: Boolean): Option[LogicalPlan] = {
       expandRelationName(identifier) match {
         case SessionCatalogAndIdentifier(catalog, ident) =>
           lazy val loaded = CatalogV2Util.loadTable(catalog, ident).map {
             case v1Table: V1Table =>
-              v1SessionCatalog.getRelation(v1Table.v1Table, options)
+              if (isStreaming) {
+                SubqueryAlias(

Review comment:
   Sure, done in 97761d2.








[GitHub] [spark] SparkQA commented on pull request #29861: [SPARK-32971][K8S][FOLLOWUP] Fix k8s core module compile in Scala 2.13

2020-09-23 Thread GitBox


SparkQA commented on pull request #29861:
URL: https://github.com/apache/spark/pull/29861#issuecomment-698100859


   **[Test build #129061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129061/testReport)** for PR 29861 at commit [`6ef5c2b`](https://github.com/apache/spark/commit/6ef5c2b325b1a4b94b8d5815d1d7e35cd029deb7).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29853: [SPARK-SQL][SPARK-32977] Fix JavaDoc on Default Save Mode

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29853:
URL: https://github.com/apache/spark/pull/29853#issuecomment-697979195










[GitHub] [spark] SparkQA commented on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


SparkQA commented on pull request #29828:
URL: https://github.com/apache/spark/pull/29828#issuecomment-697983565


   **[Test build #129038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129038/testReport)** for PR 29828 at commit [`08cd0a7`](https://github.com/apache/spark/commit/08cd0a7172fd6fe3eb690a2fb27ccc3f14d536a8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


SparkQA removed a comment on pull request #29828:
URL: https://github.com/apache/spark/pull/29828#issuecomment-697669398


   **[Test build #129038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129038/testReport)** for PR 29828 at commit [`08cd0a7`](https://github.com/apache/spark/commit/08cd0a7172fd6fe3eb690a2fb27ccc3f14d536a8).






[GitHub] [spark] AmplabJenkins commented on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29828:
URL: https://github.com/apache/spark/pull/29828#issuecomment-697984381










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-697990499










[GitHub] [spark] AmplabJenkins commented on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-697990499










[GitHub] [spark] dongjoon-hyun opened a new pull request #29856: [SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread GitBox


dongjoon-hyun opened a new pull request #29856:
URL: https://github.com/apache/spark/pull/29856


   ### What changes were proposed in this pull request?
   
   Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but it still ships the unofficial forked Hive 1.2 version in our distribution, as shown below. This PR aims to officially remove it from Apache Spark 3.1.0 while keeping the `hive-1.2` profile.
   ```
   spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
   spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
   spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
   ```
   
   ### Why are the changes needed?
   
   The unofficial Hive 1.2.1 fork has many bugs and has not been maintained for a long time. We had better not recommend it in the official Apache Spark distribution.
   
   ### Does this PR introduce _any_ user-facing change?
   
   There is no user-facing change in the default distribution (Hadoop 3.2/Hive 2.3).
   
   ### How was this patch tested?
   
   Manually, because this is a change in the release script.






[GitHub] [spark] dongjoon-hyun commented on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


dongjoon-hyun commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-698007198


   Yes! Finally, it passed.






[GitHub] [spark] holdenk commented on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s

2020-09-23 Thread GitBox


holdenk commented on pull request #29533:
URL: https://github.com/apache/spark/pull/29533#issuecomment-698013767


   Jenkins retest this please






[GitHub] [spark] holdenk commented on pull request #29788: [SPARK-32913][CORE][K8S] Improve ExecutorDecommissionInfo and ExecutorDecommissionState for different use cases

2020-09-23 Thread GitBox


holdenk commented on pull request #29788:
URL: https://github.com/apache/spark/pull/29788#issuecomment-698021508


   I think it would be good to see your proposal in code @Ngone51 because I'm not 100% sure what you mean.
   I would really like to see both this and the precursor tested more thoroughly.






[GitHub] [spark] holdenk commented on a change in pull request #29788: [SPARK-32913][CORE][K8S] Improve ExecutorDecommissionInfo and ExecutorDecommissionState for different use cases

2020-09-23 Thread GitBox


holdenk commented on a change in pull request #29788:
URL: https://github.com/apache/spark/pull/29788#discussion_r493950073



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorLossReason.scala
##
@@ -71,7 +71,8 @@ case class ExecutorProcessLost(
  * This is used by the task scheduler to remove state associated with the executor, but
  * not yet fail any tasks that were running in the executor before the executor is "fully" lost.
  *
- * @param workerHost it is defined when the worker is decommissioned too
+ * @param reason the reason why the executor is decommissioned
+ * @param host it is defined when the host where the executor located is decommissioned too
  */
-private [spark] case class ExecutorDecommission(workerHost: Option[String] = None)
- extends ExecutorLossReason("Executor decommission.")
+private [spark] case class ExecutorDecommission(reason: String, host: Option[String] = None)

Review comment:
   You can write `unapply` methods if you need to pattern match on something other than a case class.
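   The point here is that pattern matching does not require a case class: any companion object with an `unapply` method can act as an extractor. A minimal sketch under an assumption: `Decommission` is a hypothetical plain class standing in for `ExecutorDecommission`, with the two fields from the diff.

```scala
// A regular (non-case) class carrying the same fields as the diff.
class Decommission(val reason: String, val host: Option[String] = None)

// Companion object with a custom extractor, so callers can still pattern match.
object Decommission {
  def unapply(d: Decommission): Option[(String, Option[String])] = Some((d.reason, d.host))
}

object DecommissionDemo {
  def describe(x: Any): String = x match {
    case Decommission(reason, Some(host)) => s"$reason (host $host also decommissioned)"
    case Decommission(reason, None)       => reason
    case _                                => "other loss reason"
  }
}
```

   Because `unapply` takes a `Decommission`, the compiler inserts a type test when the scrutinee is `Any`. That is why the last case still catches unrelated values.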








[GitHub] [spark] SparkQA commented on pull request #29846: [SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread GitBox


SparkQA commented on pull request #29846:
URL: https://github.com/apache/spark/pull/29846#issuecomment-698021654


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33671/
   






[GitHub] [spark] huaxingao commented on a change in pull request #29837: [SPARK-32463][DOCS] SQL data type compatibility

2020-09-23 Thread GitBox


huaxingao commented on a change in pull request #29837:
URL: https://github.com/apache/spark/pull/29837#discussion_r493965798



##
File path: docs/sql-ref-datatypes.md
##
@@ -314,3 +314,33 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |3| Infinity|
 +-+-+
 ```
+
+ Data type compatibility
+
+The following is the hierarchy of data type compatibility. In an operation involving different but compatible data types, the operands are promoted to their lowest common top type to perform the operation.
+
+For example, in an addition between an integer and a float, the integer is treated as a float, the least common compatible type, so the operation results in a float.
+
+The most common operations where this hierarchy is applied are:

Review comment:
   @gatorsmile WDYT? Do you want to have a table to list the most common rules like the one in https://www.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0008477.html?
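   The int + float example in the proposed doc text can be made concrete with a simplified model of numeric widening. This is a minimal sketch under an assumption: a linear precedence `byte < short < int < long < float < double`. Spark's full coercion rules also cover decimals, strings, dates and more, and are not modeled here.

```scala
object Widening {
  // Assumed numeric precedence, narrowest first.
  val precedence: Vector[String] = Vector("byte", "short", "int", "long", "float", "double")

  // Least common type of two numeric types: the wider of the two, if both are known.
  def leastCommonType(a: String, b: String): Option[String] = {
    val (i, j) = (precedence.indexOf(a), precedence.indexOf(b))
    if (i < 0 || j < 0) None else Some(precedence(math.max(i, j)))
  }
}
```

   For example, `Widening.leastCommonType("int", "float")` gives `Some("float")`, matching the prose. A type outside the chain, such as `"string"`, gives `None` in this simplified model.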








[GitHub] [spark] github-actions[bot] closed pull request #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-09-23 Thread GitBox


github-actions[bot] closed pull request #27096:
URL: https://github.com/apache/spark/pull/27096


   






[GitHub] [spark] maropu commented on a change in pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-23 Thread GitBox


maropu commented on a change in pull request #29585:
URL: https://github.com/apache/spark/pull/29585#discussion_r493976424



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
##
@@ -458,7 +463,10 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
 sys.error(s"Unexpected operator in scalar subquery: $lp")
 }
 
-val resultMap = evalPlan(plan)
+val resultMap = evalPlan(plan).mapValues { _.transform {
+case a: Alias => a.newInstance() // Assigns a new `ExprId`

Review comment:
   I've checked that we don't need this change, for the same reason as https://github.com/apache/spark/pull/29585#discussion_r490318779. So I've reverted it in the latest commit.
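   The PR title names an invariant: within a logical plan, one `ExprId` should refer to one unique attribute. A minimal sketch of such a check, assuming a flat list of hypothetical `AttrRef` values with just an id, a name and a type. A real check would need to walk the logical plan and compare full Catalyst attributes.

```scala
// Hypothetical, simplified attribute reference (not Catalyst's AttributeReference).
final case class AttrRef(exprId: Long, name: String, dataType: String)

object ExprIdCheck {
  // Returns the ExprIds that are bound to more than one distinct (name, dataType).
  def conflicts(attrs: Seq[AttrRef]): Set[Long] =
    attrs
      .groupBy(_.exprId)
      .collect { case (id, refs) if refs.map(a => (a.name, a.dataType)).distinct.size > 1 => id }
      .toSet
}
```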








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29585:
URL: https://github.com/apache/spark/pull/29585#issuecomment-698050221


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29856: [SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29856:
URL: https://github.com/apache/spark/pull/29856#issuecomment-698050222


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29856: [SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29856:
URL: https://github.com/apache/spark/pull/29856#issuecomment-698050222










[GitHub] [spark] AmplabJenkins commented on pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29585:
URL: https://github.com/apache/spark/pull/29585#issuecomment-698050221










[GitHub] [spark] AmplabJenkins commented on pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29804:
URL: https://github.com/apache/spark/pull/29804#issuecomment-698050224


   Merged build finished. Test FAILed.






[GitHub] [spark] wangyum closed pull request #28642: [SPARK-31809][SQL] Infer IsNotNull for non null intolerant child of null intolerant in join condition

2020-09-23 Thread GitBox


wangyum closed pull request #28642:
URL: https://github.com/apache/spark/pull/28642


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698070610


   Merged build finished. Test PASSed.






[GitHub] [spark] AmplabJenkins commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698070610










[GitHub] [spark] SparkQA commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


SparkQA commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698070226


   **[Test build #129054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129054/testReport)** for PR 29857 at commit [`b7f2f47`](https://github.com/apache/spark/commit/b7f2f47c6c22af15478b89201fb6c685f47d66bd).






[GitHub] [spark] SparkQA commented on pull request #29858: [SPARK-32981][BUILD][FOLLOW-UP] Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread GitBox


SparkQA commented on pull request #29858:
URL: https://github.com/apache/spark/pull/29858#issuecomment-698074003


   **[Test build #129055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129055/testReport)** for PR 29858 at commit [`41e83f3`](https://github.com/apache/spark/commit/41e83f3932d7680cb077bdd5c0b909b2c112fe76).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29585:
URL: https://github.com/apache/spark/pull/29585#issuecomment-698073630


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129053/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29585:
URL: https://github.com/apache/spark/pull/29585#issuecomment-698073627


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-23 Thread GitBox


SparkQA commented on pull request #29585:
URL: https://github.com/apache/spark/pull/29585#issuecomment-698073524


   **[Test build #129053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129053/testReport)** for PR 29585 at commit [`95c16b6`](https://github.com/apache/spark/commit/95c16b6a0c7e3c2ad02e6c3ff8104210142785f8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class UnionExec(children: Seq[SparkPlan], output: Seq[Attribute]) extends SparkPlan `






[GitHub] [spark] AmplabJenkins commented on pull request #29585: [SPARK-32741][SQL] Check if the same ExprId refers to the unique attribute in logical plans

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29585:
URL: https://github.com/apache/spark/pull/29585#issuecomment-698073627










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29859:
URL: https://github.com/apache/spark/pull/29859#issuecomment-698078601










[GitHub] [spark] SparkQA removed a comment on pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation

2020-09-23 Thread GitBox


SparkQA removed a comment on pull request #29859:
URL: https://github.com/apache/spark/pull/29859#issuecomment-698075792


   **[Test build #129056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129056/testReport)** for PR 29859 at commit [`19d9a2f`](https://github.com/apache/spark/commit/19d9a2f302baf0cf9c9382f28622b83355103d7e).






[GitHub] [spark] AmplabJenkins commented on pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29859:
URL: https://github.com/apache/spark/pull/29859#issuecomment-698078601










[GitHub] [spark] SparkQA commented on pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation

2020-09-23 Thread GitBox


SparkQA commented on pull request #29859:
URL: https://github.com/apache/spark/pull/29859#issuecomment-698078494


   **[Test build #129056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129056/testReport)** for PR 29859 at commit [`19d9a2f`](https://github.com/apache/spark/commit/19d9a2f302baf0cf9c9382f28622b83355103d7e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29858: [SPARK-32982][BUILD] Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread GitBox


dongjoon-hyun commented on a change in pull request #29858:
URL: https://github.com/apache/spark/pull/29858#discussion_r494006706



##
File path: python/pyspark/install.py
##
@@ -26,18 +26,13 @@
 DEFAULT_HADOOP = "hadoop3.2"
 DEFAULT_HIVE = "hive2.3"
 SUPPORTED_HADOOP_VERSIONS = ["hadoop2.7", "hadoop3.2", "without-hadoop"]
-SUPPORTED_HIVE_VERSIONS = ["hive1.2", "hive2.3"]
+SUPPORTED_HIVE_VERSIONS = ["hive2.3"]

Review comment:
   cc @gatorsmile 
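For context on the `install.py` hunk above, here is a minimal sketch. It assumes the installer validates the requested Hive version by checking membership in `SUPPORTED_HIVE_VERSIONS`. The helper `check_hive_version` and its error message are hypothetical and are not taken from `install.py`. The sketch only shows that, with the updated list, a request for `hive1.2` is rejected while `hive2.3` is still accepted.

```python
# Hypothetical sketch, not pyspark's install.py: validate a requested
# Hive version against the list as it stands after this change.
SUPPORTED_HIVE_VERSIONS = ["hive2.3"]


def check_hive_version(hive_version: str) -> str:
    """Return hive_version if it is supported, otherwise raise RuntimeError."""
    if hive_version not in SUPPORTED_HIVE_VERSIONS:
        raise RuntimeError(
            "Hive version should be one of [%s], got %r"
            % (", ".join(SUPPORTED_HIVE_VERSIONS), hive_version)
        )
    return hive_version


if __name__ == "__main__":
    print(check_hive_version("hive2.3"))  # prints: hive2.3
    try:
        check_hive_version("hive1.2")  # no longer in the supported list
    except RuntimeError as err:
        print(err)
```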








[GitHub] [spark] maropu commented on a change in pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


maropu commented on a change in pull request #29828:
URL: https://github.com/apache/spark/pull/29828#discussion_r494010395



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JsonSuite.scala
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+class JsonSuite extends PlanTest with ExpressionEvalHelper {
+
+  object Optimizer extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("Json optimization", FixedPoint(10), OptimizeJsonExprs) :: Nil
+  }
+
+  val schema = StructType.fromDDL("a int, b int")
+
+  private val structAtt = 'struct.struct(schema).notNull
+
+  private val testRelation = LocalRelation(structAtt)
+
+  test("SPARK-32948: optimize from_json + to_json") {
+    val options = Map.empty[String, String]
+
+    val query1 = testRelation
+      .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct"))
+    val optimized1 = Optimizer.execute(query1.analyze)
+
+    val expected = testRelation.select('struct.as("struct")).analyze
+    comparePlans(optimized1, expected)
+
+    val query2 = testRelation
+      .select(
+        JsonToStructs(schema, options,
+          StructsToJson(options,
+            JsonToStructs(schema, options,
+              StructsToJson(options, 'struct)))).as("struct"))
+    val optimized2 = Optimizer.execute(query2.analyze)
+
+    comparePlans(optimized2, expected)
+  }
+
+  test("SPARK-32948: not optimize from_json + to_json if schema is different") {
+    val options = Map.empty[String, String]
+    val schema = StructType.fromDDL("a int")
+
+    val query = testRelation
+      .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct"))
+    val optimized = Optimizer.execute(query.analyze)
+
+    val expected = testRelation.select(
+      JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct")).analyze
+    comparePlans(optimized, expected)
+  }
+
+  test("SPARK-32948: not optimize from_json + to_json if option is not empty") {

Review comment:
   Could you add tests with different timezone cases, too?
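The three test names above spell out when the rewrite may fire. What follows is a toy Python model of that behavior, purely illustrative. The classes `Attr`, `StructsToJson` and `JsonToStructs` below are hypothetical simplifications that only borrow Catalyst's names, and schemas are plain tuples. Because the tests pass the same empty `options` to both sides, the model assumes that both `from_json` and `to_json` must have empty options. Time zone handling, which the review asks to test, is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical toy stand-ins for Catalyst expressions (not Spark classes).
Schema = Tuple[Tuple[str, str], ...]   # e.g. (("a", "int"), ("b", "int"))
Options = Tuple[Tuple[str, str], ...]  # empty tuple means "no options"


@dataclass(frozen=True)
class Attr:
    name: str
    schema: Schema


@dataclass(frozen=True)
class StructsToJson:  # models to_json(child)
    options: Options
    child: "Expr"


@dataclass(frozen=True)
class JsonToStructs:  # models from_json(child, schema)
    schema: Schema
    options: Options
    child: "Expr"


Expr = Union[Attr, StructsToJson, JsonToStructs]


def schema_of(e: Expr) -> Optional[Schema]:
    """Struct type produced by e, or None when e produces a JSON string."""
    if isinstance(e, (Attr, JsonToStructs)):
        return e.schema
    return None


def optimize(e: Expr) -> Expr:
    """Collapse from_json(to_json(x)) into x, rewriting children first."""
    if isinstance(e, StructsToJson):
        return StructsToJson(e.options, optimize(e.child))
    if isinstance(e, JsonToStructs):
        child = optimize(e.child)
        if (
            isinstance(child, StructsToJson)
            and not e.options
            and not child.options
            and schema_of(child.child) == e.schema
        ):
            return child.child
        return JsonToStructs(e.schema, e.options, child)
    return e
```

The toy version recurses bottom-up, so the doubly nested `query2` shape collapses in a single call. The suite, by contrast, runs the real rule in a `FixedPoint(10)` batch.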








[GitHub] [spark] dongjoon-hyun closed pull request #29853: [SPARK-32977][SQL][DOCS] Fix JavaDoc on Default Save Mode

2020-09-23 Thread GitBox


dongjoon-hyun closed pull request #29853:
URL: https://github.com/apache/spark/pull/29853


   






[GitHub] [spark] SparkQA commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


SparkQA commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698087474


   **[Test build #129054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129054/testReport)** for PR 29857 at commit [`b7f2f47`](https://github.com/apache/spark/commit/b7f2f47c6c22af15478b89201fb6c685f47d66bd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-698087113










[GitHub] [spark] SparkQA commented on pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation

2020-09-23 Thread GitBox


SparkQA commented on pull request #29859:
URL: https://github.com/apache/spark/pull/29859#issuecomment-698087229


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33677/
   






[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-09-23 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-698086807


   **[Test build #129058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129058/testReport)** for PR 29800 at commit [`b5a48b8`](https://github.com/apache/spark/commit/b5a48b8ac45200b3147870642243cf32b9966e38).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-698087113










[GitHub] [spark] Ngone51 commented on pull request #29860: [SPARK-32984][TESTS][SQL] Improve showing the differences between approved and actual plans of PlanStabilitySuite

2020-09-23 Thread GitBox


Ngone51 commented on pull request #29860:
URL: https://github.com/apache/spark/pull/29860#issuecomment-698090163


   cc @cloud-fan @maropu Please take a look, thanks!






[GitHub] [spark] Ngone51 opened a new pull request #29860: [SPARK-32984][TESTS][SQL] Improve showing the differences between approved and actual plans of PlanStabilitySuite

2020-09-23 Thread GitBox


Ngone51 opened a new pull request #29860:
URL: https://github.com/apache/spark/pull/29860


   
   
   ### What changes were proposed in this pull request?
   
   
   This PR proposes adding caret hints (e.g., `^`) to the approved and actual plans at the point where they first differ.
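The idea of marking where two plans first diverge can be sketched in Python under simple assumptions. The plans are compared line by line, and then character by character within the first line that differs. `first_difference` and `with_caret` are hypothetical helpers for illustration only; they are not the PR's Scala implementation in `PlanStabilitySuite`.

```python
from typing import Optional, Tuple


def first_difference(approved: str, actual: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first differing character, or None if equal."""
    a_lines = approved.splitlines()
    b_lines = actual.splitlines()
    for i in range(max(len(a_lines), len(b_lines))):
        # A line missing from the shorter plan is treated as empty.
        a = a_lines[i] if i < len(a_lines) else ""
        b = b_lines[i] if i < len(b_lines) else ""
        if a != b:
            # First mismatching column, or the end of the shorter line when
            # one line is a prefix of the other.
            col = next(
                (j for j, (x, y) in enumerate(zip(a, b)) if x != y),
                min(len(a), len(b)),
            )
            return i, col
    return None


def with_caret(plan: str, line: int, col: int) -> str:
    """Insert a caret line under `line`, pointing at column `col`."""
    lines = plan.splitlines()
    lines.insert(line + 1, " " * col + "^")
    return "\n".join(lines)


if __name__ == "__main__":
    approved = "Project [i_manufact]\n  Filter [item_cnt]"
    actual = "Project [i_manufact]\n  Filter [alwaysTrue,item_cnt]"
    pos = first_difference(approved, actual)
    print(pos)  # (1, 10)
    print(with_caret(actual, *pos))
```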
   
   ### Why are the changes needed?
   
   
   It's hard to find the difference between the approved and actual plans because the plans of TPC-DS queries are often huge. Adding the hints would help developers locate the plan differences quickly.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes. After this change, hints are added to the plans to highlight the differences. For example,
   
   ```scala
   [info]   last approved simplified plan: /Users/yi.wu/IdeaProjects/spark/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41.sf100/simplified.txt
   [info]   last approved explain plan: /Users/yi.wu/IdeaProjects/spark/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41.sf100/explain.txt
   [info]   
   [info]   TakeOrderedAndProject [i_product_name]
   [info] WholeStageCodegen (4)
   [info]   HashAggregate [i_product_name]
   [info] InputAdapter
   [info]   Exchange [i_product_name] #1
   [info] WholeStageCodegen (3)
   [info]   HashAggregate [i_product_name]
   [info] Project [i_product_name]
   [info]   BroadcastHashJoin [i_manufact,i_manufact]
   [info] Project [i_manufact,i_product_name]
   [info]   Filter [i_manufact_id,i_manufact]
   [info] ColumnarToRow
   [info]   InputAdapter
   [info] Scan parquet default.item [i_manufact_id,i_manufact,i_product_name]
   [info] InputAdapter
   [info]   BroadcastExchange #2
   [info] WholeStageCodegen (2)
   [info]   Project [i_manufact]
   [info] Filter [item_cnt]
   [info]   HashAggregate [i_manufact,count] [count(1),item_cnt,i_manufact,count]
   [info]   
   ^^
   [info] InputAdapter
   [info]   Exchange [i_manufact] #3
   [info] WholeStageCodegen (1)
   [info]   HashAggregate [i_manufact] [count,count]
   [info] Project [i_manufact]
   [info]   Filter [i_category,i_color,i_units,i_size,i_manufact]
   [info] ColumnarToRow
   [info]   InputAdapter
   [info] Scan parquet default.item [i_category,i_manufact,i_size,i_color,i_units]
   [info]   
   [info]   actual simplified plan: /Users/yi.wu/IdeaProjects/spark/target/tmp/q41.sf100.actual.simplified.txt
   [info]   actual explain plan: /Users/yi.wu/IdeaProjects/spark/target/tmp/q41.sf100.actual.explain.txt
   [info]   
   [info]   TakeOrderedAndProject [i_product_name]
   [info] WholeStageCodegen (4)
   [info]   HashAggregate [i_product_name]
   [info] InputAdapter
   [info]   Exchange [i_product_name] #1
   [info] WholeStageCodegen (3)
   [info]   HashAggregate [i_product_name]
   [info] Project [i_product_name]
   [info]   BroadcastHashJoin [i_manufact,i_manufact]
   [info] Project [i_manufact,i_product_name]
   [info]   Filter [i_manufact_id,i_manufact]
   [info] ColumnarToRow
   [info]   InputAdapter
   [info] Scan parquet default.item [i_manufact_id,i_manufact,i_product_name]
   [info] InputAdapter
   [info]   BroadcastExchange #2
   [info] WholeStageCodegen (2)
   [info]   Project [i_manufact]
   [info] Filter [alwaysTrue,item_cnt]
   [info]   HashAggregate [i_manufact,count] [count(1),item_cnt,i_manufact,alwaysTrue,count]
   [info]   
   ^^
   [info] InputAdapter
   [info]   Exchange [i_manufact] #3
   [info] WholeStageCodegen (1)
   [info]   HashAggregate [i_manufact] [count,count]
   [info] Project [i_manufact]
   [info]   
   ```

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29828:
URL: https://github.com/apache/spark/pull/29828#issuecomment-698089064










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29756:
URL: https://github.com/apache/spark/pull/29756#issuecomment-698101187










[GitHub] [spark] AmplabJenkins commented on pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29756:
URL: https://github.com/apache/spark/pull/29756#issuecomment-698101187










[GitHub] [spark] AmplabJenkins commented on pull request #29855: SPARK-32915 Network-layer and shuffle RPC layer changes to support push shuffle blocks

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29855:
URL: https://github.com/apache/spark/pull/29855#issuecomment-697980595










[GitHub] [spark] Victsm commented on pull request #29855: SPARK-32915 Network-layer and shuffle RPC layer changes to support push shuffle blocks

2020-09-23 Thread GitBox


Victsm commented on pull request #29855:
URL: https://github.com/apache/spark/pull/29855#issuecomment-697980575


   Fixed the Java style issue and the 1 UT failure.
   Test build should be clean now.






[GitHub] [spark] SparkQA commented on pull request #29855: SPARK-32915 Network-layer and shuffle RPC layer changes to support push shuffle blocks

2020-09-23 Thread GitBox


SparkQA commented on pull request #29855:
URL: https://github.com/apache/spark/pull/29855#issuecomment-697980048


   **[Test build #129046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129046/testReport)** for PR 29855 at commit [`3e9e9e1`](https://github.com/apache/spark/commit/3e9e9e1fe0da1383e0a26bd8610032b18a94cf1d).






[GitHub] [spark] zero323 commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.

2020-09-23 Thread GitBox


zero323 commented on pull request #29591:
URL: https://github.com/apache/spark/pull/29591#issuecomment-697980912


   Jenkins, retest this please.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29855: SPARK-32915 Network-layer and shuffle RPC layer changes to support push shuffle blocks

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29855:
URL: https://github.com/apache/spark/pull/29855#issuecomment-697980595










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29828:
URL: https://github.com/apache/spark/pull/29828#issuecomment-697984381










[GitHub] [spark] SparkQA commented on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


SparkQA commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-697989432


   **[Test build #129042 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129042/testReport)**
 for PR 29854 at commit 
[`64d6dd8`](https://github.com/apache/spark/commit/64d6dd8be689d59785f2205a6586f665dc8057c7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


dongjoon-hyun commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-697989257


   Thank you, @holdenk! Let's see the K8s IT result.






[GitHub] [spark] SparkQA removed a comment on pull request #29824: [SPARK-32954][YARN][TEST][test-hadoop2.7][test-maven] Add jakarta.servlet-api test dependency to yarn module to avoid UTs badcase

2020-09-23 Thread GitBox


SparkQA removed a comment on pull request #29824:
URL: https://github.com/apache/spark/pull/29824#issuecomment-697376701


   **[Test build #129032 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129032/testReport)**
 for PR 29824 at commit 
[`250e397`](https://github.com/apache/spark/commit/250e397e32ac6b55ea84d68cacd340c4cbc37870).






[GitHub] [spark] SparkQA commented on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


SparkQA commented on pull request #29828:
URL: https://github.com/apache/spark/pull/29828#issuecomment-697994187


   **[Test build #129039 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129039/testReport)**
 for PR 29828 at commit 
[`078dc84`](https://github.com/apache/spark/commit/078dc84cd32d6fb0ccda20b0d161699a4e85355f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29824: [SPARK-32954][YARN][TEST][test-hadoop2.7][test-maven] Add jakarta.servlet-api test dependency to yarn module to avoid UTs badca

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29824:
URL: https://github.com/apache/spark/pull/29824#issuecomment-697993651










[GitHub] [spark] AmplabJenkins commented on pull request #29856: [SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29856:
URL: https://github.com/apache/spark/pull/29856#issuecomment-698004577










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29856: [SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29856:
URL: https://github.com/apache/spark/pull/29856#issuecomment-698004577










[GitHub] [spark] SparkQA commented on pull request #29856: [SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread GitBox


SparkQA commented on pull request #29856:
URL: https://github.com/apache/spark/pull/29856#issuecomment-698004283


   **[Test build #129049 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129049/testReport)**
 for PR 29856 at commit 
[`996cbe2`](https://github.com/apache/spark/commit/996cbe204110e0667a5b693c48c1a3b2cb7b3e26).






[GitHub] [spark] SparkQA commented on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


SparkQA commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-698006612


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33669/
   






[GitHub] [spark] dongjoon-hyun closed pull request #29856: [SPARK-32981][BUILD] Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread GitBox


dongjoon-hyun closed pull request #29856:
URL: https://github.com/apache/spark/pull/29856


   






[GitHub] [spark] AmplabJenkins commented on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-698006632










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-698006632










[GitHub] [spark] dongjoon-hyun closed pull request #29854: [SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


dongjoon-hyun closed pull request #29854:
URL: https://github.com/apache/spark/pull/29854


   






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29846: [SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread GitBox


dongjoon-hyun commented on a change in pull request #29846:
URL: https://github.com/apache/spark/pull/29846#discussion_r493952674



##
File path: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesVolumeSpec.scala
##
@@ -21,7 +21,10 @@ private[spark] sealed trait KubernetesVolumeSpecificConf
 private[spark] case class KubernetesHostPathVolumeConf(hostPath: String)
   extends KubernetesVolumeSpecificConf
 
-private[spark] case class KubernetesPVCVolumeConf(claimName: String)
+private[spark] case class KubernetesPVCVolumeConf(
+claimName: String,
+storageClass: Option[String] = None,
+size: Option[String] = None)

Review comment:
   Yes, we need both. In this PR, if either one is missing, it falls back to 
the existing PVC mounting behavior: it doesn't try to create a PVC and assumes 
a PVC with the given name already exists.
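The fallback described in this comment can be modeled as a small decision function. This is a hypothetical Python sketch, not Spark's Scala code: `PVCVolumeConf` and `plan_pvc_mount` are illustrative names that only mirror the `claimName`, `storageClass`, and `size` fields from the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PVCVolumeConf:
    # Hypothetical mirror of KubernetesPVCVolumeConf from the diff above.
    claim_name: str
    storage_class: Optional[str] = None
    size: Optional[str] = None


def plan_pvc_mount(conf: PVCVolumeConf) -> str:
    """Decide whether to create a new PVC or mount an existing one."""
    # Dynamic creation is attempted only when BOTH fields are present.
    if conf.storage_class is not None and conf.size is not None:
        return "create"
    # Otherwise fall back to the existing behavior: assume a PVC named
    # `claim_name` already exists and just mount it.
    return "mount-existing"
```

For example, `plan_pvc_mount(PVCVolumeConf("data", size="10Gi"))` returns `"mount-existing"` because no storage class was given; the claim name, size, and storage class values here are made up for illustration.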








[GitHub] [spark] SparkQA commented on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s

2020-09-23 Thread GitBox


SparkQA commented on pull request #29533:
URL: https://github.com/apache/spark/pull/29533#issuecomment-698036245


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33672/
   






[GitHub] [spark] maropu commented on a change in pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

2020-09-23 Thread GitBox


maropu commented on a change in pull request #29804:
URL: https://github.com/apache/spark/pull/29804#discussion_r493991276



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
##
@@ -1012,4 +1014,43 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
   }
 }
   }
+
+  test("SPARK-32859: disable unnecessary bucketed table scan based on query plan") {
+withTable("t1", "t2") {
+  df1.write.format("parquet").bucketBy(8, "i").saveAsTable("t1")
+  df2.write.format("parquet").bucketBy(4, "i").saveAsTable("t2")
+
+  def checkNumBucketedScan(query: String, expectedNumBucketedScan: Int): Unit = {
+val plan = sql(query).queryExecution.executedPlan
+val bucketedScan = plan.collect { case s: FileSourceScanExec if s.bucketedScan => s }
+assert(bucketedScan.length == expectedNumBucketedScan)
+  }
+
+  Seq(
+("SELECT * FROM t1 JOIN t2 ON t1.i = t2.i", 1, 2),
+("SELECT * FROM t1 JOIN t2 ON t1.i = t2.j", 1, 2),
+("SELECT * FROM t1 JOIN t2 ON t1.j = t2.j", 0, 2),
+("SELECT SUM(i) FROM t1 GROUP BY i", 1, 1),
+("SELECT SUM(i) FROM t1 GROUP BY j", 0, 1),
+("SELECT * FROM t1 WHERE i = 1", 1, 1),
+("SELECT * FROM t1 WHERE j = 1", 0, 1),

Review comment:
   I have two comments about the test:
- Could you add more test cases, e.g., multiple joins, multiple bucket 
columns, ...?
- Could you split this single test into multiple ones with meaningful titles? 
e.g., `test("SPARK-32859: disable unnecessary bucketed table scan based on query plan - multiple join test")`








[GitHub] [spark] HyukjinKwon commented on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-09-23 Thread GitBox


HyukjinKwon commented on pull request #29817:
URL: https://github.com/apache/spark/pull/29817#issuecomment-698062962


   > but that's only half true, this PR just broke the test some more.
   
   @holdenk, you're kidding, right? There was only one test failure in the K8S 
tests, and it was not caused by this PR. That test was just fixed by you. How 
come this PR broke more tests? Can you be more explicit about that? Which tests 
were broken further, and how did you test?






[GitHub] [spark] holdenk commented on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-09-23 Thread GitBox


holdenk commented on pull request #29817:
URL: https://github.com/apache/spark/pull/29817#issuecomment-698064374


   If it helps, I already called out the part of the PR I believe is most 
likely responsible for that during my code review.






[GitHub] [spark] dongjoon-hyun commented on pull request #29846: [SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread GitBox


dongjoon-hyun commented on pull request #29846:
URL: https://github.com/apache/spark/pull/29846#issuecomment-698069211


   Oh, thank you for reporting! Sure!






[GitHub] [spark] LuciferYang commented on pull request #29824: [SPARK-32954][YARN][TEST][test-hadoop2.7][test-maven] Add jakarta.servlet-api test dependency to yarn module to avoid UTs badcase

2020-09-23 Thread GitBox


LuciferYang commented on pull request #29824:
URL: https://github.com/apache/spark/pull/29824#issuecomment-698069141


Jenkins test finally passed.






[GitHub] [spark] HyukjinKwon opened a new pull request #29858: [SPARK-32981][BUILD][FOLLOW-UP] Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread GitBox


HyukjinKwon opened a new pull request #29858:
URL: https://github.com/apache/spark/pull/29858


   ### What changes were proposed in this pull request?
   
   This PR removes the Hive 1.2 option (and therefore the `HIVE_VERSION` 
environment variable as well).
   
   ### Why are the changes needed?
   
   To remove unsupported options.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yeah, `HIVE_VERSION` and Hive 1.2 are removed.
   
   ### How was this patch tested?
   
   Manually tested:
   
   ```bash
   SPARK_VERSION=3.0.1 HADOOP_VERSION=3.2 pip install pyspark-3.1.0.dev0.tar.gz -v
   SPARK_VERSION=3.0.1 HADOOP_VERSION=2.7 pip install pyspark-3.1.0.dev0.tar.gz -v
   SPARK_VERSION=3.0.1 HADOOP_VERSION=invalid pip install pyspark-3.1.0.dev0.tar.gz -v
   ```






[GitHub] [spark] Ngone51 commented on a change in pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-09-23 Thread GitBox


Ngone51 commented on a change in pull request #29817:
URL: https://github.com/apache/spark/pull/29817#discussion_r494005398



##
File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
##
@@ -70,7 +70,10 @@ private[deploy] class Worker(
   if (conf.get(config.DECOMMISSION_ENABLED)) {
 logInfo("Registering SIGPWR handler to trigger decommissioning.")
 SignalUtils.register("PWR", "Failed to register SIGPWR handler - " +
-  "disabling worker decommission feature.")(decommissionSelf)
+  "disabling worker decommission feature.") {
+   self.send(WorkerSigPWRReceived)

Review comment:
   This is the Worker; I guess you care more about the executor? In the Worker, 
`decommissionSelf` always returns true. In the executor, there's a chance it 
returns false to fail `decommissionSelf`, but that seems to happen rarely. If you 
insist on returning the value, I think we can use `askSync` instead.
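The trade-off in this comment is the general one between a fire-and-forget `send` (the caller never sees a return value) and a blocking ask that waits for a reply. A minimal Python sketch of that distinction follows; `ToyEndpoint`, `send`, and `ask_sync` are hypothetical names and do not reproduce Spark's `RpcEndpointRef` API.

```python
import queue
import threading
from typing import Any, Callable, Optional, Tuple


class ToyEndpoint:
    """Hypothetical single-threaded message endpoint (not Spark's RpcEndpointRef)."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        # Each entry is (message, reply_box); reply_box is None for send().
        self._inbox: "queue.Queue[Tuple[Any, Optional[queue.Queue]]]" = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self) -> None:
        # Process messages one at a time, in arrival order.
        while True:
            msg, reply_box = self._inbox.get()
            result = self._handler(msg)
            if reply_box is not None:
                reply_box.put(result)

    def send(self, msg: Any) -> None:
        # Fire-and-forget: returns immediately; the handler's result is dropped.
        self._inbox.put((msg, None))

    def ask_sync(self, msg: Any, timeout: float = 5.0) -> Any:
        # Blocks until the handler replies; raises queue.Empty on timeout.
        reply_box: "queue.Queue[Any]" = queue.Queue(maxsize=1)
        self._inbox.put((msg, reply_box))
        return reply_box.get(timeout=timeout)
```

With a handler such as `lambda m: m == "decommission"`, `send("decommission")` returns `None` right away, while `ask_sync("decommission")` waits and returns `True`. A real RPC layer would also propagate handler failures back to the caller; this sketch does not.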








[GitHub] [spark] HyukjinKwon removed a comment on pull request #29858: [SPARK-32981][BUILD][FOLLOW-UP] Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread GitBox


HyukjinKwon removed a comment on pull request #29858:
URL: https://github.com/apache/spark/pull/29858#issuecomment-698075993


   @dongjoon-hyun, this affects end users, not the developers who will use the 
`hive-1.2` profile.
   I can file a separate JIRA, but whenever the release distributions are 
changed, these should be updated together. 






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29806: [SPARK-32187][PYTHON][DOCS] Doc on Python packaging

2020-09-23 Thread GitBox


HyukjinKwon commented on a change in pull request #29806:
URL: https://github.com/apache/spark/pull/29806#discussion_r494009107



##
File path: python/docs/source/user_guide/python_packaging.rst
##
@@ -0,0 +1,220 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+..http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Python packaging
+
+
+When you want to run your PySpark application on a cluster (like YARN, Kubernetes, Mesos, ..) you need to make sure that the your code
+and all used libraries are available on the executors.
+
+As an example let's say you may want to run the `Pandas UDF's examples `_.
+As it uses pyarrow as an underlying implementation we need to make sure to have pyarrow installed on each executor on the cluster. Otherwise you may get errors such as 
+``ModuleNotFoundError: No module named 'pyarrow'``.
+
+Here is the script ``main.py`` from the previous example that will be executed on the cluster:
+
+.. code-block:: python
+
+  import pandas as pd
+  from pyspark.sql.functions import pandas_udf, PandasUDFType
+  from pyspark.sql import SparkSession
+
+  def main(spark):
+    df = spark.createDataFrame(
+      [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
+      ("id", "v"))
+
+    @pandas_udf("double", PandasUDFType.GROUPED_AGG)
+    def mean_udf(v: pd.Series):
+      return v.mean()
+
+    print(df.groupby("id").agg(mean_udf(df['v'])).collect())
+
+
+  if __name__ == "__main__":
+    spark = SparkSession.builder.getOrCreate()
+    main(spark)
+
+
+There are multiple ways to ship the dependencies to the cluster:
+
+- Using py-files
+- Using a zipped virtual environment
+- Using PEX
+- Using Docker
+
+
+**
+Using py-files
+**
+
+PySpark allows to upload python files to the executors by setting the configuration setting ``spark.submit.pyFiles`` or by directly calling `addPyFile
+<../reference/api/pyspark.SparkContext.addPyFile.rst>`_ on the SparkContext.
+
+This is an easy way to ship additional custom Python code to the cluster. You can just add individual files or zip whole packages and upload them. 
+Using `addPyFile <../reference/api/pyspark.SparkContext.addPyFile.rst>`_ allows to upload code even after having started your job.
+
+It doesn't allow to add packages built as `Wheels `_ and therefore doesn't allowing to include dependencies with native code.
+
+
+**
+Using a zipped virtual environment

Review comment:
   Yeah, it does.
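The quoted doc describes shipping zipped pure-Python packages through `spark.submit.pyFiles` or `SparkContext.addPyFile`. A minimal standard-library sketch of building such an archive follows; `zip_package` and the package names are hypothetical, and, as the quoted doc notes, this covers only pure-Python code, not wheels with native extensions.

```python
import os
import zipfile


def zip_package(package_dir: str, zip_path: str) -> str:
    """Zip a pure-Python package directory so it can be shipped as a py-file.

    Entries are stored relative to the package's parent directory, so
    `import <package>` works once the zip is on sys.path.
    """
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # skip compiled bytecode
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, parent))
    return zip_path
```

Usage, assuming a local package directory named `mylib`: `spark.sparkContext.addPyFile(zip_package("mylib", "mylib.zip"))`, or pass the archive with `spark-submit --py-files mylib.zip main.py`.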








[GitHub] [spark] Ngone51 commented on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission

2020-09-23 Thread GitBox


Ngone51 commented on pull request #29817:
URL: https://github.com/apache/spark/pull/29817#issuecomment-698080541


   > That being said I still have concerns this PR is not sufficiently tested, 
can you add some more tests for the new flows you've introduced?
   
   There's only one new flow, which is from Master to Worker. I can update the 
existing test to verify the Worker's decommission status. What other concerns 
do you have? Could you elaborate so I can improve the PR accordingly?






[GitHub] [spark] viirya commented on a change in pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain

2020-09-23 Thread GitBox


viirya commented on a change in pull request #29828:
URL: https://github.com/apache/spark/pull/29828#discussion_r494015049



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JsonSuite.scala
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types._
+
+class JsonSuite extends PlanTest with ExpressionEvalHelper {
+
+  object Optimizer extends RuleExecutor[LogicalPlan] {
+val batches = Batch("Json optimization", FixedPoint(10), OptimizeJsonExprs) :: Nil
+  }
+
+  val schema = StructType.fromDDL("a int, b int")
+
+  private val structAtt = 'struct.struct(schema).notNull
+
+  private val testRelation = LocalRelation(structAtt)
+
+  test("SPARK-32948: optimize from_json + to_json") {
+val options = Map.empty[String, String]
+
+val query1 = testRelation
+  .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct"))
+val optimized1 = Optimizer.execute(query1.analyze)
+
+val expected = testRelation.select('struct.as("struct")).analyze
+comparePlans(optimized1, expected)
+
+val query2 = testRelation
+  .select(
+JsonToStructs(schema, options,
+  StructsToJson(options,
+JsonToStructs(schema, options,
+  StructsToJson(options, 'struct.as("struct"))
+val optimized2 = Optimizer.execute(query2.analyze)
+
+comparePlans(optimized2, expected)
+  }
+
+  test("SPARK-32948: not optimize from_json + to_json if schema is different") {
+val options = Map.empty[String, String]
+val schema = StructType.fromDDL("a int")
+
+val query = testRelation
+  .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct"))
+val optimized = Optimizer.execute(query.analyze)
+
+val expected = testRelation.select(
+  JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct")).analyze
+comparePlans(optimized, expected)
+  }
+
+  test("SPARK-32948: not optimize from_json + to_json if option is not empty") {

Review comment:
   added.
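The tests above exercise a rewrite that collapses `from_json(to_json(x))` back to `x`, and skip it when the schema differs or options are non-empty. Below is a toy Python model of that rewrite, assuming only those two conditions (as the test names suggest); it is not the actual `OptimizeJsonExprs` rule, which may check additional conditions, and the class names only echo the Catalyst expressions in the diff.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Schema = Tuple[Tuple[str, str], ...]   # e.g. (("a", "int"), ("b", "int"))
Options = Tuple[Tuple[str, str], ...]  # empty tuple means "no options"


@dataclass(frozen=True)
class Attr:
    name: str
    schema: Schema


@dataclass(frozen=True)
class StructsToJson:  # models to_json
    options: Options
    child: "Expr"


@dataclass(frozen=True)
class JsonToStructs:  # models from_json
    schema: Schema
    options: Options
    child: "Expr"


Expr = Union[Attr, StructsToJson, JsonToStructs]


def schema_of(e: Expr) -> Optional[Schema]:
    # to_json produces a string, so it has no struct schema.
    if isinstance(e, StructsToJson):
        return None
    return e.schema


def optimize(e: Expr) -> Expr:
    """Bottom-up: replace from_json(to_json(x)) with x when it is a no-op."""
    if isinstance(e, StructsToJson):
        return StructsToJson(e.options, optimize(e.child))
    if isinstance(e, JsonToStructs):
        child = optimize(e.child)
        if (isinstance(child, StructsToJson)
                and not e.options and not child.options
                and schema_of(child.child) == e.schema):
            return child.child
        return JsonToStructs(e.schema, e.options, child)
    return e
```

Because the sketch rewrites children first, the doubly nested chain from the second query collapses in a single pass, whereas the test's optimizer reaches the same result by running the rule in a `FixedPoint(10)` batch.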








[GitHub] [spark] AmplabJenkins commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


AmplabJenkins commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698087732










[GitHub] [spark] SparkQA removed a comment on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


SparkQA removed a comment on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698070226


   **[Test build #129054 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129054/testReport)**
 for PR 29857 at commit 
[`b7f2f47`](https://github.com/apache/spark/commit/b7f2f47c6c22af15478b89201fb6c685f47d66bd).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698087732










[GitHub] [spark] LuciferYang commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite

2020-09-23 Thread GitBox


LuciferYang commented on pull request #29857:
URL: https://github.com/apache/spark/pull/29857#issuecomment-698100080


   cc @dongjoon-hyun https://github.com/apache/spark/pull/29861 fixes the GitHub Actions Scala 2.13 build, which is broken in the k8s module. I will rebase this PR after it is merged.






[GitHub] [spark] LuciferYang opened a new pull request #29861: [SPARK-32971][K8S][FOLLOWUP] Fix k8s core module compile in Scala 2.13

2020-09-23 Thread GitBox


LuciferYang opened a new pull request #29861:
URL: https://github.com/apache/spark/pull/29861


   ### What changes were proposed in this pull request?
   Manually call `toSeq` in the `MountVolumesFeatureStep.getAdditionalKubernetesResources` method, because in Scala 2.13 `Seq` means `immutable.Seq` and an `ArrayBuffer` is not an immutable `Seq`.
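   A minimal sketch of the same kind of fix, assuming a method whose declared return type is the default `Seq` and which builds its result in an `ArrayBuffer`. The object `ToSeqExample` and the method `additionalResources` are hypothetical stand-ins, not the actual `MountVolumesFeatureStep` code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a method like getAdditionalKubernetesResources.
object ToSeqExample {
  def additionalResources(names: List[String]): Seq[String] = {
    val buffer = ArrayBuffer.empty[String]
    names.foreach(n => buffer += s"pvc-$n")
    // In Scala 2.13 the default Seq is immutable.Seq, so returning `buffer`
    // directly does not compile; toSeq produces an immutable copy.
    // In Scala 2.12 the buffer already conforms to Seq, so the call is harmless there.
    buffer.toSeq
  }
}
```

   Calling `toSeq` keeps one source file compiling under both Scala versions, at the cost of a copy in 2.13.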
   
   ### Why are the changes needed?
   We need to support a Scala 2.13 build.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   
   - Scala 2.12: Pass Jenkins or GitHub Actions
   
   - Scala 2.13: Pass the GitHub Actions Scala 2.13 build
   






[GitHub] [spark] holdenk commented on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


holdenk commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-697982902


   cc @Ngone51 @HyukjinKwon @ScrapCodes @dongjoon-hyun 






[GitHub] [spark] SparkQA removed a comment on pull request #29854: [WIP][SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


SparkQA removed a comment on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-697883527


   **[Test build #129042 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129042/testReport)**
 for PR 29854 at commit 
[`64d6dd8`](https://github.com/apache/spark/commit/64d6dd8be689d59785f2205a6586f665dc8057c7).






[GitHub] [spark] dongjoon-hyun commented on pull request #29854: [SPARK-32937][SPARK-32980][K8S] Fix decom & launcher tests and add some comments to reduce chance of breakage

2020-09-23 Thread GitBox


dongjoon-hyun commented on pull request #29854:
URL: https://github.com/apache/spark/pull/29854#issuecomment-698008172


   Merged to master because it fixes the K8s integration tests and unblocks the K8s module.






[GitHub] [spark] dongjoon-hyun commented on pull request #29846: [SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread GitBox


dongjoon-hyun commented on pull request #29846:
URL: https://github.com/apache/spark/pull/29846#issuecomment-698009764


   Thanks!






[GitHub] [spark] holdenk commented on pull request #29846: [SPARK-32971][K8S] Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread GitBox


holdenk commented on pull request #29846:
URL: https://github.com/apache/spark/pull/29846#issuecomment-698009613


   Thanks for the clarification, @dongjoon-hyun. LGTM pending Jenkins.






[GitHub] [spark] SparkQA removed a comment on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s

2020-09-23 Thread GitBox


SparkQA removed a comment on pull request #29533:
URL: https://github.com/apache/spark/pull/29533#issuecomment-698015366


   **[Test build #129051 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129051/testReport)**
 for PR 29533 at commit 
[`6449efa`](https://github.com/apache/spark/commit/6449efa72b2f7ff2aea53139520a04ef37b72f18).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s

2020-09-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29533:
URL: https://github.com/apache/spark/pull/29533#issuecomment-698020667









