[GitHub] [spark] SparkQA commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table

2020-05-18 Thread GitBox


SparkQA commented on pull request #28511:
URL: https://github.com/apache/spark/pull/28511#issuecomment-630599286


   **[Test build #122830 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122830/testReport)**
 for PR 28511 at commit 
[`78e0972`](https://github.com/apache/spark/commit/78e097284096b27839928668a3deaf6a49cab336).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table

2020-05-18 Thread GitBox


yaooqinn commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r427046165



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.test.TestHive
+
+/**
+ * Benchmark to measure hive table write performance.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class <this class>
+ *        --jars <spark core test jar>,<spark catalyst test jar>,<spark sql test jar>
+ *        --packages org.spark-project.hive:hive-exec:1.2.1.spark2
+ *        <spark hive test jar>
+ *   2. build/sbt "hive/test:runMain <this class>" -Phive-1.2 or
+ *      build/sbt "hive/test:runMain <this class>" -Phive-2.3
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain <this class>"
+ *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-hive2.3-results.txt".
+ *   4. -Phive-1.2 does not work for JDK 11
+ * }}}
+ */
+object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {
+
+  override def getSparkSession: SparkSession = TestHive.sparkSession
+
+  val tempView = "temp"
+  val numRows = 1024 * 10
+  val sql = spark.sql _
+
+  // scalastyle:off hadoopconfiguration
+  private val hadoopConf = spark.sparkContext.hadoopConfiguration
+  // scalastyle:on hadoopconfiguration
+  hadoopConf.set("hive.exec.dynamic.partition", "true")
+  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
+  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    tableNames.foreach { name =>
+      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
+    }
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempView DISTRIBUTE BY a")

Review comment:
   the cause of the Hive regression is that it copies files one by one, so I used `DISTRIBUTE BY` to create a certain number of files to feed it.
   
   But after rerunning the test without `DISTRIBUTE BY`, the result still illustrates the problem, so I will remove that clause.
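
   To make that concrete, here is a hedged Scala sketch of the effect (an editor's illustration, not the benchmark's exact code; `spark`, a table `t` partitioned by `(b, c)`, and the view `temp` are assumed from the diff above):
   ```scala
   // With DISTRIBUTE BY a, rows of every (b, c) partition are hash-spread
   // across all shuffle partitions, so each reducer writes a file per dynamic
   // partition it receives -- many small files for Hive's one-by-one copy.
   spark.conf.set("spark.sql.shuffle.partitions", "16")
   spark.sql(
     """INSERT OVERWRITE TABLE t
       |SELECT CAST(id AS INT) AS a, CAST(id % 10 AS INT) AS b,
       |       CAST(id % 100 AS INT) AS c
       |FROM temp
       |DISTRIBUTE BY a""".stripMargin)
   ```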








[GitHub] [spark] jerryshao commented on pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-18 Thread GitBox


jerryshao commented on pull request #28525:
URL: https://github.com/apache/spark/pull/28525#issuecomment-630595693


   I would suggest describing the spec of the shuffle index file somewhere in the code, and reducing the magic hard-coded numbers scattered everywhere.






[GitHub] [spark] yaooqinn commented on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


yaooqinn commented on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630592393


   Currently we don't have any page for built-in types yet, and I don't know whether the current style looks good or not. cc @maropu @cloud-fan 






[GitHub] [spark] gatorsmile commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode

2020-05-18 Thread GitBox


gatorsmile commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-630589229


   This is the last blocker issue of Spark 3.0. Can we speed up the review?






[GitHub] [spark] gatorsmile commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode

2020-05-18 Thread GitBox


gatorsmile commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-63052


   ping @tdas @rdblue @zsxwing @jose-torres 






[GitHub] [spark] cloud-fan commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode

2020-05-18 Thread GitBox


cloud-fan commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-630589029


   @rdblue @tdas @jose-torres tests pass and it's ready for review






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28462: [SPARK-31651][CORE] Improve handling of the case where different barrier sync types are used in a single sync

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28462:
URL: https://github.com/apache/spark/pull/28462#issuecomment-630585266










[GitHub] [spark] AmplabJenkins commented on pull request #28462: [SPARK-31651][CORE] Improve handling of the case where different barrier sync types are used in a single sync

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28462:
URL: https://github.com/apache/spark/pull/28462#issuecomment-630585266










[GitHub] [spark] SparkQA removed a comment on pull request #28462: [SPARK-31651][CORE] Improve handling of the case where different barrier sync types are used in a single sync

2020-05-18 Thread GitBox


SparkQA removed a comment on pull request #28462:
URL: https://github.com/apache/spark/pull/28462#issuecomment-630501075


   **[Test build #122823 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122823/testReport)**
 for PR 28462 at commit 
[`8b192f1`](https://github.com/apache/spark/commit/8b192f12d765f9d94ebca73a4e7faa634bce8245).






[GitHub] [spark] SparkQA commented on pull request #28462: [SPARK-31651][CORE] Improve handling of the case where different barrier sync types are used in a single sync

2020-05-18 Thread GitBox


SparkQA commented on pull request #28462:
URL: https://github.com/apache/spark/pull/28462#issuecomment-630584612


   **[Test build #122823 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122823/testReport)**
 for PR 28462 at commit 
[`8b192f1`](https://github.com/apache/spark/commit/8b192f12d765f9d94ebca73a4e7faa634bce8245).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on pull request #28568: [SPARK-31710][SQL] Add compatibility flag to cast long to timestamp

2020-05-18 Thread GitBox


cloud-fan commented on pull request #28568:
URL: https://github.com/apache/spark/pull/28568#issuecomment-630582721


   > we need the same SQL to run on Hive/Spark during migration, in case Spark fails or behaves unexpectedly. So with a compatibility flag, as you said, we can easily migrate them with no need to change users' SQL.
   
   We did something similar before with the pgsql dialect. That project was canceled because it takes too much effort to keep two systems behaving exactly the same, and people may keep adding other dialects, which could increase maintenance costs dramatically.
   
   Hive is a bit different, as Spark already provides a lot of Hive compatibility. But still, it's not the right direction for Spark to provide 100% compatibility with another system.
   
   For this particular case, I agree with @bart-samwel that we can fail by default when casting long to timestamp, and provide a legacy config to allow it with Spark or Hive behavior. Allowing a cast from long to timestamp is non-standard and surprising, so in the long term we do want to forbid it, with a clear error message suggesting the `TIMESTAMP_MILLIS` or `TIMESTAMP_MICROS` functions.
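
   For illustration, a minimal sketch of the suggested end state (assuming the `timestamp_millis`/`timestamp_micros` functions mentioned above are available, which depends on the Spark version):
   ```scala
   // CAST(1589760000 AS TIMESTAMP) is ambiguous: Spark reads the long as
   // seconds while Hive reads it as milliseconds. Explicit constructors make
   // the unit part of the query, so no compatibility flag is needed.
   spark.sql("SELECT timestamp_millis(1589760000000) AS ts_from_ms").show()
   spark.sql("SELECT timestamp_micros(1589760000000000) AS ts_from_us").show()
   ```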






[GitHub] [spark] gengliangwang commented on a change in pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-05-18 Thread GitBox


gengliangwang commented on a change in pull request #28208:
URL: https://github.com/apache/spark/pull/28208#discussion_r427031360



##
File path: 
sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/SqlResource.scala
##
@@ -21,46 +21,55 @@ import java.util.Date
 import javax.ws.rs._
 import javax.ws.rs.core.MediaType
 
+import scala.util.{Failure, Success, Try}
+
 import org.apache.spark.JobExecutionStatus
-import org.apache.spark.sql.execution.ui.{SQLAppStatusStore, SQLExecutionUIData, SQLPlanMetric}
+import org.apache.spark.sql.execution.ui.{SparkPlanGraph, SparkPlanGraphCluster, SparkPlanGraphNode, SQLAppStatusStore, SQLExecutionUIData}
 import org.apache.spark.status.api.v1.{BaseAppResource, NotFoundException}
 
 @Produces(Array(MediaType.APPLICATION_JSON))
 private[v1] class SqlResource extends BaseAppResource {
 
+  val WHOLE_STAGE_CODEGEN = "WholeStageCodegen"
+
   @GET
   def sqlList(
-  @DefaultValue("false") @QueryParam("details") details: Boolean,
+  @DefaultValue("true") @QueryParam("details") details: Boolean,
+  @DefaultValue("true") @QueryParam("planDescription") planDescription: Boolean,

Review comment:
   @erenavsarogullari  one last comment: why do we have the extra option "planDescription" here? It seems reasonable for it to be covered by the option "details".








[GitHub] [spark] cloud-fan commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r427027992



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.test.TestHive
+
+/**
+ * Benchmark to measure hive table write performance.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class <this class>
+ *        --jars <spark core test jar>,<spark catalyst test jar>,<spark sql test jar>
+ *        --packages org.spark-project.hive:hive-exec:1.2.1.spark2
+ *        <spark hive test jar>
+ *   2. build/sbt "hive/test:runMain <this class>" -Phive-1.2 or
+ *      build/sbt "hive/test:runMain <this class>" -Phive-2.3
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain <this class>"
+ *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-hive2.3-results.txt".
+ *   4. -Phive-1.2 does not work for JDK 11
+ * }}}
+ */
+object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {
+
+  override def getSparkSession: SparkSession = TestHive.sparkSession
+
+  val tempView = "temp"
+  val numRows = 1024 * 10
+  val sql = spark.sql _
+
+  // scalastyle:off hadoopconfiguration
+  private val hadoopConf = spark.sparkContext.hadoopConfiguration
+  // scalastyle:on hadoopconfiguration
+  hadoopConf.set("hive.exec.dynamic.partition", "true")
+  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
+  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    tableNames.foreach { name =>
+      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
+    }
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempView DISTRIBUTE BY a")

Review comment:
   why does the number of files matter? `DISTRIBUTE BY a` will add a shuffle AFAIK, so the perf result is not purely about table insertion and may be less accurate.
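
   One way to see the extra exchange (a hedged sketch; `spark` and the view `temp` follow the benchmark above):
   ```scala
   // The physical plan gains an Exchange hashpartitioning(a, ...) node that a
   // plain SELECT would not have -- the shuffle cost in question.
   spark.sql("SELECT CAST(id AS INT) AS a FROM temp DISTRIBUTE BY a").explain()
   ```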








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28362: [SPARK-31570][R][DOCS] R combine gapply dapply

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28362:
URL: https://github.com/apache/spark/pull/28362#issuecomment-630577457










[GitHub] [spark] AmplabJenkins commented on pull request #28362: [SPARK-31570][R][DOCS] R combine gapply dapply

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28362:
URL: https://github.com/apache/spark/pull/28362#issuecomment-630577457










[GitHub] [spark] SparkQA commented on pull request #28362: [SPARK-31570][R][DOCS] R combine gapply dapply

2020-05-18 Thread GitBox


SparkQA commented on pull request #28362:
URL: https://github.com/apache/spark/pull/28362#issuecomment-630577030


   **[Test build #122829 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122829/testReport)**
 for PR 28362 at commit 
[`6ad2085`](https://github.com/apache/spark/commit/6ad20856a3b242bf7f139eaee10a98ceabadf29f).






[GitHub] [spark] HyukjinKwon commented on pull request #28362: [SPARK-31570][R][DOCS] R combine gapply dapply

2020-05-18 Thread GitBox


HyukjinKwon commented on pull request #28362:
URL: https://github.com/apache/spark/pull/28362#issuecomment-630575435


   retest this please






[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r427025412



##
File path: docs/configuration.md
##
@@ -2955,6 +2955,12 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for 
logging. You can config
 `log4j.properties` file in the `conf` directory. One way to start is to copy 
the existing
 `log4j.properties.template` located there.
 
+By default, Spark adds 1 record to the MDC: `taskName`, which shows something

Review comment:
   do we have the full name of `MDC` spelled out anywhere? Maybe some users don't know it very well, and we should give them something from which they can get the general meaning.
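
   For reference, MDC stands for Mapped Diagnostic Context in slf4j/log4j. A minimal sketch of how the `taskName` record is consumed (the pattern line is an assumption, not Spark's shipped default):
   ```scala
   import org.slf4j.MDC

   // Values put into the per-thread MDC become available to log4j layouts via
   // %X{key}, e.g. ConversionPattern=%d{yy/MM/dd HH:mm:ss} %X{taskName} %m%n
   MDC.put("taskName", "task 1.0 in stage 0.0 (TID 1)")
   try {
     // ... task body: every log line on this thread now carries the task name
   } finally {
     MDC.remove("taskName")
   }
   ```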








[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r427024955



##
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##
@@ -395,6 +401,9 @@ private[spark] class Executor(
 }
 
 override def run(): Unit = {
+
+  setMDCForTask(taskDescription.name, mdcProperties)

Review comment:
   nit: taskDescription.name -> taskName








[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r427024809



##
File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
##
@@ -320,7 +321,12 @@ private[spark] class Executor(
 
 val taskId = taskDescription.taskId
 val threadName = s"Executor task launch worker for task $taskId"
-    private val taskName = taskDescription.name
+    val taskName = taskDescription.name
+    val mdcProperties = taskDescription.properties.asScala
+      .filter(_._1.startsWith("mdc.")).map { item =>
+        val key = item._1.substring(4)
+        (key, item._2)
+      }.toMap

Review comment:
   we don't really need to look up the resulting map. It seems calling `toSeq` and keeping it as `Seq[(String, String)]` is good enough.
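
   A minimal sketch of the suggested shape, with a hypothetical `props` standing in for `taskDescription.properties`:
   ```scala
   import java.util.Properties
   import scala.collection.JavaConverters._

   // Keep the MDC entries as Seq[(String, String)]: they are only iterated
   // when populating the MDC, never looked up by key, so a Map buys nothing.
   def mdcPairs(props: Properties): Seq[(String, String)] =
     props.asScala.toSeq
       .filter { case (k, _) => k.startsWith("mdc.") }
       .map { case (k, v) => k.stripPrefix("mdc.") -> v }
   ```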








[GitHub] [spark] cloud-fan commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #26624:
URL: https://github.com/apache/spark/pull/26624#discussion_r427024406



##
File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
##
@@ -133,6 +133,14 @@ private[spark] object ThreadUtils {
 
Executors.newCachedThreadPool(threadFactory).asInstanceOf[ThreadPoolExecutor]
   }
 
+  /**
+   * Wrapper over newCachedThreadPool. Thread names are formatted as 
prefix-ID, where ID is a

Review comment:
   This doesn't really have a prefix in the name. Can we inline this 
method? It's only called once, and the change itself is not related to the MDC 
stuff









[GitHub] [spark] AmplabJenkins removed a comment on pull request #28575: [SPARK-31705][SQL] Push predicate through join by rewriting join condition to conjunctive normal form

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28575:
URL: https://github.com/apache/spark/pull/28575#issuecomment-630571645










[GitHub] [spark] cloud-fan commented on a change in pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #28572:
URL: https://github.com/apache/spark/pull/28572#discussion_r427022894



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3071,15 +3071,31 @@ class Analyzer(
   case p => p transformExpressions {
 case u @ UpCast(child, _, _) if !child.resolved => u
 
-case UpCast(child, dt: AtomicType, _)
+case u @ UpCast(child, _, _)
 if SQLConf.get.getConf(SQLConf.LEGACY_LOOSE_UPCAST) &&
+  u.dataType.isInstanceOf[AtomicType] &&
   child.dataType == StringType =>
-  Cast(child, dt.asNullable)
-
-case UpCast(child, dataType, walkedTypePath) if !Cast.canUpCast(child.dataType, dataType) =>
-  fail(child, dataType, walkedTypePath)
-
-case UpCast(child, dataType, _) => Cast(child, dataType.asNullable)
+  Cast(child, u.dataType.asNullable)
+
+case u @ UpCast(child, _, walkedTypePath)
+  if child.dataType.isInstanceOf[DecimalType]
+&& u.dataType.isInstanceOf[DecimalType]
+&& walkedTypePath.nonEmpty =>
+  // SPARK-31750: there are two cases: 1. for local BigDecimal collection,
+  // e.g. Seq(BigDecimal(12.34), BigDecimal(1)).toDF("a"), we will have
+  // UpCast(child, Decimal(38, 18)) where child's data type is always Decimal(38, 18).
+  // 2. for other cases where data type is explicitly known, e.g, spark.read
+  // .parquet("/tmp/file").as[BigDecimal]. We will have UpCast(child, Decimal(38, 18)),
+  // where child's data type can be, e.g. Decimal(38, 0). In this case, we actually
+  // should not do cast otherwise there will be precision lost.
+  // Thus, we eliminate the UpCast here to avoid precision lost for case 2 and do
+  // no hurt for case 1.

Review comment:
   We can simplify the comment to only mention case 2
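
   A hedged illustration of the precision issue in case 2 (values made up; `spark` is an active session, non-ANSI mode assumed):
   ```scala
   import org.apache.spark.sql.functions.col
   import org.apache.spark.sql.types.DecimalType

   // A Decimal(38, 0) value with more than 20 integral digits cannot be
   // represented as Decimal(38, 18): the cast overflows (null in non-ANSI
   // mode), which is the precision loss that eliminating the UpCast avoids.
   val df = spark.sql("SELECT CAST('12345678901234567890123' AS DECIMAL(38, 0)) AS a")
   df.select(col("a").cast(DecimalType(38, 18))).show()  // shows null
   ```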








[GitHub] [spark] AmplabJenkins commented on pull request #28575: [SPARK-31705][SQL] Push predicate through join by rewriting join condition to conjunctive normal form

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28575:
URL: https://github.com/apache/spark/pull/28575#issuecomment-630571645










[GitHub] [spark] cloud-fan commented on a change in pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #28572:
URL: https://github.com/apache/spark/pull/28572#discussion_r427022612



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3071,15 +3071,31 @@ class Analyzer(
   case p => p transformExpressions {
 case u @ UpCast(child, _, _) if !child.resolved => u
 
-case UpCast(child, dt: AtomicType, _)
+case u @ UpCast(child, _, _)
 if SQLConf.get.getConf(SQLConf.LEGACY_LOOSE_UPCAST) &&
+  u.dataType.isInstanceOf[AtomicType] &&
   child.dataType == StringType =>
-  Cast(child, dt.asNullable)
-
-case UpCast(child, dataType, walkedTypePath) if !Cast.canUpCast(child.dataType, dataType) =>
-  fail(child, dataType, walkedTypePath)
-
-case UpCast(child, dataType, _) => Cast(child, dataType.asNullable)
+  Cast(child, u.dataType.asNullable)
+
+case u @ UpCast(child, _, walkedTypePath)
+  if child.dataType.isInstanceOf[DecimalType]
+&& u.dataType.isInstanceOf[DecimalType]
+&& walkedTypePath.nonEmpty =>

Review comment:
   why do we need to check `walkedTypePath`?








[GitHub] [spark] SparkQA commented on pull request #28575: [SPARK-31705][SQL] Push predicate through join by rewriting join condition to conjunctive normal form

2020-05-18 Thread GitBox


SparkQA commented on pull request #28575:
URL: https://github.com/apache/spark/pull/28575#issuecomment-630571286


   **[Test build #122828 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122828/testReport)**
 for PR 28575 at commit 
[`21fb7c5`](https://github.com/apache/spark/commit/21fb7c5844f43ec0b9190ac6e823aba2854ba2bd).






[GitHub] [spark] cloud-fan commented on a change in pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #28572:
URL: https://github.com/apache/spark/pull/28572#discussion_r427022539



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3071,15 +3071,31 @@ class Analyzer(
   case p => p transformExpressions {
 case u @ UpCast(child, _, _) if !child.resolved => u
 
-case UpCast(child, dt: AtomicType, _)
+case u @ UpCast(child, _, _)
 if SQLConf.get.getConf(SQLConf.LEGACY_LOOSE_UPCAST) &&
+  u.dataType.isInstanceOf[AtomicType] &&
   child.dataType == StringType =>
-  Cast(child, dt.asNullable)
-
-case UpCast(child, dataType, walkedTypePath) if !Cast.canUpCast(child.dataType, dataType) =>
-  fail(child, dataType, walkedTypePath)
-
-case UpCast(child, dataType, _) => Cast(child, dataType.asNullable)
+  Cast(child, u.dataType.asNullable)
+
+case u @ UpCast(child, _, walkedTypePath)
+  if child.dataType.isInstanceOf[DecimalType]
+&& u.dataType.isInstanceOf[DecimalType]

Review comment:
   nit: `u.targetType == DecimalType`








[GitHub] [spark] wangyum opened a new pull request #28575: [SPARK-31705][SQL] Push predicate through join by rewriting join condition to conjunctive normal form

2020-05-18 Thread GitBox


wangyum opened a new pull request #28575:
URL: https://github.com/apache/spark/pull/28575


   ### What changes were proposed in this pull request?
   
   This PR add a new rule to support push predicate through join by rewriting 
join condition to conjunctive normal form. For example:
   ```sql
   create table x as select * from (values (1, 2, 3, 4)) as x(a, b, c, d);
   create table y as select * from (values (1, 2, 3, 4)) as x(a, b, c, d);
   ```
   
   ```sql
   postgres=# explain select x.* from x, y where x.b = y.b and ((x.a > 3 and 
y.a > 13) or (x.a > 1));
   QUERY PLAN
   --
Merge Join  (cost=218.07..437.16 rows=6973 width=16)
  Merge Cond: (x.b = y.b)
  Join Filter: (((x.a > 3) AND (y.a > 13)) OR (x.a > 1))
  ->  Sort  (cost=89.18..91.75 rows=1028 width=16)
Sort Key: x.b
->  Seq Scan on x  (cost=0.00..37.75 rows=1028 width=16)
  Filter: ((a > 3) OR (a > 1))
  ->  Sort  (cost=128.89..133.52 rows=1850 width=8)
Sort Key: y.b
->  Seq Scan on y  (cost=0.00..28.50 rows=1850 width=8)
   (10 rows)
   ```
   `x.b = y.b and ((x.a > 3 and y.a > 13) or (x.a > 1))` can be rewritten to `x.b = y.b and (((x.a > 3) or (x.a > 1)) and ((y.a > 13) or (x.a > 1)))`, and then we can push down `((x.a > 3) or (x.a > 1))`.
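
   To make the rewrite concrete, a self-contained Scala sketch of the OR-over-AND distribution on a toy expression tree (an editor's illustration of the idea, not this PR's actual rule):
   ```scala
   sealed trait Expr
   case class Leaf(p: String) extends Expr
   case class And(l: Expr, r: Expr) extends Expr
   case class Or(l: Expr, r: Expr) extends Expr

   // Convert to conjunctive normal form by distributing OR over AND:
   // (a AND b) OR c  ==>  (a OR c) AND (b OR c)
   def toCNF(e: Expr): Expr = e match {
     case And(l, r) => And(toCNF(l), toCNF(r))
     case Or(l, r) => (toCNF(l), toCNF(r)) match {
       case (And(a, b), c) => And(toCNF(Or(a, c)), toCNF(Or(b, c)))
       case (a, And(b, c)) => And(toCNF(Or(a, b)), toCNF(Or(a, c)))
       case (a, b)         => Or(a, b)
     }
     case leaf => leaf
   }

   // (x.a > 3 AND y.a > 13) OR x.a > 1
   val cond = Or(And(Leaf("x.a > 3"), Leaf("y.a > 13")), Leaf("x.a > 1"))
   println(toCNF(cond))
   // And(Or(Leaf(x.a > 3),Leaf(x.a > 1)),Or(Leaf(y.a > 13),Leaf(x.a > 1)))
   // The first conjunct references only x, so it can be pushed below the join.
   ```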
   
   
   
   
   ### Why are the changes needed?
   Improve query performance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Unit test and benchmark test.
   
   
   SQL | Before this PR | After this PR
   --- | --- | ---
   TPCDS 5T q85 | 66s | 34s
   TPCH 1T q7 | 37s | 32s
   
   






[GitHub] [spark] cloud-fan commented on a change in pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType

2020-05-18 Thread GitBox


cloud-fan commented on a change in pull request #28572:
URL: https://github.com/apache/spark/pull/28572#discussion_r427022169



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala
##
@@ -161,6 +161,7 @@ object DeserializerBuildHelper {
 case _: StructType => expr
 case _: ArrayType => expr
 case _: MapType => expr
+case _: DecimalType => UpCast(expr, DecimalType, walkedTypePath.getPaths)

Review comment:
   let's add a comment:
   ```
   // For Scala/Java `BigDecimal`, we accept decimal types of any valid 
precision/scale. Here we use
   // the `DecimalType` object to indicate it.
   ```








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28488:
URL: https://github.com/apache/spark/pull/28488#issuecomment-630564778


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122822/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28488:
URL: https://github.com/apache/spark/pull/28488#issuecomment-630564771


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28488:
URL: https://github.com/apache/spark/pull/28488#issuecomment-630564771










[GitHub] [spark] jerryshao commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-18 Thread GitBox


jerryshao commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r427016625



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ShuffleIndexInformation.java
##
@@ -31,17 +34,44 @@
 public class ShuffleIndexInformation {
   /** offsets as long buffer */
   private final LongBuffer offsets;
+  private final boolean hasDigest;
+  /** digests as long buffer */
+  private final LongBuffer digests;
   private int size;
 
   public ShuffleIndexInformation(File indexFile) throws IOException {
+ByteBuffer offsetsBuffer, digestsBuffer;
 size = (int)indexFile.length();
-ByteBuffer buffer = ByteBuffer.allocate(size);
-offsets = buffer.asLongBuffer();
+int offsetsSize, digestsSize;
+if (size % 8 == 0) {
+  hasDigest = false;
+  offsetsSize = size;
+  digestsSize = 0;
+} else {
+  hasDigest = true;
+  offsetsSize = ((size - 8 - 1) / (8 + DigestUtils.getDigestLength()) + 1) * 8;

Review comment:
   Can you explain a bit about this line?








[GitHub] [spark] SparkQA removed a comment on pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator

2020-05-18 Thread GitBox


SparkQA removed a comment on pull request #28488:
URL: https://github.com/apache/spark/pull/28488#issuecomment-630501057


   **[Test build #122822 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122822/testReport)**
 for PR 28488 at commit 
[`6c3856d`](https://github.com/apache/spark/commit/6c3856d5d82de3f98cd0dd93e951b8355e93fe7f).






[GitHub] [spark] jerryshao commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-18 Thread GitBox


jerryshao commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r427016499



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ShuffleIndexInformation.java
##
@@ -31,17 +34,44 @@
 public class ShuffleIndexInformation {
   /** offsets as long buffer */
   private final LongBuffer offsets;
+  private final boolean hasDigest;
+  /** digests as long buffer */
+  private final LongBuffer digests;
   private int size;
 
   public ShuffleIndexInformation(File indexFile) throws IOException {
+ByteBuffer offsetsBuffer, digestsBuffer;
 size = (int)indexFile.length();
-ByteBuffer buffer = ByteBuffer.allocate(size);
-offsets = buffer.asLongBuffer();
+int offsetsSize, digestsSize;
+if (size % 8 == 0) {

Review comment:
   Would you please explain more about why `size % 8` is used to check whether the index file has a digest or not?








[GitHub] [spark] SparkQA commented on pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator

2020-05-18 Thread GitBox


SparkQA commented on pull request #28488:
URL: https://github.com/apache/spark/pull/28488#issuecomment-630564401


   **[Test build #122822 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122822/testReport)**
 for PR 28488 at commit 
[`6c3856d`](https://github.com/apache/spark/commit/6c3856d5d82de3f98cd0dd93e951b8355e93fe7f).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] yaooqinn commented on pull request #28106: [SPARK-31335][SQL] Add try function support

2020-05-18 Thread GitBox


yaooqinn commented on pull request #28106:
URL: https://github.com/apache/spark/pull/28106#issuecomment-630564036


   @bart-samwel thanks for your explanations. [this](https://github.com/google/zetasql/blob/79adcd0fe227173e68ed7aa88f580a691ebe82c2/docs/functions-reference.md#safe-prefix) seems to have a much lower cognitive cost for end users and a lower maintenance cost for the community.






[GitHub] [spark] HyukjinKwon commented on pull request #28571: Add compatibility flag to cast long to timestamp

2020-05-18 Thread GitBox


HyukjinKwon commented on pull request #28571:
URL: https://github.com/apache/spark/pull/28571#issuecomment-630562207


   @GuoPhilipse, it's a matter of coherence in Apache Spark rather than fixing this one case alone. Also, you should open a PR against the master branch if you want to propose a new change.






[GitHub] [spark] HyukjinKwon closed pull request #28571: Add compatibility flag to cast long to timestamp

2020-05-18 Thread GitBox


HyukjinKwon closed pull request #28571:
URL: https://github.com/apache/spark/pull/28571


   






[GitHub] [spark] jerryshao commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-18 Thread GitBox


jerryshao commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r427013351



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/util/DigestUtils.java
##
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.util;
+
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.nio.MappedByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.zip.CRC32;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DigestUtils {
+private static final Logger LOG = LoggerFactory.getLogger(DigestUtils.class);
+private static final int STREAM_BUFFER_LENGTH = 8192;
+private static final int DIGEST_LENGTH = 8;
+
+public static int getDigestLength() {
+   return DIGEST_LENGTH;
+}
+
+public static long getDigest(InputStream data) throws IOException {
+CRC32 crc32 = new CRC32();
+byte[] buffer = new byte[STREAM_BUFFER_LENGTH];
+int len;
+while ((len = data.read(buffer)) >= 0) {
+crc32.update(buffer, 0, len);
+}
+return crc32.getValue();
+}
+
+public static long getDigest(File file, long offset, long length) {
+if (length <= 0) {
+return -1L;
+}
+try (RandomAccessFile rf = new RandomAccessFile(file, "r")) {
+MappedByteBuffer data = rf.getChannel().map(FileChannel.MapMode.READ_ONLY, offset, length);
+CRC32 crc32 = new CRC32();
+byte[] buffer = new byte[STREAM_BUFFER_LENGTH];
+int len;
+while ((len = Math.min(STREAM_BUFFER_LENGTH, data.remaining())) > 0) {
+data.get(buffer, 0, len);
+crc32.update(buffer, 0, len);
+}
+return crc32.getValue();
+} catch (IOException e) {
+LOG.error(String.format("Exception while computing digest for file segment: " +
+  "%s(offset:%d, length:%d)", file.getName(), offset, length));
+return -1L;
+}
+}
+}

Review comment:
   I think the indentation here is not correct. In Spark we typically use 2-space indentation.

##
File path: 
common/network-common/src/main/java/org/apache/spark/network/util/DigestUtils.java
##
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.util;
+
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.nio.MappedByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.zip.CRC32;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class DigestUtils {
+private static final Logger LOG = LoggerFactory.getLogger(DigestUtils.class);
+private static final int STREAM_BUFFER_LENGTH = 8192;
+private static final int DIGEST_LENGTH = 8;
+
+public static int getDigestLength() {
+   return DIGEST_LENGTH;
+}
+
+public static long getDigest(InputStream data) throws IOException {
+CRC32 crc32 = new CRC32();
+byte[] buffer = new byte[STREAM_BUFFER_LENGTH];
+int len;
+while ((len = data.read(buffer)) >= 0) {
+crc32.update(buffer, 0, len);
+}
+return crc32.getValue();
+}
+
+public static long getDigest(File file, long offset, long length) {
+if (length <= 0) {
+return -1L;
+   

[GitHub] [spark] GuoPhilipse commented on pull request #28571: Add compatibility flag to cast long to timestamp

2020-05-18 Thread GitBox


GuoPhilipse commented on pull request #28571:
URL: https://github.com/apache/spark/pull/28571#issuecomment-630558198


   hi @HyukjinKwon,
   it will take effect only when converting long to timestamp; other cases currently seem to need no change. We want to keep backward compatibility with Hive SQL. Do you have any cases? We can discuss more.
   
   Thanks.
   
   
   
   
   
   
   
   
   
   
   At 2020-05-19 10:44:13, "Hyukjin Kwon"  wrote:
   
   @HyukjinKwon commented on this pull request.
   
   In 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:
   
   >// converting seconds to us
   -  private[this] def longToTimestamp(t: Long): Long = t * 1000000L
   +  private[this] def longToTimestamp(t: Long): Long = {
   +    if (SQLConf.get.getConf(SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS)) t * 1000000L
   
   
   Let's not do this. It would require adding the compatibility flag across all the datetime functionality in Spark, e.g. `from_unixtime`.
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630556941










[GitHub] [spark] SparkQA removed a comment on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


SparkQA removed a comment on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630553566


   **[Test build #122826 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122826/testReport)**
 for PR 28574 at commit 
[`ab56a3b`](https://github.com/apache/spark/commit/ab56a3b3911adee8de8dc7b0dd7e44b29e339e1f).






[GitHub] [spark] SparkQA commented on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


SparkQA commented on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630556817


   **[Test build #122826 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122826/testReport)**
 for PR 28574 at commit 
[`ab56a3b`](https://github.com/apache/spark/commit/ab56a3b3911adee8de8dc7b0dd7e44b29e339e1f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630556941







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-63060







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-63060







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-05-18 Thread GitBox


SparkQA commented on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-630555292


   **[Test build #122827 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122827/testReport)**
 for PR 28527 at commit 
[`3fbe65a`](https://github.com/apache/spark/commit/3fbe65a0c3f1e59e5af1f5d3f3b7beb13c0636f6).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630553799







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630553799







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


SparkQA commented on pull request #28574:
URL: https://github.com/apache/spark/pull/28574#issuecomment-630553566


   **[Test build #122826 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122826/testReport)**
 for PR 28574 at commit 
[`ab56a3b`](https://github.com/apache/spark/commit/ab56a3b3911adee8de8dc7b0dd7e44b29e339e1f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn opened a new pull request #28574: [SPARK-31752][SQL][DOCS] Add sql doc for interval type

2020-05-18 Thread GitBox


yaooqinn opened a new pull request #28574:
URL: https://github.com/apache/spark/pull/28574


   
   
   ### What changes were proposed in this pull request?
   
   This PR adds SQL doc for interval datatype to describe its keyword, 
definition, and related operators and functions.
   
   
   
   ### Why are the changes needed?
   Provide a detailed SQL doc for interval type
   
   ### Does this PR introduce _any_ user-facing change?
   
   yes, new doc added 
   
   ### How was this patch tested?
   
   Passing doc generation and locally verified.
   
   
![image](https://user-images.githubusercontent.com/8326978/82280400-64606c00-99c1-11ea-8ecd-56f0e098d097.png)
   
   
![image](https://user-images.githubusercontent.com/8326978/82280450-7b9f5980-99c1-11ea-97b6-af30cf893e20.png)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #28290: [SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-05-18 Thread GitBox


maropu commented on pull request #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-630551973


   note: I've filed a JIRA so that we won't forget to add them to the documents: https://issues.apache.org/jira/browse/SPARK-31753



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] rednaxelafx commented on a change in pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-18 Thread GitBox


rednaxelafx commented on a change in pull request #28463:
URL: https://github.com/apache/spark/pull/28463#discussion_r426999033



##
File path: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
##
@@ -414,6 +434,312 @@ private[spark] object ClosureCleaner extends Logging {
   }
 }
 
+private[spark] object IndylambdaScalaClosures extends Logging {
+  // internal name of java.lang.invoke.LambdaMetafactory
+  val LambdaMetafactoryClassName = "java/lang/invoke/LambdaMetafactory"
+  // the method that Scala indylambda use for bootstrap method
+  val LambdaMetafactoryMethodName = "altMetafactory"
+  val LambdaMetafactoryMethodDesc = "(Ljava/lang/invoke/MethodHandles$Lookup;" 
+
+"Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)" +
+"Ljava/lang/invoke/CallSite;"
+
+  /**
+   * Check if the given reference is an indylambda style Scala closure.
+   * If so (e.g. for Scala 2.12+ closures), return a non-empty serialization proxy
+   * (SerializedLambda) of the closure;
+   * otherwise (e.g. for Scala 2.11 closures) return None.
+   *
+   * @param maybeClosure the closure to check.
+   */
+  def getSerializationProxy(maybeClosure: AnyRef): Option[SerializedLambda] = {
+def isClosureCandidate(cls: Class[_]): Boolean = {
+  // TODO: maybe lift this restriction to support other functional 
interfaces in the future
+  val implementedInterfaces = ClassUtils.getAllInterfaces(cls).asScala
+  implementedInterfaces.exists(_.getName.startsWith("scala.Function"))
+}
+
+maybeClosure.getClass match {
+  // shortcut the fast check:
+  // 1. indylambda closure classes are generated by Java's 
LambdaMetafactory, and they're
+  //always synthetic.
+  // 2. We only care about Serializable closures, so let's check that as 
well
+  case c if !c.isSynthetic || !maybeClosure.isInstanceOf[Serializable] => 
None
+
+  case c if isClosureCandidate(c) =>
+try {
+  Option(inspect(maybeClosure)).filter(isIndylambdaScalaClosure)
+} catch {
+  case e: Exception =>
+logDebug("The given reference is not an indylambda Scala 
closure.", e)
+None
+}
+
+  case _ => None
+}
+  }
+
+  def isIndylambdaScalaClosure(lambdaProxy: SerializedLambda): Boolean = {
+lambdaProxy.getImplMethodKind == MethodHandleInfo.REF_invokeStatic &&
+  lambdaProxy.getImplMethodName.contains("$anonfun$")
+  }
+
+  def inspect(closure: AnyRef): SerializedLambda = {
+val writeReplace = closure.getClass.getDeclaredMethod("writeReplace")
+writeReplace.setAccessible(true)
+writeReplace.invoke(closure).asInstanceOf[SerializedLambda]
+  }
+
+  /**
+   * Check if the handle represents the LambdaMetafactory that indylambda 
Scala closures
+   * use for creating the lambda class and getting a closure instance.
+   */
+  def isLambdaMetafactory(bsmHandle: Handle): Boolean = {
+bsmHandle.getOwner == LambdaMetafactoryClassName &&
+  bsmHandle.getName == LambdaMetafactoryMethodName &&
+  bsmHandle.getDesc == LambdaMetafactoryMethodDesc
+  }
+
+  /**
+   * Check if the handle represents a target method that is:
+   * - a STATIC method that implements a Scala lambda body in the indylambda 
style
+   * - captures the enclosing `this`, i.e. the first argument is a reference 
to the same type as
+   *   the owning class.
+   * Returns true if both criteria above are met.
+   */
+  def isLambdaBodyCapturingOuter(handle: Handle, ownerInternalName: String): 
Boolean = {
+handle.getTag == H_INVOKESTATIC &&
+  handle.getName.contains("$anonfun$") &&
+  handle.getOwner == ownerInternalName &&
+  handle.getDesc.startsWith(s"(L$ownerInternalName;")
+  }
+
+  /**
+   * Check if the callee of a call site is an inner class constructor.
+   * - A constructor has to be invoked via INVOKESPECIAL
+   * - A constructor's internal name is "<init>" and the return type is "V" (void)
+   * - An inner class' first argument in the signature has to be a reference 
to the
+   *   enclosing "this", aka `$outer` in Scala.
+   */
+  def isInnerClassCtorCapturingOuter(
+  op: Int, owner: String, name: String, desc: String, callerInternalName: 
String): Boolean = {

Review comment:
   Good point! Thanks for the suggestion, I'll remove the unused parameter 
when I gather enough changes for a follow-up change.
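   
   For readers following along, the writeReplace-based inspection described in the hunk above can be reproduced on any Scala 2.12+ lambda; a minimal, self-contained sketch (assuming Scala 2.12+, where lambdas are compiled via LambdaMetafactory as serializable synthetic classes; names here are illustrative):
   
   import java.lang.invoke.{MethodHandleInfo, SerializedLambda}
   
   object IndylambdaInspectSketch {
     def main(args: Array[String]): Unit = {
       val f: Int => Int = x => x + 1
       // Scala 2.12+ lambdas carry a writeReplace method returning a
       // SerializedLambda serialization proxy.
       val writeReplace = f.getClass.getDeclaredMethod("writeReplace")
       writeReplace.setAccessible(true)
       val proxy = writeReplace.invoke(f).asInstanceOf[SerializedLambda]
       // Indylambda closures point at a static $anonfun$ implementation method.
       println(proxy.getImplMethodKind == MethodHandleInfo.REF_invokeStatic) // true
       println(proxy.getImplMethodName) // e.g. $anonfun$main$1
     }
   }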





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] rednaxelafx commented on a change in pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-18 Thread GitBox


rednaxelafx commented on a change in pull request #28463:
URL: https://github.com/apache/spark/pull/28463#discussion_r426998824



##
File path: core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
##
@@ -414,6 +434,312 @@ private[spark] object ClosureCleaner extends Logging {
   }
 }
 
+private[spark] object IndylambdaScalaClosures extends Logging {
+  // internal name of java.lang.invoke.LambdaMetafactory
+  val LambdaMetafactoryClassName = "java/lang/invoke/LambdaMetafactory"
+  // the method that Scala indylambda use for bootstrap method
+  val LambdaMetafactoryMethodName = "altMetafactory"
+  val LambdaMetafactoryMethodDesc = "(Ljava/lang/invoke/MethodHandles$Lookup;" 
+
+"Ljava/lang/String;Ljava/lang/invoke/MethodType;[Ljava/lang/Object;)" +
+"Ljava/lang/invoke/CallSite;"
+
+  /**
+   * Check if the given reference is an indylambda style Scala closure.
+   * If so (e.g. for Scala 2.12+ closures), return a non-empty serialization proxy
+   * (SerializedLambda) of the closure;
+   * otherwise (e.g. for Scala 2.11 closures) return None.
+   *
+   * @param maybeClosure the closure to check.
+   */
+  def getSerializationProxy(maybeClosure: AnyRef): Option[SerializedLambda] = {
+def isClosureCandidate(cls: Class[_]): Boolean = {
+  // TODO: maybe lift this restriction to support other functional 
interfaces in the future
+  val implementedInterfaces = ClassUtils.getAllInterfaces(cls).asScala
+  implementedInterfaces.exists(_.getName.startsWith("scala.Function"))
+}
+
+maybeClosure.getClass match {
+  // shortcut the fast check:
+  // 1. indylambda closure classes are generated by Java's 
LambdaMetafactory, and they're
+  //always synthetic.
+  // 2. We only care about Serializable closures, so let's check that as 
well
+  case c if !c.isSynthetic || !maybeClosure.isInstanceOf[Serializable] => 
None
+
+  case c if isClosureCandidate(c) =>
+try {
+  Option(inspect(maybeClosure)).filter(isIndylambdaScalaClosure)
+} catch {
+  case e: Exception =>
+logDebug("The given reference is not an indylambda Scala 
closure.", e)
+None
+}
+
+  case _ => None
+}
+  }
+
+  def isIndylambdaScalaClosure(lambdaProxy: SerializedLambda): Boolean = {
+lambdaProxy.getImplMethodKind == MethodHandleInfo.REF_invokeStatic &&
+  lambdaProxy.getImplMethodName.contains("$anonfun$")
+  }
+
+  def inspect(closure: AnyRef): SerializedLambda = {
+val writeReplace = closure.getClass.getDeclaredMethod("writeReplace")
+writeReplace.setAccessible(true)
+writeReplace.invoke(closure).asInstanceOf[SerializedLambda]
+  }
+
+  /**
+   * Check if the handle represents the LambdaMetafactory that indylambda 
Scala closures
+   * use for creating the lambda class and getting a closure instance.
+   */
+  def isLambdaMetafactory(bsmHandle: Handle): Boolean = {
+bsmHandle.getOwner == LambdaMetafactoryClassName &&
+  bsmHandle.getName == LambdaMetafactoryMethodName &&
+  bsmHandle.getDesc == LambdaMetafactoryMethodDesc
+  }
+
+  /**
+   * Check if the handle represents a target method that is:
+   * - a STATIC method that implements a Scala lambda body in the indylambda 
style
+   * - captures the enclosing `this`, i.e. the first argument is a reference 
to the same type as
+   *   the owning class.
+   * Returns true if both criteria above are met.
+   */
+  def isLambdaBodyCapturingOuter(handle: Handle, ownerInternalName: String): 
Boolean = {
+handle.getTag == H_INVOKESTATIC &&
+  handle.getName.contains("$anonfun$") &&
+  handle.getOwner == ownerInternalName &&
+  handle.getDesc.startsWith(s"(L$ownerInternalName;")
+  }
+
+  /**
+   * Check if the callee of a call site is an inner class constructor.
+   * - A constructor has to be invoked via INVOKESPECIAL
+   * - A constructor's internal name is "<init>" and the return type is "V" (void)
+   * - An inner class' first argument in the signature has to be a reference 
to the
+   *   enclosing "this", aka `$outer` in Scala.
+   */
+  def isInnerClassCtorCapturingOuter(
+  op: Int, owner: String, name: String, desc: String, callerInternalName: 
String): Boolean = {
+    op == INVOKESPECIAL && name == "<init>" && desc.startsWith(s"(L$callerInternalName;")
+  }
+
+  /**
+   * Scans an indylambda Scala closure, along with its lexically nested 
closures, and populate
+   * the accessed fields info on which fields on the outer object are accessed.
+   *
+   * This is equivalent to getInnerClosureClasses() + InnerClosureFinder + 
FieldAccessFinder fused
+   * into one for processing indylambda closures. The traversal order along 
the call graph is the
+   * same for all three combined, so they can be fused together easily while 
maintaining the same
+   * ordering as the existing implementation.
+   *
+   * Precondition: this function expects the `accessedFields` to be populated 
with all known
+   *  

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28571: Add compatibility flag to cast long to timestamp

2020-05-18 Thread GitBox


HyukjinKwon commented on a change in pull request #28571:
URL: https://github.com/apache/spark/pull/28571#discussion_r426998182



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -390,8 +390,12 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
 if (d.isNaN || d.isInfinite) null else (d * 1000000L).toLong
   }
 
+  // SPARK-31710. Add compatibility flag to cast long to timestamp,
   // converting seconds to us
-  private[this] def longToTimestamp(t: Long): Long = t * 1000000L
+  private[this] def longToTimestamp(t: Long): Long = {
+    if (SQLConf.get.getConf(SQLConf.LONG_TIMESTAMP_CONVERSION_IN_SECONDS)) t * 1000000L

Review comment:
   Let's not do this. It will require adding the compatibility flag across all the date time functionality in Spark, e.g. `from_unixtime`.
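   
   For a concrete sense of the conversion being discussed, a worked example (a minimal sketch, not Spark's actual Cast code; the method name here is illustrative):
   
   // A cast from an integral value to TIMESTAMP treats the input as epoch
   // seconds and stores microseconds internally, hence the single multiply.
   def longToTimestampMicros(seconds: Long): Long = seconds * 1000000L
   
   // 2020-05-18 00:00:00 UTC is 1589760000 epoch seconds:
   // longToTimestampMicros(1589760000L) == 1589760000000000L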





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #28544: [SPARK-31387][SQL][test-maven] Handle unknown operation/session ID in HiveThriftServer2Listener

2020-05-18 Thread GitBox


maropu commented on a change in pull request #28544:
URL: https://github.com/apache/spark/pull/28544#discussion_r426997661



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
##
@@ -131,60 +132,81 @@ private[thriftserver] class HiveThriftServer2Listener(
 updateLiveStore(session)
   }
 
-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
= {
-val session = sessionList.get(e.sessionId)
-session.finishTimestamp = e.finishTime
-updateStoreWithTriggerEnabled(session)
-sessionList.remove(e.sessionId)
-  }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
=
+Option(sessionList.get(e.sessionId)) match {
+  case None => logWarning(s"onSessionClosed called with unknown session 
id: ${e.sessionId}")
+  case Some(sessionData) =>
+val session = sessionData
+session.finishTimestamp = e.finishTime
+updateStoreWithTriggerEnabled(session)
+sessionList.remove(e.sessionId)
+}
 
-  private def onOperationStart(e: SparkListenerThriftServerOperationStart): 
Unit = {
-val info = getOrCreateExecution(
-  e.id,
-  e.statement,
-  e.sessionId,
-  e.startTime,
-  e.userName)
-
-info.state = ExecutionState.STARTED
-executionList.put(e.id, info)
-sessionList.get(e.sessionId).totalExecution += 1
-executionList.get(e.id).groupId = e.groupId
-updateLiveStore(executionList.get(e.id))
-updateLiveStore(sessionList.get(e.sessionId))
-  }
+  private def onOperationStart(e: SparkListenerThriftServerOperationStart): 
Unit =
+Option(sessionList.get(e.sessionId)) match {
+  case None => logWarning(s"onOperationStart called with unknown session 
id: ${e.sessionId}")

Review comment:
   > @maropu this code is in a listener that executes on the listener bus. The listener bus can drop events if it becomes too busy.
   > So it might happen that events are dropped, and an id is unavailable. It will not affect query execution.
   
   Does the server get back to a normal state by just ignoring the case? What I'm a bit worried about is that users can get the same warning message repeatedly if a session id gets lost (I'm not sure whether that case can happen in real event sequences).
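   
   (For context, the pattern in the hunk above boils down to an Option-guarded map lookup, so an unknown id degrades to a warning instead of a NullPointerException; a minimal sketch with illustrative names, not the PR's code:)
   
   import java.util.concurrent.ConcurrentHashMap
   
   object ListenerSketch {
     private val sessionList = new ConcurrentHashMap[String, AnyRef]()
   
     def onSessionClosed(sessionId: String): Unit =
       Option(sessionList.get(sessionId)) match {
         case None =>
           // A dropped bus event can leave the map without this id; warn and move on.
           println(s"onSessionClosed called with unknown session id: $sessionId")
         case Some(_) =>
           sessionList.remove(sessionId)
       }
   }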





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630543560







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630543560







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


SparkQA commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630543079


   **[Test build #122825 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122825/testReport)**
 for PR 28569 at commit 
[`6beeca7`](https://github.com/apache/spark/commit/6beeca79435cbfe6c1acecf8eb898efe84faf1f4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


SparkQA removed a comment on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630529795


   **[Test build #122825 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122825/testReport)**
 for PR 28569 at commit 
[`6beeca7`](https://github.com/apache/spark/commit/6beeca79435cbfe6c1acecf8eb898efe84faf1f4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28540:
URL: https://github.com/apache/spark/pull/28540#issuecomment-630541237







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28540:
URL: https://github.com/apache/spark/pull/28540#issuecomment-630541237







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection

2020-05-18 Thread GitBox


SparkQA removed a comment on pull request #28540:
URL: https://github.com/apache/spark/pull/28540#issuecomment-630419853


   **[Test build #122820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122820/testReport)**
 for PR 28540 at commit 
[`8a2d0ee`](https://github.com/apache/spark/commit/8a2d0ee8ac1cf97d5f4bf2b46306be6b90128159).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28540: [SPARK-31719][SQL] Refactor JoinSelection

2020-05-18 Thread GitBox


SparkQA commented on pull request #28540:
URL: https://github.com/apache/spark/pull/28540#issuecomment-630540660


   **[Test build #122820 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122820/testReport)**
 for PR 28540 at commit 
[`8a2d0ee`](https://github.com/apache/spark/commit/8a2d0ee8ac1cf97d5f4bf2b46306be6b90128159).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-18 Thread GitBox


HyukjinKwon commented on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-630536486


   +1 seems good to me too



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28541: [SPARK-31720][CORE] TaskMemoryManager allocate failed when new task comming

2020-05-18 Thread GitBox


AngersZhuuuu commented on a change in pull request #28541:
URL: https://github.com/apache/spark/pull/28541#discussion_r426989811



##
File path: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala
##
@@ -138,6 +140,11 @@ private[memory] class ExecutionMemoryPool(
   if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
 logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName 
pool to be free")
 lock.wait()
+  } else if (toGrant == 0 && memoryFree > 0) {

Review comment:
   > Or, if one specific pattern of tasks takes significantly more memory than other tasks, maybe you should tune the tasks themselves, instead of modifying the Spark internals for a specific case.
   
   Yeah, an ad-hoc query sorting on a big key spills 20G+. Running that query alone succeeds; running it together with other SQL at high concurrency fails.
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28541: [SPARK-31720][CORE] TaskMemoryManager allocate failed when new task comming

2020-05-18 Thread GitBox


AngersZhuuuu commented on a change in pull request #28541:
URL: https://github.com/apache/spark/pull/28541#discussion_r426988749



##
File path: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala
##
@@ -138,6 +140,11 @@ private[memory] class ExecutionMemoryPool(
   if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
 logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName 
pool to be free")
 lock.wait()
+  } else if (toGrant == 0 && memoryFree > 0) {

Review comment:
   > That's exactly what I mean, the executor is oversubscribed by tasks. The task OOMs because the executor memory is not enough; the choice here is either to launch worker nodes with more memory, or change the value of `spark.task.cpus` to allow fewer tasks to run in parallel.
   
   In our case, `spark.task.cpus` is the default value, but cores per executor is 4. Active tasks that do not use much memory lower the per-task limit for tasks that need more memory. So we want the heavy task to wait a little for other tasks to release memory. It is a bit of a corner case.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jiangxb1987 commented on a change in pull request #28541: [SPARK-31720][CORE] TaskMemoryManager allocate failed when new task comming

2020-05-18 Thread GitBox


jiangxb1987 commented on a change in pull request #28541:
URL: https://github.com/apache/spark/pull/28541#discussion_r426988698



##
File path: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala
##
@@ -138,6 +140,11 @@ private[memory] class ExecutionMemoryPool(
   if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
 logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName 
pool to be free")
 lock.wait()
+  } else if (toGrant == 0 && memoryFree > 0) {

Review comment:
   Or, if one specific pattern of tasks takes significantly more memory than other tasks, maybe you should tune the tasks themselves, instead of modifying the Spark internals for a specific case.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


HyukjinKwon commented on a change in pull request #28569:
URL: https://github.com/apache/spark/pull/28569#discussion_r426478018



##
File path: python/pyspark/resource/information.py
##
@@ -26,8 +26,10 @@ class ResourceInformation(object):
 
 One example is GPUs, where the addresses would be the indices of the GPUs
 
-@param name the name of the resource
-@param addresses an array of strings describing the addresses of the 
resource
+:param name: the name of the resource
+:param addresses: an array of strings describing the addresses of the 
resource
+
+.. versionadded:: 3.0.0

Review comment:
   Will open a PR to backport for this change specifically for branch-3.0. 
This PR itself should go to master only.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jiangxb1987 commented on a change in pull request #28541: [SPARK-31720][CORE] TaskMemoryManager allocate failed when new task comming

2020-05-18 Thread GitBox


jiangxb1987 commented on a change in pull request #28541:
URL: https://github.com/apache/spark/pull/28541#discussion_r426987592



##
File path: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala
##
@@ -138,6 +140,11 @@ private[memory] class ExecutionMemoryPool(
   if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
 logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName 
pool to be free")
 lock.wait()
+  } else if (toGrant == 0 && memoryFree > 0) {

Review comment:
   That's exactly what I mean, the executor is oversubscribed by tasks. The task OOMs because the executor memory is not enough; the choice here is either to launch worker nodes with more memory, or change the value of `spark.task.cpus` to allow fewer tasks to run in parallel.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #28559: [SPARK-31739][PYSPARK][DOCS][MINOR] Fix docstring syntax issues and misplaced space characters.

2020-05-18 Thread GitBox


HyukjinKwon commented on pull request #28559:
URL: https://github.com/apache/spark/pull/28559#issuecomment-630531082


   Merged to branch-3.0 as well.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630530241







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630530241







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


SparkQA commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630529795


   **[Test build #122825 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122825/testReport)**
 for PR 28569 at commit 
[`6beeca7`](https://github.com/apache/spark/commit/6beeca79435cbfe6c1acecf8eb898efe84faf1f4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28541: [SPARK-31720][CORE] TaskMemoryManager allocate failed when new task comming

2020-05-18 Thread GitBox


AngersZhuuuu commented on a change in pull request #28541:
URL: https://github.com/apache/spark/pull/28541#discussion_r426984333



##
File path: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala
##
@@ -138,6 +140,11 @@ private[memory] class ExecutionMemoryPool(
   if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
 logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName 
pool to be free")
 lock.wait()
+  } else if (toGrant == 0 && memoryFree > 0) {

Review comment:
   > I don't see why it's still required to wait for more memory here. If your executor memory is not sufficient to support so many tasks, either increase your executor memory or reduce the slots per executor.
   
   Sometimes the grant is 0, which causes the task to throw OOM; in such cases the task is always a heavy one, and re-computing it costs a lot.
   
   For a normal Spark job, we can change the config to increase memory or adjust the slots, but for a long-running Spark application such as the Thrift server, we can't always restart it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


viirya commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630527564


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630523156


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122824/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630523152


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630523152







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


SparkQA removed a comment on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630512946


   **[Test build #122824 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122824/testReport)**
 for PR 28569 at commit 
[`6beeca7`](https://github.com/apache/spark/commit/6beeca79435cbfe6c1acecf8eb898efe84faf1f4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


SparkQA commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630522970


   **[Test build #122824 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122824/testReport)**
 for PR 28569 at commit 
[`6beeca7`](https://github.com/apache/spark/commit/6beeca79435cbfe6c1acecf8eb898efe84faf1f4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp

2020-05-18 Thread GitBox


TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r426973155



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -789,7 +789,7 @@ abstract class ToTimestamp
   protected def downScaleFactor: Long
 
   override def inputTypes: Seq[AbstractDataType] =
-Seq(TypeCollection(StringType, DateType, TimestampType), StringType)
+Seq(TypeCollection(StringType, DateType, TimestampType, LongType ), 
StringType)

Review comment:
   Yes, `IntegerType` should also be considered.
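   
   (For clarity, a sketch of the amended signature being discussed, with `IntegerType` included; illustrative only, not the final patch:)
   
   override def inputTypes: Seq[AbstractDataType] =
     Seq(TypeCollection(StringType, DateType, TimestampType, LongType, IntegerType), StringType)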





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28543: [SPARK-31721][SQL] Assert optimized is initialized before tracking the planning time

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28543:
URL: https://github.com/apache/spark/pull/28543#issuecomment-630513775







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28543: [SPARK-31721][SQL] Assert optimized is initialized before tracking the planning time

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28543:
URL: https://github.com/apache/spark/pull/28543#issuecomment-630513775







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630513352







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630513352







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28543: [SPARK-31721][SQL] Assert optimized is initialized before tracking the planning time

2020-05-18 Thread GitBox


SparkQA removed a comment on pull request #28543:
URL: https://github.com/apache/spark/pull/28543#issuecomment-630360910


   **[Test build #122819 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122819/testReport)**
 for PR 28543 at commit 
[`41ea1a9`](https://github.com/apache/spark/commit/41ea1a91b8741c9af60cd927c64da863ea893cdc).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28569: [SPARK-31748][CORE][PYTHON] Document resource module in PySpark doc and rename/move classes

2020-05-18 Thread GitBox


SparkQA commented on pull request #28569:
URL: https://github.com/apache/spark/pull/28569#issuecomment-630512946


   **[Test build #122824 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122824/testReport)**
 for PR 28569 at commit 
[`6beeca7`](https://github.com/apache/spark/commit/6beeca79435cbfe6c1acecf8eb898efe84faf1f4).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28543: [SPARK-31721][SQL] Assert optimized is initialized before tracking the planning time

2020-05-18 Thread GitBox


SparkQA commented on pull request #28543:
URL: https://github.com/apache/spark/pull/28543#issuecomment-630512988


   **[Test build #122819 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122819/testReport)**
 for PR 28543 at commit 
[`41ea1a9`](https://github.com/apache/spark/commit/41ea1a91b8741c9af60cd927c64da863ea893cdc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jiangxb1987 commented on a change in pull request #28541: [SPARK-31720][CORE] TaskMemoryManager allocate failed when new task comming

2020-05-18 Thread GitBox


jiangxb1987 commented on a change in pull request #28541:
URL: https://github.com/apache/spark/pull/28541#discussion_r426963366



##
File path: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala
##
@@ -138,6 +140,11 @@ private[memory] class ExecutionMemoryPool(
   if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
 logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName 
pool to be free")
 lock.wait()
+  } else if (toGrant == 0 && memoryFree > 0) {

Review comment:
   This simply means the task has already been assigned an amount of memory that is above average; I don't see why it's still required to wait for more memory here. If your executor memory is not sufficient to support so many tasks, either increase your executor memory or reduce the slots per executor.
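   
   (For readers following the thread: ExecutionMemoryPool aims to grant each of N active tasks between 1/(2N) and 1/N of the pool, which is what "waiting for at least 1/2N of the pool" in the quoted log line refers to. A minimal sketch of those bounds, with illustrative names, not Spark internals:)
   
   object GrantBoundsSketch {
     // With N active tasks sharing poolSize bytes, a task may hold at most
     // poolSize / N, and blocks until it can hold at least poolSize / (2 * N).
     def grantBounds(poolSize: Long, numActiveTasks: Int): (Long, Long) =
       (poolSize / (2L * numActiveTasks), poolSize / numActiveTasks)
   
     def main(args: Array[String]): Unit = {
       // e.g. a 4 GB execution pool shared by 8 active tasks:
       println(grantBounds(4L << 30, 8)) // (268435456, 536870912), i.e. 256 MB .. 512 MB
     }
   }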





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] GuoPhilipse commented on pull request #28571: Add compatibility flag to cast long to timestamp

2020-05-18 Thread GitBox


GuoPhilipse commented on pull request #28571:
URL: https://github.com/apache/spark/pull/28571#issuecomment-630503356


   
   
   
   Thanks Takeshi, I will read the rules.
   
   At 2020-05-19 07:43:14, "Takeshi Yamamuro"  wrote:
   
   Hi, @GuoPhilipse. Could you please read the Pull Request section in our contribution guide first? Then, please follow the rules there: https://spark.apache.org/contributing.html Thanks for your contribution, anyway.
   
   —
   You are receiving this because you were mentioned.
   Reply to this email directly, view it on GitHub, or unsubscribe.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Sudhar287 commented on pull request #28573: [WIP][SPARK-31579][SQL] Floor div to div

2020-05-18 Thread GitBox


Sudhar287 commented on pull request #28573:
URL: https://github.com/apache/spark/pull/28573#issuecomment-630503124


   Right. Got it!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Sudhar287 closed pull request #28573: [WIP][SPARK-31579][SQL] Floor div to div

2020-05-18 Thread GitBox


Sudhar287 closed pull request #28573:
URL: https://github.com/apache/spark/pull/28573


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #28573: [WIP][SPARK-31579][SQL] Floor div to div

2020-05-18 Thread GitBox


maropu commented on pull request #28573:
URL: https://github.com/apache/spark/pull/28573#issuecomment-630502891


   If you want to discuss this topic, I think you should do it on the JIRA side by showing your PoC patch in your local (forked) repository.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28462: [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync

2020-05-18 Thread GitBox


AmplabJenkins removed a comment on pull request #28462:
URL: https://github.com/apache/spark/pull/28462#issuecomment-630501468







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28462: [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync

2020-05-18 Thread GitBox


AmplabJenkins commented on pull request #28462:
URL: https://github.com/apache/spark/pull/28462#issuecomment-630501468







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator

2020-05-18 Thread GitBox


SparkQA commented on pull request #28488:
URL: https://github.com/apache/spark/pull/28488#issuecomment-630501057


   **[Test build #122822 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122822/testReport)**
 for PR 28488 at commit 
[`6c3856d`](https://github.com/apache/spark/commit/6c3856d5d82de3f98cd0dd93e951b8355e93fe7f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


