[GitHub] [spark] SparkQA removed a comment on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


SparkQA removed a comment on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846707466


   **[Test build #138858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138858/testReport)** for PR 32645 at commit [`0197136`](https://github.com/apache/spark/commit/01971362298c3813123e5ea89383e756701ca5e8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


SparkQA commented on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846766141


   **[Test build #138858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138858/testReport)** for PR 32645 at commit [`0197136`](https://github.com/apache/spark/commit/01971362298c3813123e5ea89383e756701ca5e8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class MakeYMInterval(years: Expression, months: Expression)`





[GitHub] [spark] SparkQA commented on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


SparkQA commented on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846747586


   **[Test build #138860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138860/testReport)** for PR 32645 at commit [`111535c`](https://github.com/apache/spark/commit/111535c0aa1954a72b21f70679e0573c42850344).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846746531


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43381/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846746526


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43380/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846746525


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43378/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846746525


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43378/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846746526


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43380/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846746531


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43381/
   





[GitHub] [spark] viirya commented on a change in pull request #32594: [SPARK-35447][SQL] Optimize skew join before coalescing shuffle partitions

2021-05-23 Thread GitBox


viirya commented on a change in pull request #32594:
URL: https://github.com/apache/spark/pull/32594#discussion_r637693175



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala
##
@@ -21,16 +21,160 @@ import scala.collection.mutable.ArrayBuffer
 
 import org.apache.spark.MapOutputStatistics
 import org.apache.spark.internal.Logging
-import org.apache.spark.sql.execution.{CoalescedPartitionSpec, ShufflePartitionSpec}
+import org.apache.spark.sql.execution.{CoalescedPartitionSpec, PartialReducerPartitionSpec, ShufflePartitionSpec}
 
 object ShufflePartitionsUtil extends Logging {
   final val SMALL_PARTITION_FACTOR = 0.2
   final val MERGED_PARTITION_FACTOR = 1.2
 
   /**
-   * Coalesce the partitions from multiple shuffles. This method assumes that all the shuffles
-   * have the same number of partitions, and the partitions of same index will be read together
-   * by one task.
+   * Coalesce the partitions from multiple shuffles, either in their original states, or applied
+   * with skew handling partition specs. If called on partitions containing skew partition specs,
+   * this method will keep the skew partition specs intact and only coalesce the partitions outside
+   * the skew sections.
+   *
+   * This method will return an empty result if the shuffles have been coalesced already, or if
+   * they do not have the same number of partitions, or if the coalesced result is the same as the
+   * input partition layout.
+   *
+   * @return A sequence of sequence of [[ShufflePartitionSpec]]s, which each inner sequence as the
+   *         new partition specs for its corresponding shuffle after coalescing. If Nil is returned,
+   *         then no coalescing is applied.
+   */
+  def coalescePartitions(
+      mapOutputStatistics: Seq[Option[MapOutputStatistics]],
+      inputPartitionSpecs: Seq[Option[Seq[ShufflePartitionSpec]]],
+      advisoryTargetSize: Long,
+      minNumPartitions: Int): Seq[Seq[ShufflePartitionSpec]] = {
+    assert(mapOutputStatistics.length == inputPartitionSpecs.length)
+
+    if (mapOutputStatistics.isEmpty) {
+      return Seq.empty
+    }
+
+    // If `minNumPartitions` is very large, it is possible that we need to use a value less than
+    // `advisoryTargetSize` as the target size of a coalesced task.
+    val totalPostShuffleInputSize = mapOutputStatistics.flatMap(_.map(_.bytesByPartitionId.sum)).sum
+    // The max at here is to make sure that when we have an empty table, we only have a single
+    // coalesced partition.
+    // There is no particular reason that we pick 16. We just need a number to prevent
+    // `maxTargetSize` from being set to 0.
+    val maxTargetSize = math.max(
+      math.ceil(totalPostShuffleInputSize / minNumPartitions.toDouble).toLong, 16)
+    val targetSize = math.min(maxTargetSize, advisoryTargetSize)
+
+    val shuffleIds = mapOutputStatistics.flatMap(_.map(_.shuffleId)).mkString(", ")
+    logInfo(s"For shuffle($shuffleIds), advisory target size: $advisoryTargetSize, " +
+      s"actual target size $targetSize.")
+
+    val numShuffles = mapOutputStatistics.length
+    // `ShuffleQueryStageExec#mapStats` returns None when the input RDD has 0 partitions,
+    // we should skip it when calculating the `partitionStartIndices`.
+    val validMetrics = mapOutputStatistics.flatten
+
+    if (inputPartitionSpecs.forall(_.isEmpty)) {
+      // If all input RDDs have 0 partition, we create an empty partition for every shuffle reader.
+      if (validMetrics.isEmpty) {
+        return Seq.fill(numShuffles)(Seq(CoalescedPartitionSpec(0, 0)))
+      }
+
+      // We may have different pre-shuffle partition numbers, don't reduce shuffle partition number
+      // in that case. For example when we union fully aggregated data (data is arranged to a single
+      // partition) and a result of a SortMergeJoin (multiple partitions).
+      if (validMetrics.map(_.bytesByPartitionId.length).distinct.length > 1) {
+        return Seq.empty
+      }
+
+      val numPartitions = validMetrics.head.bytesByPartitionId.length
+      val newPartitionSpecs = coalescePartitions(
+        0, numPartitions, validMetrics, targetSize)
+      if (newPartitionSpecs.length < numPartitions) {
+        return Seq.fill(numShuffles)(newPartitionSpecs)
+      } else {
+        return Seq.empty
+      }
+    }
+
+    // Do not coalesce if any of the map output stats are missing or if not all shuffles have
+    // partition specs, which should not happen in practice.
+    if (!mapOutputStatistics.forall(_.isDefined) || !inputPartitionSpecs.forall(_.isDefined)) {
+      logWarning("Could not apply partition coalescing because of missing MapOutputStatistics " +
+        "or shuffle partition specs.")
+      return Seq.empty
+    }
+
+    // Extract the start indices of each partition spec. Give invalid index -1 to unexpected
+    // partition specs. When we reach here, it means skew 
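
For readers tracing the target-size arithmetic in the hunk above, a minimal standalone sketch (the 16-byte floor and the `min` against the advisory size are taken directly from the quoted code; the sample sizes are made up):

```scala
// Sketch of the target-size computation quoted above; runnable on its own.
object TargetSizeSketch {
  def targetSize(bytesByPartitionId: Seq[Array[Long]],
                 advisoryTargetSize: Long,
                 minNumPartitions: Int): Long = {
    val totalPostShuffleInputSize = bytesByPartitionId.map(_.sum).sum
    // The floor of 16 keeps maxTargetSize from collapsing to 0 on empty input.
    val maxTargetSize = math.max(
      math.ceil(totalPostShuffleInputSize / minNumPartitions.toDouble).toLong, 16)
    math.min(maxTargetSize, advisoryTargetSize)
  }

  def main(args: Array[String]): Unit = {
    // 4 shuffles of 64 MiB each (256 MiB total), advisory 64 MiB, at least
    // 200 partitions: the target drops to ceil(256 MiB / 200) ≈ 1.28 MiB.
    val sizes = Seq.fill(4)(Array.fill(8)(8L * 1024 * 1024))
    println(targetSize(sizes, 64L * 1024 * 1024, 200)) // prints 1342178
  }
}
```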

[GitHub] [spark] SparkQA commented on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


SparkQA commented on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846738097


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43381/
   





[GitHub] [spark] SparkQA commented on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


SparkQA commented on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846736560


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43378/
   





[GitHub] [spark] HyukjinKwon commented on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


HyukjinKwon commented on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846730839


   Shall we file a JIRA?





[GitHub] [spark] SparkQA commented on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


SparkQA commented on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846730780


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43380/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


SparkQA removed a comment on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846721626


   **[Test build #138859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138859/testReport)** for PR 32646 at commit [`2c46597`](https://github.com/apache/spark/commit/2c4659736935029e84c65c43d16a52fe9d5473b7).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846725283


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138859/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846725283


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138859/
   





[GitHub] [spark] SparkQA commented on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


SparkQA commented on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846725263


   **[Test build #138859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138859/testReport)** for PR 32646 at commit [`2c46597`](https://github.com/apache/spark/commit/2c4659736935029e84c65c43d16a52fe9d5473b7).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


SparkQA commented on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846721706


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43378/
   





[GitHub] [spark] SparkQA commented on pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


SparkQA commented on pull request #32646:
URL: https://github.com/apache/spark/pull/32646#issuecomment-846721626


   **[Test build #138859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138859/testReport)** for PR 32646 at commit [`2c46597`](https://github.com/apache/spark/commit/2c4659736935029e84c65c43d16a52fe9d5473b7).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846721119


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138856/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846721120


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43377/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32643: [WIP] migrate transformAllExpressions

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32643:
URL: https://github.com/apache/spark/pull/32643#issuecomment-846721117









[GitHub] [spark] AmplabJenkins commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846721120


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43377/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32643: [WIP] migrate transformAllExpressions

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32643:
URL: https://github.com/apache/spark/pull/32643#issuecomment-846721117









[GitHub] [spark] AmplabJenkins commented on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846721119


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138856/
   





[GitHub] [spark] SparkQA commented on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


SparkQA commented on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846720772


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43380/
   





[GitHub] [spark] maropu commented on a change in pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In predicate

2021-05-23 Thread GitBox


maropu commented on a change in pull request #32488:
URL: https://github.com/apache/spark/pull/32488#discussion_r637672796



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##
@@ -121,6 +129,49 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         if canImplicitlyCast(fromExp, toType, literalType) =>
       simplifyNumericComparison(be, fromExp, toType, value)
 
+    // As the analyzer makes sure that the list of In is already of the same data type, then the
+    // rule can simply check the first literal in `in.list` can implicitly cast to `toType` or not,
+    // and this rule doesn't convert in when `in.list` is empty.
+    case in @ In(Cast(fromExp, toType: NumericType, _), list @ Seq(firstLit, _*))
+        if canImplicitlyCast(fromExp, toType, firstLit.dataType) && in.inSetConvertible =>
+      val (newValueList, exp) =
+        list.map(lit => unwrapCast(EqualTo(in.value, lit)))
+          .partition {
+            case EqualTo(_, _: Literal) => true
+            case And(IsNull(_), Literal(null, BooleanType)) => false

Review comment:
   `case And(IsNull(_), Literal(null, BooleanType)) => false` => `case _ => false`?

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##
@@ -121,6 +129,49 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         if canImplicitlyCast(fromExp, toType, literalType) =>
       simplifyNumericComparison(be, fromExp, toType, value)
 
+    // As the analyzer makes sure that the list of In is already of the same data type, then the
+    // rule can simply check the first literal in `in.list` can implicitly cast to `toType` or not,
+    // and this rule doesn't convert in when `in.list` is empty.
+    case in @ In(Cast(fromExp, toType: NumericType, _), list @ Seq(firstLit, _*))
+        if canImplicitlyCast(fromExp, toType, firstLit.dataType) && in.inSetConvertible =>
+      val (newValueList, exp) =
+        list.map(lit => unwrapCast(EqualTo(in.value, lit)))
+          .partition {
+            case EqualTo(_, _: Literal) => true
+            case And(IsNull(_), Literal(null, BooleanType)) => false
+          }
+
+      val (nonNullValueList, nullValueList) = newValueList.partition {
+        case EqualTo(_, NonNullLiteral(_, _: NumericType)) => true
+        case EqualTo(_, Literal(null, _)) => false
+      }
+      // make sure the new return list have the same dataType.
+      val newList = {
+        if (nonNullValueList.nonEmpty) {
+          // cast the null value to the dataType of nonNullValueList
+          // when the nonNullValueList is nonEmpty.
+          nullValueList.map {
+            case EqualTo(_, lit) =>
+              Cast(lit, nonNullValueList.head.asInstanceOf[EqualTo].left.dataType)
+          } ++ nonNullValueList.map { case EqualTo(_, lit) => lit }
+        } else {
+          // the new value list only contains null value,
+          // cast the null value to fromExp.dataType.
+          nullValueList.map {
+            case EqualTo(_, lit) =>
+              Cast(lit, fromExp.dataType)
+          }
+        }
+      }
+
+      val unwrapIn = In(fromExp, newList)
+      // since `exp` are all the same,
+      // convert to a single value `And(IsNull(_), Literal(null, BooleanType))`.
+      exp.headOption match {
+        case Some(falseIfNotNull) => Or(falseIfNotNull, unwrapIn)

Review comment:
   We still need to unwrap casts in this case? IIUC we unwrap casts so that the later optimizer rules can easily push down predicates into data sources. But, predicates having `Or` makes it hard to push them down?
   

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##
@@ -121,6 +129,49 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         if canImplicitlyCast(fromExp, toType, literalType) =>
       simplifyNumericComparison(be, fromExp, toType, value)
 
+    // As the analyzer makes sure that the list of In is already of the same data type, then the
+    // rule can simply check the first literal in `in.list` can implicitly cast to `toType` or not,
+    // and this rule doesn't convert in when `in.list` is empty.
+    case in @ In(Cast(fromExp, toType: NumericType, _), list @ Seq(firstLit, _*))
+        if canImplicitlyCast(fromExp, toType, firstLit.dataType) && in.inSetConvertible =>
+      val (newValueList, exp) =
+        list.map(lit => unwrapCast(EqualTo(in.value, lit)))
+          .partition {
+            case EqualTo(_, _: Literal) => true
+            case And(IsNull(_), Literal(null, BooleanType)) => false
+          }
+
+      val (nonNullValueList, nullValueList) = newValueList.partition {
+        case EqualTo(_, NonNullLiteral(_, _: NumericType)) => true
+        case EqualTo(_, Literal(null, _)) => 
[GitHub] [spark] SparkQA commented on pull request #32643: [WIP] migrate transformAllExpressions

2021-05-23 Thread GitBox


SparkQA commented on pull request #32643:
URL: https://github.com/apache/spark/pull/32643#issuecomment-846717967


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43379/
   





[GitHub] [spark] beliefer opened a new pull request #32646: [SPARK-35057][SQL] Group exception messages in hive/thriftserver

2021-05-23 Thread GitBox


beliefer opened a new pull request #32646:
URL: https://github.com/apache/spark/pull/32646


   ### What changes were proposed in this pull request?
   This PR groups exception messages in `sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver`.
   
   
   ### Why are the changes needed?
   It will largely help with the standardization of error messages and their maintenance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. Error messages remain unchanged.
   
   
   ### How was this patch tested?
   No new tests - pass all original tests to make sure it doesn't break any existing behavior.
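
For context, the grouping pattern SPARK-35057 applies, sketched with a hypothetical error object and method (names and message text invented for illustration; see the PR diff for the real ones):

```scala
import java.util.concurrent.RejectedExecutionException

// Hypothetical sketch of the pattern: exception construction moves out of the
// call sites and into one object per module, so messages live in one place.
object HiveThriftServerErrors {
  def taskExecutionRejectedError(e: RejectedExecutionException): Throwable =
    new RejectedExecutionException("Task rejected by the Thrift server pool", e)
}

// A call site then throws HiveThriftServerErrors.taskExecutionRejectedError(e)
// instead of building the exception (and its message text) inline.
```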
   





[GitHub] [spark] Ngone51 commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-23 Thread GitBox


Ngone51 commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-846713976


   > If I understand correctly about stage-level scheduling, you still need to specify "all" resources needed for "all" tasks in StateRDD; while that may block Spark from scheduling when some resources are missing (like a lost executor with a PVC), I'm wondering how task-level scheduling would work as intended. After this, locality is the only thing we can deal with, and it's not an enforcement, so we're back to the original problem.
   
   That's true. I think we can extend `ResourceProfile.defaultProfile` by adding the state store request to it.
   
   And we may not need to add the state store request to the executor (only to the task), so the executor doesn't need to load the state store at its launch time while using dynamic allocation.
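
For orientation, a sketch of what a task-only resource request looks like with the existing stage-level scheduling API (the `statestore` resource name is hypothetical, purely to illustrate the idea in the comment):

```scala
import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}

// Hypothetical "statestore" resource name, used only to illustrate a
// task-level requirement with no executor-side preloading.
val taskReqs = new TaskResourceRequests().cpus(1).resource("statestore", 1)
val profile = new ResourceProfileBuilder().require(taskReqs).build

// stateRdd.withResources(profile) would then scope the request to that stage.
// Today a task resource normally needs a matching executor resource, which is
// why the comment suggests extending ResourceProfile.defaultProfile instead.
```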





[GitHub] [spark] SparkQA removed a comment on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


SparkQA removed a comment on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846704085


   **[Test build #138856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138856/testReport)** for PR 32644 at commit [`af94332`](https://github.com/apache/spark/commit/af94332023c65df92a5b1a781dd3771e115b279b).





[GitHub] [spark] SparkQA commented on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


SparkQA commented on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846713551


   **[Test build #138856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138856/testReport)** for PR 32644 at commit [`af94332`](https://github.com/apache/spark/commit/af94332023c65df92a5b1a781dd3771e115b279b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846712770


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43377/
   





[GitHub] [spark] viirya commented on a change in pull request #32594: [SPARK-35447][SQL] Optimize skew join before coalescing shuffle partitions

2021-05-23 Thread GitBox


viirya commented on a change in pull request #32594:
URL: https://github.com/apache/spark/pull/32594#discussion_r637669250



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##
@@ -48,9 +48,6 @@ import org.apache.spark.sql.internal.SQLConf
  * (L2-1, R2), (L2-2, R2),
  * (L3, R3-1), (L3, R3-2),
  * (L4-1, R4-1), (L4-2, R4-1), (L4-1, R4-2), (L4-2, R4-2)
- *
- * Note that, when this rule is enabled, it also coalesces non-skewed partitions like
- * `CoalesceShufflePartitions` does.

Review comment:
   Is this comment not true anymore?







[GitHub] [spark] FatalLin commented on pull request #32202: [SPARK-28098][SQL]Supporting non-partitioned Hive tables with subdirectories

2021-05-23 Thread GitBox


FatalLin commented on pull request #32202:
URL: https://github.com/apache/spark/pull/32202#issuecomment-846711680


   > I found the same problem with partitioned Hive tables if they contain subdirectories, so why wasn't it changed in this action?
   
   You mean it will hit the same problem if we trigger the action with the Hive engine instead of Spark's native reader?
   I thought it could be handled with a Hive configuration such as "hive.mapred.supports.subdirectories".





[GitHub] [spark] SparkQA removed a comment on pull request #32643: [WIP] migrate transformAllExpressions

2021-05-23 Thread GitBox


SparkQA removed a comment on pull request #32643:
URL: https://github.com/apache/spark/pull/32643#issuecomment-846704117


   **[Test build #138857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138857/testReport)** for PR 32643 at commit [`773bb96`](https://github.com/apache/spark/commit/773bb961ddd1824efd7ca37ae462d031ac36edd3).





[GitHub] [spark] SparkQA commented on pull request #32643: [WIP] migrate transformAllExpressions

2021-05-23 Thread GitBox


SparkQA commented on pull request #32643:
URL: https://github.com/apache/spark/pull/32643#issuecomment-846711414


   **[Test build #138857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138857/testReport)** for PR 32643 at commit [`773bb96`](https://github.com/apache/spark/commit/773bb961ddd1824efd7ca37ae462d031ac36edd3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  case class DeferFetchRequestResult(fetchRequest: FetchRequest) extends FetchResult`
 * `case class AgeExample(birthday: Expression, child: Expression) extends RuntimeReplaceable `
 * `class SessionExtensionsWithLoader extends SparkSessionExtensionsProvider `
 * `class SessionExtensionsWithoutLoader extends SparkSessionExtensionsProvider `
 * `case class AvroWrite(`
 * `case class KafkaWrite(`
 * `class DataTypeOps(object, metaclass=ABCMeta):`
 * `\"\"\"The base class for binary operations of pandas-on-Spark objects (of different data types).\"\"\"`
 * `class BooleanOps(DataTypeOps):`
 * `class CategoricalOps(DataTypeOps):`
 * `class DateOps(DataTypeOps):`
 * `class DatetimeOps(DataTypeOps):`
 * `class NumericOps(DataTypeOps):`
 * `class IntegralOps(NumericOps):`
 * `class FractionalOps(NumericOps):`
 * `class StringOps(DataTypeOps):`
 * `class CachedAccessor(Generic[T]):`
 * `class SparkIndexOpsMethods(metaclass=ABCMeta):`
 * `case class ReferenceEqualPlanWrapper(plan: LogicalPlan) `
 * `  class ExpressionContainmentOrdering extends Ordering[Expression] `
 * `case class TryEval(child: Expression) extends UnaryExpression with NullIntolerant `
 * `case class TryAdd(left: Expression, right: Expression, child: Expression)`
 * `case class TryDivide(left: Expression, right: Expression, child: Expression)`
 * `new RuntimeException(s\"Failed to convert value $value (class of $cls) \" +`
 * `new RuntimeException(s\"class $clsName has unexpected serializer: $objSerializer\")`
 * `case class UpdatingSessionsExec(`
 * `class UpdatingSessionsIterator(`
 * `trait FileWrite extends Write `
 * `case class CSVWrite(`
 * `case class JsonWrite(`
 * `case class OrcWrite(`
 * `case class ParquetWrite(`
 * `case class TextWrite(`
 * `class ConsoleWrite(schema: StructType, options: CaseInsensitiveStringMap)`
 * `class ForeachWrite[T](`
 * `class MemoryWrite(sink: MemorySink, schema: StructType, needTruncate: Boolean) extends Write `





[GitHub] [spark] yaooqinn commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-23 Thread GitBox


yaooqinn commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r637668321



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##
@@ -65,6 +65,11 @@ object HiveResult {
     // database, table name, isTemp.
     case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
       command.executeCollect().map(_.getString(1))
+    // SHOW TABLE EXTENDED in Hive does not have isTemp while our v1 command outputs isTemp.
+    case command @ ExecutedCommandExec(s: ShowTablesCommand) if s.isExtended =>
+      command.executeCollect().map(_.getMap(3))
+        .map(m => m.keyArray().array.zip(m.valueArray().array)
+          .map(kv => s"${kv._1}: ${kv._2}").mkString("\n"))

Review comment:
   https://github.com/apache/spark/pull/32563/files#r635773394, there seems to be no `space` after the colon in the Hive results?
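
A standalone sketch of the formatting the hunk performs, with plain Scala collections standing in for Catalyst's map data (sample keys and values invented):

```scala
// Mirrors the quoted hunk: zip keys with values, render one "key: value" line each.
val keys = Seq("Database", "Table", "Type")
val values = Seq("default", "show_t1", "MANAGED")
val rendered = keys.zip(values).map { case (k, v) => s"$k: $v" }.mkString("\n")
// yaooqinn's point: Hive itself appears to print "key:value" without the space.
```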







[GitHub] [spark] viirya commented on a change in pull request #32594: [SPARK-35447][SQL] Optimize skew join before coalescing shuffle partitions

2021-05-23 Thread GitBox


viirya commented on a change in pull request #32594:
URL: https://github.com/apache/spark/pull/32594#discussion_r637667408



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##
@@ -96,10 +96,9 @@ case class AdaptiveSparkPlanExec(
   @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
     PlanAdaptiveDynamicPruningFilters(this),
     ReuseAdaptiveSubquery(context.subqueryCache),
-    CoalesceShufflePartitions(context.session),
-    // The following two rules need to make use of 'CustomShuffleReaderExec.partitionSpecs'
-    // added by `CoalesceShufflePartitions`. So they must be executed after it.
+    // Skew join does not handle `CustomShuffleReader` so needs to be applied first.
     OptimizeSkewedJoin,
+    CoalesceShufflePartitions(context.session),
     OptimizeLocalShuffleReader

Review comment:
   As `OptimizeLocalShuffleReader` still handles `CustomShuffleReaderExec`, we should keep the comment?







[GitHub] [spark] chong0929 edited a comment on pull request #32202: [SPARK-28098][SQL]Supporting non-partitioned Hive tables with subdirectories

2021-05-23 Thread GitBox


chong0929 edited a comment on pull request #32202:
URL: https://github.com/apache/spark/pull/32202#issuecomment-846708661


   I found the same problem with partitioned Hive tables if they contain subdirectories, so why wasn't it changed in this action?





[GitHub] [spark] chong0929 commented on pull request #32202: [SPARK-28098][SQL]Supporting non-partitioned Hive tables with subdirectories

2021-05-23 Thread GitBox


chong0929 commented on pull request #32202:
URL: https://github.com/apache/spark/pull/32202#issuecomment-846708661


   I found the same problem with partitioned Hive tables if they contain subdirectories, so why wasn't it changed in this action





[GitHub] [spark] SparkQA commented on pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


SparkQA commented on pull request #32645:
URL: https://github.com/apache/spark/pull/32645#issuecomment-846707466


   **[Test build #138858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138858/testReport)** for PR 32645 at commit [`0197136`](https://github.com/apache/spark/commit/01971362298c3813123e5ea89383e756701ca5e8).





[GitHub] [spark] AngersZhuuuu opened a new pull request #32645: [SPARK-35129][SQL] Construct year-month interval column from integral fields

2021-05-23 Thread GitBox


AngersZhuuuu opened a new pull request #32645:
URL: https://github.com/apache/spark/pull/32645


   ### What changes were proposed in this pull request?
   Add a new function to support constructing YearMonthIntervalType from integral fields.
   
   
   ### Why are the changes needed?
   Add a new function to support constructing YearMonthIntervalType from integral fields.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, users can use `make_ym_interval` to construct YearMonthIntervalType from years/months integral fields.
   
   
   ### How was this patch tested?
   Added UT
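
A quick sketch of the proposed usage, inferred from the description and the `MakeYMInterval(years: Expression, months: Expression)` signature reported by the test build (exact output formatting may differ):

```scala
// Hypothetical usage, assuming an existing SparkSession `spark`.
spark.sql("SELECT make_ym_interval(1, 2)").show()
// Expected: a year-month interval of 1 year and 2 months,
// i.e. INTERVAL '1-2' YEAR TO MONTH.
```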
   





[GitHub] [spark] JkSelf commented on a change in pull request #32594: [SPARK-35447][SQL] Optimize skew join before coalescing shuffle partitions

2021-05-23 Thread GitBox


JkSelf commented on a change in pull request #32594:
URL: https://github.com/apache/spark/pull/32594#discussion_r637663065



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##
@@ -158,76 +155,68 @@ object OptimizeSkewedJoin extends CustomShuffleReaderRule 
{
*3 tasks separately.
*/
   private def tryOptimizeJoinChildren(
-  left: ShuffleStageInfo,
-  right: ShuffleStageInfo,
+  left: ShuffleQueryStageExec,
+  right: ShuffleQueryStageExec,
   joinType: JoinType): Option[(SparkPlan, SparkPlan)] = {
-assert(left.partitionsWithSizes.length == right.partitionsWithSizes.length)
-val numPartitions = left.partitionsWithSizes.length
+val leftSizes = left.mapStats.get.bytesByPartitionId
+val rightSizes = right.mapStats.get.bytesByPartitionId
+assert(leftSizes.length == rightSizes.length)
+val numPartitions = leftSizes.length
 // We use the median size of the original shuffle partitions to detect 
skewed partitions.
-val leftMedSize = medianSize(left.mapStats)
-val rightMedSize = medianSize(right.mapStats)
+val leftMedSize = medianSize(leftSizes)
+val rightMedSize = medianSize(rightSizes)
 logDebug(
   s"""
  |Optimizing skewed join.
  |Left side partitions size info:
- |${getSizeInfo(leftMedSize, left.mapStats.bytesByPartitionId)}
+ |${getSizeInfo(leftMedSize, leftSizes)}
  |Right side partitions size info:
- |${getSizeInfo(rightMedSize, right.mapStats.bytesByPartitionId)}
+ |${getSizeInfo(rightMedSize, rightSizes)}
   """.stripMargin)
+
 val canSplitLeft = canSplitLeftSide(joinType)
 val canSplitRight = canSplitRightSide(joinType)
-// We use the actual partition sizes (may be coalesced) to calculate 
target size, so that
-// the final data distribution is even (coalesced partitions + split 
partitions).
-val leftActualSizes = left.partitionsWithSizes.map(_._2)
-val rightActualSizes = right.partitionsWithSizes.map(_._2)
-val leftTargetSize = targetSize(leftActualSizes, leftMedSize)
-val rightTargetSize = targetSize(rightActualSizes, rightMedSize)
+val leftTargetSize = targetSize(leftSizes, leftMedSize)
+val rightTargetSize = targetSize(rightSizes, rightMedSize)
 
 val leftSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
 val rightSidePartitions = mutable.ArrayBuffer.empty[ShufflePartitionSpec]
 var numSkewedLeft = 0
 var numSkewedRight = 0
 for (partitionIndex <- 0 until numPartitions) {
-  val leftActualSize = leftActualSizes(partitionIndex)
-  val isLeftSkew = isSkewed(leftActualSize, leftMedSize) && canSplitLeft
-  val leftPartSpec = left.partitionsWithSizes(partitionIndex)._1
-  val isLeftCoalesced = leftPartSpec.startReducerIndex + 1 < 
leftPartSpec.endReducerIndex
-
-  val rightActualSize = rightActualSizes(partitionIndex)
-  val isRightSkew = isSkewed(rightActualSize, rightMedSize) && 
canSplitRight
-  val rightPartSpec = right.partitionsWithSizes(partitionIndex)._1
-  val isRightCoalesced = rightPartSpec.startReducerIndex + 1 < 
rightPartSpec.endReducerIndex
+  val leftSize = leftSizes(partitionIndex)
+  val isLeftSkew = isSkewed(leftSize, leftMedSize) && canSplitLeft
+  val rightSize = rightSizes(partitionIndex)
+  val isRightSkew = isSkewed(rightSize, rightMedSize) && canSplitRight
+  val noSkewPartitionSpec = Seq(CoalescedPartitionSpec(partitionIndex, 
partitionIndex + 1))
 
-  // A skewed partition should never be coalesced, but skip it here just 
to be safe.
-  val leftParts = if (isLeftSkew && !isLeftCoalesced) {
-val reducerId = leftPartSpec.startReducerIndex
+  val leftParts = if (isLeftSkew) {
 val skewSpecs = createSkewPartitionSpecs(
-  left.mapStats.shuffleId, reducerId, leftTargetSize)
+  left.mapStats.get.shuffleId, partitionIndex, leftTargetSize)
 if (skewSpecs.isDefined) {
   logDebug(s"Left side partition $partitionIndex " +
-s"(${FileUtils.byteCountToDisplaySize(leftActualSize)}) is skewed, 
" +
+s"(${FileUtils.byteCountToDisplaySize(leftSize)}) is skewed, " +
 s"split it into ${skewSpecs.get.length} parts.")
   numSkewedLeft += 1
 }
-skewSpecs.getOrElse(Seq(leftPartSpec))
+skewSpecs.getOrElse(noSkewPartitionSpec)
   } else {
-Seq(leftPartSpec)
+noSkewPartitionSpec
   }
 
   // A skewed partition should never be coalesced, but skip it here just 
to be safe.

Review comment:
   This comment can be removed.
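   
   For readers following along, here is a simplified sketch of the median-based skew detection this hunk relies on; it is an illustration, not the actual Spark source, and the factor/threshold defaults are assumptions:
   ```
   // Simplified sketch of the skew detection used by OptimizeSkewedJoin.
   // The factor and threshold defaults below are assumptions, not Spark's
   // actual config values.
   def medianSize(sizes: Array[Long]): Long = {
     val sorted = sizes.sorted
     math.max(sorted(sorted.length / 2), 1)
   }
   
   def isSkewed(size: Long, median: Long,
                factor: Double = 5.0,
                thresholdBytes: Long = 256L * 1024 * 1024): Boolean =
     size > median * factor && size > thresholdBytes
   ```
   A partition is split only when both conditions hold, which keeps small-but-above-median partitions out of the skew handling.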

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala
##
@@ -21,16 +21,160 @@ import scala.collection.mutable.ArrayBuffer
 
 import org.apache.spark.MapOutputStatistics
 import org.apache.spark.internal.Logging
-import 

[GitHub] [spark] wangyum commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-23 Thread GitBox


wangyum commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r637663948



##
File path: sql/core/src/test/resources/sql-tests/results/show-tables.sql.out
##
@@ -120,19 +120,9 @@ show_t3
 -- !query
 SHOW TABLE EXTENDED LIKE 'show_t*'
 -- !query schema
-struct
+struct>
 -- !query output
-   show_t3 trueTable: show_t3

Review comment:
   `beeline` before:
   ```
   0: jdbc:hive2://localhost:1> set 
spark.sql.legacy.keepCommandOutputSchema=true;
   +-------------------------------------------+--------+
   |                    key                    | value  |
   +-------------------------------------------+--------+
   | spark.sql.legacy.keepCommandOutputSchema  | true   |
   +-------------------------------------------+--------+
   1 row selected (0.055 seconds)
   0: jdbc:hive2://localhost:1> SHOW TABLE EXTENDED LIKE '*';
   
+---+---+--++
   | database  |   tableName   | isTemporary  |information  
   |
   
+---+---+--++
   | default   | test_parquet  | false| CatalogTable(
   Database: default
   Table: test_parquet
   Owner: yumwang
   Created Time: Mon May 24 11:16:33 CST 2021
   Last Access: UNKNOWN
   Created By: Spark 3.2.0-SNAPSHOT
   Type: MANAGED
   Provider: hive
   Table Properties: [transient_lastDdlTime=1621826201]
   Statistics: 290 bytes
   Location: file:/Users/yumwang/tmp//spark/spark-warehouse/test_parquet
   Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   InputFormat: org.apache.hadoop.mapred.TextInputFormat
   OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Storage Properties: [serialization.format=1]
   Partition Provider: Catalog
   Schema: root
|-- id: long (nullable = false)
   )
|
   
+---+---+--++
   1 row selected (0.086 seconds)
   ```
   `beeline` after:
   ```
   0: jdbc:hive2://localhost:1> SHOW TABLE EXTENDED LIKE '*';
   
++---+--++
   | namespace  |   tableName   | isTemporary  |information 
|
   
++---+--++
   | default| test_parquet  | false| {"Created By":"Spark 
3.2.0-SNAPSHOT","Created Time":"Mon May 24 11:16:33 CST 
2021","Database":"default","InputFormat":"org.apache.hadoop.mapred.TextInputFormat","Last
 
Access":"UNKNOWN","Location":"file:/Users/yumwang/tmp//spark/spark-warehouse/test_parquet","OutputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","Owner":"yumwang","Partition
 Provider":"Catalog","Provider":"hive","Schema":"root
|-- id: long (nullable = false)
   ","Serde 
Library":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","Statistics":"290 
bytes","Storage Properties":"[serialization.format=1]","Table 
Properties":"[transient_lastDdlTime=1621826201]","Table":"test_parquet","Type":"MANAGED"}
 |
   
++---+--++
   1 row selected (0.903 seconds)
   ```
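   
   As a hedged sketch of what the map-typed column enables, assuming the schema lands as shown above and a SparkSession `spark`: individual fields can be selected directly instead of string-parsing the old `CatalogTable` dump.
   ```
   // hypothetical usage; the 'Provider' key follows the sample output above
   spark.sql("SHOW TABLE EXTENDED LIKE '*'")
     .selectExpr("tableName", "information['Provider'] AS provider")
     .show(truncate = false)
   ```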




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-23 Thread GitBox


wangyum commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r637662935



##
File path: sql/core/src/test/resources/sql-tests/results/show-tables.sql.out
##
@@ -120,19 +120,9 @@ show_t3
 -- !query
 SHOW TABLE EXTENDED LIKE 'show_t*'
 -- !query schema
-struct
+struct>
 -- !query output
-   show_t3 trueTable: show_t3

Review comment:
   `spark-sql` before:
   ```
   spark-sql> set spark.sql.legacy.keepCommandOutputSchema=true;
   spark.sql.legacy.keepCommandOutputSchema true
   Time taken: 0.012 seconds, Fetched 1 row(s)
   spark-sql> SHOW TABLE EXTENDED LIKE '*';
   default  test_parquetfalse   CatalogTable(
   Database: default
   Table: test_parquet
   Owner: yumwang
   Created Time: Mon May 24 11:16:33 CST 2021
   Last Access: UNKNOWN
   Created By: Spark 3.2.0-SNAPSHOT
   Type: MANAGED
   Provider: hive
   Table Properties: [transient_lastDdlTime=1621826201]
   Statistics: 290 bytes
   Location: file:/Users/yumwang/tmp//spark/spark-warehouse/test_parquet
   Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   InputFormat: org.apache.hadoop.mapred.TextInputFormat
   OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Storage Properties: [serialization.format=1]
   Partition Provider: Catalog
   Schema: root
|-- id: long (nullable = false)
   )
   
   Time taken: 0.031 seconds, Fetched 1 row(s)
   ```
   
   `spark-sql` after:
   ```
   spark-sql> SHOW TABLE EXTENDED LIKE '*';
   default  test_parquetfalse   {"Created By":"Spark 
3.2.0-SNAPSHOT","Created Time":"Mon May 24 11:16:33 CST 
2021","Database":"default","InputFormat":"org.apache.hadoop.mapred.TextInputFormat","Last
 
Access":"UNKNOWN","Location":"file:/Users/yumwang/tmp//spark/spark-warehouse/test_parquet","OutputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","Owner":"yumwang","Partition
 Provider":"Catalog","Provider":"hive","Schema":"root
|-- id: long (nullable = false)
   ","Serde 
Library":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","Statistics":"290 
bytes","Storage Properties":"[serialization.format=1]","Table 
Properties":"[transient_lastDdlTime=1621826201]","Table":"test_parquet","Type":"MANAGED"}
   Time taken: 0.043 seconds, Fetched 1 row(s)
   ```
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32643: [WIP] migrate transformAllExpressions

2021-05-23 Thread GitBox


SparkQA commented on pull request #32643:
URL: https://github.com/apache/spark/pull/32643#issuecomment-846704117


   **[Test build #138857 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138857/testReport)**
 for PR 32643 at commit 
[`773bb96`](https://github.com/apache/spark/commit/773bb961ddd1824efd7ca37ae462d031ac36edd3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


SparkQA commented on pull request #32644:
URL: https://github.com/apache/spark/pull/32644#issuecomment-846704085


   **[Test build #138856 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138856/testReport)**
 for PR 32644 at commit 
[`af94332`](https://github.com/apache/spark/commit/af94332023c65df92a5b1a781dd3771e115b279b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846703760


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43376/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846703760


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43376/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846703020


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43377/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] WeichenXu123 opened a new pull request #32644: [MINOR] Add thread target wrapper API for pyspark pin thread mode.

2021-05-23 Thread GitBox


WeichenXu123 opened a new pull request #32644:
URL: https://github.com/apache/spark/pull/32644


   
   
   ### What changes were proposed in this pull request?
   Add thread target wrapper API for pyspark pin thread mode.
   
   ### Why are the changes needed?
   A helper method which makes it easier for users to write threading code under pin thread mode.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manual.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sigmod opened a new pull request #32643: [WIP] migrate transformAllExpressions

2021-05-23 Thread GitBox


sigmod opened a new pull request #32643:
URL: https://github.com/apache/spark/pull/32643


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #32642: [SPARK-35495][R] Change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


dongjoon-hyun commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846700488


   Thank you, @shivaram !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-23 Thread GitBox


viirya commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-846699726


   > That said, personally, as I commented in the comment thread above, I'd like to see an interface that lets end users hack on Spark and do whatever they want, taking their own risks. In other words, task-level scheduling itself sounds OK to me, though I promise I'll respect the decision of the committers on the CORE area.
   > 
   > Btw, IMHO, at least for now, initial state distribution and leveraging PVCs are something which should be proved in various use cases / workloads in production before taking them onto the plate of discussions. Before then, I'd like to see Spark be customizable to their needs, so that users are no longer blocked on the Spark side.
   
   This API is proposed to let other developers change Spark's behavior, and yes, at their own risk. Actually, this might be the least intrusive way to support the improvements on the SS side: we don't need to touch other core pieces, just let developers work with the API.
   
   Maybe we don't even need to add the PVC stuff into Spark upstream. As it is a pretty clean API implementation, we can implement it on our side if Spark upstream doesn't want to take the complexity. Mentioning PVC was to show the use case we had in mind when we were asked how we are going to use the API.
   
   Unfortunately, so far we have been unable to push this forward. I'm also not sure right now how much change stage-level scheduling needs for a similar function. But I want to respect the decision of the committers on the core area, so I'd like to take a look at stage-level scheduling as suggested above.
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shivaram commented on pull request #32642: [SPARK-35495][R] Change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


shivaram commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846698523


   Thanks @felixcheung +1 from me as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-23 Thread GitBox


viirya commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-846696394


   > My major point is about the characteristic of the checkpoint location.
   > 
   > We require the checkpoint location to be "fault-tolerant", including against hardware failures (local storage doesn't make sense here), and to provide "high availability" by itself, so that Spark can delegate such complexity to the checkpoint location. For sure, such a requirement forces the underlying file system to be heavyweight and non-trivial to maintain, but IMHO that's not a sufficient reason to take the complexity back into Spark, because:
   
   I think one major issue users face is that they don't have a choice. As for "fault-tolerant": since we consider PVC an abstract way to look at storage, it can support that whenever the storage class supports the feature, and there are in fact storage classes that do. Again, it is about user choice: users can choose among different storage classes for PVC. How often can a fault occur, and how serious would a fault be for the streaming app? Not to mention there is also snapshot support for volumes on K8S. From less to more, users can choose different storage classes to meet their requirements.
   
   For example, for a streaming app where a fault is not too serious an issue, maybe local storage plus occasional snapshots, or local storage with RAID, may be good enough?
   
   For industry usage, it is sometimes not easy to adopt whatever file system users want, e.g. object stores in Azure or GCS or others. Any backend file system adoption requires organizational change, talent hiring, system engineering team support, policy changes, etc.
   
   > I'd interpret the reasons as twofold:
   > 
   > A. The majority of real-world workloads work well with current technology
   > B. Some workloads don't work well, but there is no strong demand since the possible issues are tolerable
   
   I don't want to guess here, but maybe another possibility is that they have moved to another streaming engine which can support their workloads more easily.
   
   > I'd be happy to see the overall system design and the results of a POC. Let's continue the talk about PVC once we get the details.
   
   Sure. Stage-level scheduling is a different direction than my original proposal, and I need to take some time to revise it. I will keep it posted elsewhere, e.g. in a new JIRA.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn edited a comment on pull request #32594: [SPARK-35447][SQL] Optimize skew join before coalescing shuffle partitions

2021-05-23 Thread GitBox


yaooqinn edited a comment on pull request #32594:
URL: https://github.com/apache/spark/pull/32594#issuecomment-846695031


   Looks fine to me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on pull request #32594: [SPARK-35447][SQL] Optimize skew join before coalescing shuffle partitions

2021-05-23 Thread GitBox


yaooqinn commented on pull request #32594:
URL: https://github.com/apache/spark/pull/32594#issuecomment-846695031


   Looks fine to me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846693183


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43376/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-23 Thread GitBox


HeartSaVioR edited a comment on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-846691545


   That said, personally, as I commented in the comment thread above, I'd like to see an interface that lets end users hack on Spark and do whatever they want, taking their own risks. In other words, task-level scheduling itself sounds OK to me, though I promise I'll respect the decision of the committers on the CORE area.
   
   Btw, IMHO, at least for now, initial state distribution and leveraging PVCs are something which should be proved in various use cases / workloads in production before taking them onto the plate of discussions. Before then, I'd like to see Spark be customizable to their needs, so that users are no longer blocked on the Spark side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-23 Thread GitBox


HeartSaVioR commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-846691545


   That said, personally, as I commented in the comment thread above, I'd like to see an interface that lets end users hack on Spark and do whatever they want, taking their own risks. In other words, task-level scheduling itself sounds OK to me, though I promise I'll respect the decision of the committers on the CORE area.
   
   Btw, IMHO, at least for now, initial state distribution and leveraging PVCs are something which should be proved in various use cases / workloads in production before taking them onto the plate of discussions. Before then, I'd like to see Spark be customizable to their needs, so that users are no longer blocked on the Spark side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ankurdave commented on pull request #32625: [SPARK-35486][CORE] TaskMemoryManager: retry if other task takes memory freed by partial self-spill

2021-05-23 Thread GitBox


ankurdave commented on pull request #32625:
URL: https://github.com/apache/spark/pull/32625#issuecomment-846689070


   Fine with me to consider this for master only.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846688411


   **[Test build #138855 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138855/testReport)**
 for PR 32602 at commit 
[`f7a14cf`](https://github.com/apache/spark/commit/f7a14cf49ac27336195dbcb6285b2b2e5f96df38).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #32252: [SPARK-35094][SQL]Spark from_json(JsonToStruct) function return wrong value in permissive mode

2021-05-23 Thread GitBox


dongjoon-hyun commented on pull request #32252:
URL: https://github.com/apache/spark/pull/32252#issuecomment-846687927


   Thank you so much for the confirmation, @HyukjinKwon !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR edited a comment on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-23 Thread GitBox


HeartSaVioR edited a comment on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-846687139


   My major point is about the characteristic of the checkpoint location.
   
   We require the checkpoint location to be "fault-tolerant", including against hardware failures (local storage doesn't make sense here), and to provide "high availability" by itself, so that Spark can delegate such complexity to the checkpoint location. For sure, such a requirement forces the underlying file system to be heavyweight and non-trivial to maintain, but IMHO that's not a sufficient reason to take the complexity back into Spark, because:
   
   1. Spark is already quite complicated
   
   Nowadays it's quite hard to bring in major changes without affecting existing behaviors. Personally I'd like to see the mainline of Apache Spark serve the majority's demands, not everyone's demands.
   
   2. The industry around file systems (and the like) also keeps improving
   
   Many end users were dealing with checkpoint locations in object stores despite the eventual consistency of S3, and S3 has since become strongly consistent. Object stores in Azure have been providing strong consistency, if I understand correctly. I'm not 100% sure about GCS, but I hear they claim strong consistency too.
   
   HDFS has exposed several shortcomings so far, but that community is also making improvements, like Apache Ozone.
   
   3. The Spark community hasn't even tried hard to optimize this path
   
   I'd interpret the reasons as twofold:
   
   A. The majority of real-world workloads work well with current technology
   B. Some workloads don't work well, but there is no strong demand since the possible issues are tolerable
   
   I think we still haven't pushed on this hard enough. There are still spots where latency can be reduced, as I did when optimizing the WAL commit phase (#31495). I'd like to put my effort into helping the majority's use cases.
   
   > Technically, PVC is a kind of abstract way to look at the volume mounted on the container running the executor. It could be local storage on nodes on k8s. It depends where the PVC is bound to.
   
   PVC is not a kind of abstraction which "guarantees" a fault-tolerant file system, so it still has to depend on the actual file system under the hood, and also on the guarantees of the interface accessing that file system. I imagine the operational cost of making PVC guarantee such requirements would be non-trivial as well.
   
   > Using PVC as checkpoint could be a huge relief on the load of HDFS. There are also other benefits like better latency and a simplified streaming architecture.
   
   I'd be happy to see the overall system design and the results of a POC. Let's continue the talk about PVC once we get the details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-23 Thread GitBox


HeartSaVioR commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-846687139


   My major point is about the characteristic of the checkpoint location.
   
   We require the checkpoint location to be "fault-tolerant", including against hardware failures (local storage doesn't make sense here), and to provide "high availability" by itself, so that Spark can delegate such complexity to the checkpoint location. For sure, such a requirement forces the underlying file system to be heavyweight and non-trivial to maintain, but IMHO that's not a sufficient reason to take the complexity back into Spark, because:
   
   1. Spark is already quite complicated
   
   Nowadays it's quite hard to bring in major changes without affecting existing behaviors. Personally I'd like to see the mainline of Apache Spark serve the majority's demands, not everyone's demands.
   
   2. The industry around file systems (and the like) also keeps improving
   
   Many end users were dealing with checkpoint locations in object stores despite the eventual consistency of S3, and S3 has since become strongly consistent. Object stores in Azure have been providing strong consistency, if I understand correctly. I'm not 100% sure about GCS, but I hear they claim strong consistency too.
   
   HDFS has exposed several shortcomings so far, but that community is also making improvements, like Apache Ozone.
   
   3. The Spark community hasn't even tried hard to optimize this path
   
   I'd interpret the reasons as twofold:
   
   A. The majority of real-world workloads work well with current technology
   B. Some workloads don't work well, but there is no strong demand since the possible issues are tolerable
   
   I think we still haven't pushed on this hard enough. There are still spots where latency can be reduced, as I did when optimizing the WAL commit phase (#31495). I'd like to put my effort into helping the majority's use cases.
   
   > Technically, PVC is a kind of abstract way to look at the volume mounted on the container running the executor. It could be local storage on nodes on k8s. It depends where the PVC is bound to.
   
   PVC is not a kind of abstraction which "guarantees" a fault-tolerant file system, so it still has to depend on the actual file system under the hood, and also on the guarantees of the interface accessing that file system. I imagine the operational cost of making PVC guarantee such requirements would be non-trivial as well.
   
   > Using PVC as checkpoint could be a huge relief on the load of HDFS. There are also other benefits like better latency and a simplified streaming architecture.
   
   I'd be happy to see the overall system design and the results of a POC. Let's continue the talk about PVC once we get the details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #32642: [SPARK-35495][R] Change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


dongjoon-hyun commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846686579


   Merged to master/3.1/3.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32599: [SPARK-35320][SQL] Fix from_json maps parser: Only StringType is supported for keys.

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32599:
URL: https://github.com/apache/spark/pull/32599#issuecomment-846686262


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43375/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32642: [SPARK-35495][R] Change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846686263


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43373/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846686261


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138854/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32642: [SPARK-35495][R] Change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846686263


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43373/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846686261


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138854/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32599: [SPARK-35320][SQL] Fix from_json maps parser: Only StringType is supported for keys.

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32599:
URL: https://github.com/apache/spark/pull/32599#issuecomment-846686262


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43375/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32625: [SPARK-35486][CORE] TaskMemoryManager: retry if other task takes memory freed by partial self-spill

2021-05-23 Thread GitBox


dongjoon-hyun commented on a change in pull request #32625:
URL: https://github.com/apache/spark/pull/32625#discussion_r637647625



##
File path: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
##
@@ -202,14 +202,22 @@ public long acquireExecutionMemory(long required, 
MemoryConsumer consumer) {
 }
   }
 
-  // call spill() on itself
-  if (got < required) {
+  // Attempt to free up memory by self-spilling.
+  //
+  // When our spill handler releases memory, 
`ExecutionMemoryPool#releaseMemory()` will
+  // immediately notify other tasks that memory has been freed, and they 
may acquire the
+  // newly-freed memory before we have a chance to do so (SPARK-35486). In 
that case, we will
+  // try again in the next loop iteration.
+  while (got < required) {

Review comment:
   Thank you for the explanation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846683680


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43376/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #32642: [SPARK-35495][R] Change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


dongjoon-hyun closed pull request #32642:
URL: https://github.com/apache/spark/pull/32642


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ankurdave commented on a change in pull request #32625: [SPARK-35486][CORE] TaskMemoryManager: retry if other task takes memory freed by partial self-spill

2021-05-23 Thread GitBox


ankurdave commented on a change in pull request #32625:
URL: https://github.com/apache/spark/pull/32625#discussion_r637640349



##
File path: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
##
@@ -202,14 +202,22 @@ public long acquireExecutionMemory(long required, 
MemoryConsumer consumer) {
 }
   }
 
-  // call spill() on itself
-  if (got < required) {
+  // Attempt to free up memory by self-spilling.
+  //
+  // When our spill handler releases memory, 
`ExecutionMemoryPool#releaseMemory()` will
+  // immediately notify other tasks that memory has been freed, and they 
may acquire the
+  // newly-freed memory before we have a chance to do so (SPARK-35486). In 
that case, we will
+  // try again in the next loop iteration.
+  while (got < required) {

Review comment:
   Good question. The worst case occurs when waiting tasks take all the 
memory released by `consumer.spill()`. The specific number of iterations 
depends on the behavior of `consumer.spill()`:
   
   - If `consumer.spill()` releases memory promptly when requested to, as all 
current implementations do, then this can iterate up to `consumer.getUsed() / 
(required - got) + 2` times. In all but the last two iterations, `consumer` 
spills `required - got` bytes; in the second-to-last iteration, `consumer` 
spills any remaining bytes; and in the last iteration, `consumer` spills 0 
bytes.
   - If `consumer.spill()` trickles out memory, for example by releasing only 1 
byte per call, then there can be up to `consumer.getUsed() + 1` iterations. In 
all but the last iteration, `consumer` spills 1 byte, and in the last 
iteration, `consumer` spills 0 bytes.
   
   This is the same worst-case behavior as the `while 
(!sortedConsumers.isEmpty())` loop above.
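   
   For reference, a simplified Scala sketch of the retry loop being discussed; the real code is Java in `TaskMemoryManager.acquireExecutionMemory`, and the `Pool`/`Spillable` traits below are stand-ins for the actual memory pool and consumer, not Spark types:
   ```
   trait Pool { def acquire(numBytes: Long): Long }      // may grant less than asked
   trait Spillable { def spill(numBytes: Long): Long }   // returns bytes released
   
   def acquireWithSelfSpill(pool: Pool, consumer: Spillable, required: Long): Long = {
     var got = pool.acquire(required)
     while (got < required) {
       val released = consumer.spill(required - got)
       if (released == 0L) return got  // nothing left to self-spill; give up
       // Freed memory may be grabbed by other waiting tasks first
       // (SPARK-35486), so re-request and loop if we still fall short.
       got += pool.acquire(required - got)
     }
     got
   }
   ```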




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA removed a comment on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846670009


   **[Test build #138854 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138854/testReport)**
 for PR 32602 at commit 
[`e26df96`](https://github.com/apache/spark/commit/e26df96b72806ace84c72e1f134278e2b20b4457).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846679533


   **[Test build #138854 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138854/testReport)**
 for PR 32602 at commit 
[`e26df96`](https://github.com/apache/spark/commit/e26df96b72806ace84c72e1f134278e2b20b4457).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-05-23 Thread GitBox


maropu commented on a change in pull request #32303:
URL: https://github.com/apache/spark/pull/32303#discussion_r637630300



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -2395,6 +2421,70 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 }
   }
 
+  /**
+   * This rule resolves lateral joins.
+   */
+  object ResolveLateralJoin extends Rule[LogicalPlan] {

Review comment:
   Q: Btw, is it difficult to merge this resolution logic into the 
`ResolveReferences` one?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-05-23 Thread GitBox


maropu commented on a change in pull request #32303:
URL: https://github.com/apache/spark/pull/32303#discussion_r637626356



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1607,33 +1622,30 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 }
   }
   f1.copy(arguments = f1.arguments.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
 case c: CreateNamedStruct if containsStar(c.valExprs) =>
   val newChildren = c.children.grouped(2).flatMap {
-case Seq(k, s : Star) => CreateStruct(s.expand(child, 
resolver)).children
+case Seq(k, s : Star) => CreateStruct(expand(s)).children
 case kv => kv
   }
   c.copy(children = newChildren.toList )
 case c: CreateArray if containsStar(c.children) =>
   c.copy(children = c.children.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
 case p: Murmur3Hash if containsStar(p.children) =>
   p.copy(children = p.children.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
 case p: XxHash64 if containsStar(p.children) =>
   p.copy(children = p.children.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
-// count(*) has been replaced by count(1)
-case o if containsStar(o.children) =>
-  throw QueryCompilationErrors.invalidStarUsageError(s"expression 
'${o.prettyName}'")

Review comment:
   Was this error handling moved into `CheckAnalysis` 
https://github.com/apache/spark/pull/32303/files#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR177-R179
 ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-05-23 Thread GitBox


maropu commented on a change in pull request #32303:
URL: https://github.com/apache/spark/pull/32303#discussion_r637627094



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1663,6 +1675,27 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 }
   }
 
+  /**
+   * Optionally resolve the name parts using the outer query plan and wrap 
resolved attributes
+   * with [[OuterReference]]s.
+   */
+  private def resolveOuterReference(

Review comment:
   nit: `tryResolveOuterReference`?

##
File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##
@@ -643,12 +643,12 @@ setQuantifier
 ;
 
 relation
-: relationPrimary joinRelation*
+: LATERAL? relationPrimary joinRelation*
 ;
 
 joinRelation
-: (joinType) JOIN right=relationPrimary joinCriteria?
-| NATURAL joinType JOIN right=relationPrimary
+: (joinType) JOIN LATERAL? right=relationPrimary joinCriteria?
+| NATURAL joinType JOIN LATERAL? right=relationPrimary

Review comment:
   Oh, much cleaner than the previous one. Nice.
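   
   For context, a hedged sketch of the query shapes these grammar additions accept, assuming the feature lands as proposed and using the `t1`/`t2` test views from `join-lateral.sql` below:
   ```
   // illustrative only; syntax per the proposed grammar
   spark.sql("SELECT * FROM t1, LATERAL (SELECT c1, t1.c2 FROM t2)").show()
   spark.sql("SELECT * FROM t1 JOIN LATERAL (SELECT t1.c1 + c1 AS s FROM t2) ON true").show()
   ```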

##
File path: sql/core/src/test/resources/sql-tests/inputs/join-lateral.sql
##
@@ -0,0 +1,97 @@
+-- Test cases for lateral join
+
+CREATE VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
+CREATE VIEW t2(c1, c2) AS VALUES (0, 2), (0, 3);
+
+-- lateral join with single column select
+SELECT * FROM t1, LATERAL (SELECT c1);
+SELECT * FROM t1, LATERAL (SELECT c1 FROM t2);
+SELECT * FROM t1, LATERAL (SELECT t1.c1 FROM t2);
+SELECT * FROM t1, LATERAL (SELECT t1.c1 + t2.c1 FROM t2);
+
+-- lateral join with star expansion

Review comment:
   Could you add tests for `quotedRegexColumnNames`, too?

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
##
@@ -168,6 +168,21 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with 
PredicateHelper {
   }
 }
 
+/**
+ * Rewrite lateral joins by rewriting all dependent joins (if any) inside the 
right
+ * sub-tree of the lateral join and converting the lateral join into a base 
join type.
+ */
+object RewriteLateralJoin extends Rule[LogicalPlan] with PredicateHelper {
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case j @ Join(left, right, LateralJoin(joinType), condition, _) =>
+  val conditions = condition.map(splitConjunctivePredicates).getOrElse(Nil)
+  val newRight = DecorrelateInnerQuery.rewriteDomainJoins(left, right, conditions)
+  // TODO: handle the COUNT bug

Review comment:
   Can you add the jira number (SPARK-15370) here?
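
   For anyone unfamiliar with the reference: the COUNT bug (SPARK-15370) is that decorrelating an aggregate through a join loses the aggregate's empty-input value for outer rows with no match. A hedged illustration with the hypothetical `t1`/`t2` test views (runnable in `spark-shell`):

```scala
// Each t1 row should pair with cnt = 0 when it has no match in t2; a naive
// rewrite of the lateral join into a plain join can drop such rows or surface
// NULL instead, which is the bug the TODO defers.
spark.sql(
  "SELECT * FROM t1, LATERAL (SELECT COUNT(*) AS cnt FROM t2 WHERE t2.c1 = t1.c1)").show()
```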

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1607,33 +1622,30 @@ class Analyzer(override val catalogManager: CatalogManager)
 }
   }
   f1.copy(arguments = f1.arguments.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
 case c: CreateNamedStruct if containsStar(c.valExprs) =>
   val newChildren = c.children.grouped(2).flatMap {
-case Seq(k, s : Star) => CreateStruct(s.expand(child, resolver)).children
+case Seq(k, s : Star) => CreateStruct(expand(s)).children
 case kv => kv
   }
   c.copy(children = newChildren.toList )
 case c: CreateArray if containsStar(c.children) =>
   c.copy(children = c.children.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
 case p: Murmur3Hash if containsStar(p.children) =>
   p.copy(children = p.children.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
 case p: XxHash64 if containsStar(p.children) =>
   p.copy(children = p.children.flatMap {
-case s: Star => s.expand(child, resolver)
+case s: Star => expand(s)
 case o => o :: Nil
   })
-// count(*) has been replaced by count(1)
-case o if containsStar(o.children) =>
-  throw QueryCompilationErrors.invalidStarUsageError(s"expression '${o.prettyName}'")

Review comment:
   Was this error handling moved into `CheckAnalysis`?
https://github.com/apache/spark/pull/32303/files#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR177-R179

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -276,6 +276,7 @@ class Analyzer(override val catalogManager: CatalogManager)
   ResolveAliases ::
   ResolveSubquery ::
   ResolveSubqueryColumnAliases ::
+  ResolveLateralJoin ::

Review comment:
   Since `ResolveLateralJoin` has a 

[GitHub] [spark] ankurdave commented on a change in pull request #32625: [SPARK-35486][CORE] TaskMemoryManager: retry if other task takes memory freed by partial self-spill

2021-05-23 Thread GitBox


ankurdave commented on a change in pull request #32625:
URL: https://github.com/apache/spark/pull/32625#discussion_r637640349



##
File path: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
##
@@ -202,14 +202,22 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
 }
   }
 
-  // call spill() on itself
-  if (got < required) {
+  // Attempt to free up memory by self-spilling.
+  //
+  // When our spill handler releases memory, `ExecutionMemoryPool#releaseMemory()` will
+  // immediately notify other tasks that memory has been freed, and they may acquire the
+  // newly-freed memory before we have a chance to do so (SPARK-35486). In that case, we will
+  // try again in the next loop iteration.
+  while (got < required) {

Review comment:
   Good question. The worst case occurs when waiting tasks take all the 
memory released by `consumer.spill()`. The specific number of iterations 
depends on the behavior of `consumer.spill()`:
   
   - If `consumer.spill()` releases memory promptly when requested to, as all 
current implementations do, then this can iterate up to `consumer.getUsed() / 
(required - got) + 1` times. In all but the last iteration, `consumer` spills 
`required - got` bytes, and in the last iteration, `consumer` spills 0 bytes.
   - If `consumer.spill()` trickles out memory, for example by releasing only 1 
byte per call, then there can be up to `consumer.getUsed() + 1` iterations. In 
all but the last iteration, `consumer` spills 1 byte, and in the last 
iteration, `consumer` spills 0 bytes.
   
   This is the same worst-case behavior as the `while 
(!sortedConsumers.isEmpty())` loop above.
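
   To make the arithmetic concrete, a small self-contained simulation of the worst case (a sketch, not the Spark code; `used` and `spillPerCall` are made-up parameters):

```scala
// Simulates the retry loop's worst case: waiting tasks immediately take
// everything the spill releases, so the task keeps retrying until the
// consumer is drained, plus one final iteration where spill returns 0 bytes.
def worstCaseIterations(used: Long, spillPerCall: Long): Long = {
  var remaining = used
  var iterations = 0L
  while (remaining > 0) {
    remaining -= math.min(spillPerCall, remaining)
    iterations += 1
  }
  iterations + 1 // the last iteration spills 0 bytes and the loop gives up
}

// Prompt spiller releasing (required - got) = 10 bytes per call:
// 100 / 10 + 1 = 11 iterations.
println(worstCaseIterations(used = 100, spillPerCall = 10))
// Trickling spiller releasing 1 byte per call: 100 + 1 = 101 iterations.
println(worstCaseIterations(used = 100, spillPerCall = 1))
```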




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32599: [SPARK-35320][SQL] Fix from_json maps parser: Only StringType is supported for keys.

2021-05-23 Thread GitBox


SparkQA commented on pull request #32599:
URL: https://github.com/apache/spark/pull/32599#issuecomment-846675354


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43375/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32642: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


SparkQA commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846675040


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43373/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32602: [SPARK-35455][SQL] Enhance EliminateUnnecessaryJoin

2021-05-23 Thread GitBox


SparkQA commented on pull request #32602:
URL: https://github.com/apache/spark/pull/32602#issuecomment-846670009


   **[Test build #138854 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138854/testReport)**
 for PR 32602 at commit 
[`e26df96`](https://github.com/apache/spark/commit/e26df96b72806ace84c72e1f134278e2b20b4457).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32642: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846669848


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138853/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32641: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


AmplabJenkins removed a comment on pull request #32641:
URL: https://github.com/apache/spark/pull/32641#issuecomment-846669847


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138851/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32641: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32641:
URL: https://github.com/apache/spark/pull/32641#issuecomment-846669847


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138851/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32642: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


AmplabJenkins commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846669848


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138853/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32599: [SPARK-35320][SQL] Fix from_json maps parser: Only StringType is supported for keys.

2021-05-23 Thread GitBox


SparkQA commented on pull request #32599:
URL: https://github.com/apache/spark/pull/32599#issuecomment-84547


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43375/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32642: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


SparkQA commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-84410


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43373/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32642: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


SparkQA removed a comment on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846653085


   **[Test build #138853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138853/testReport)**
 for PR 32642 at commit 
[`30cdcee`](https://github.com/apache/spark/commit/30cdceec4189b836784b4e5e17e9bb47ecc5aa41).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32642: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


SparkQA commented on pull request #32642:
URL: https://github.com/apache/spark/pull/32642#issuecomment-846662415


   **[Test build #138853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138853/testReport)**
 for PR 32642 at commit 
[`30cdcee`](https://github.com/apache/spark/commit/30cdceec4189b836784b4e5e17e9bb47ecc5aa41).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32641: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


SparkQA removed a comment on pull request #32641:
URL: https://github.com/apache/spark/pull/32641#issuecomment-846652261


   **[Test build #138851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138851/testReport)**
 for PR 32641 at commit 
[`5dbe3bb`](https://github.com/apache/spark/commit/5dbe3bbc5d29e2ba6caca022c37eb0b5a2f13202).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32641: [R][RELEASE] change SparkR maintainer for CRAN

2021-05-23 Thread GitBox


SparkQA commented on pull request #32641:
URL: https://github.com/apache/spark/pull/32641#issuecomment-846661384


   **[Test build #138851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138851/testReport)**
 for PR 32641 at commit 
[`5dbe3bb`](https://github.com/apache/spark/commit/5dbe3bbc5d29e2ba6caca022c37eb0b5a2f13202).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


