[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620761429


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122009/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620761418


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620756951


   **[Test build #122009 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122009/testReport)**
 for PR 28379 at commit 
[`849bc25`](https://github.com/apache/spark/commit/849bc25fe8f15822ea7aff461ae5bc316b153ab2).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28398: [SPARK-31557][SQL][TESTS][FOLLOWUP] Check rebasing in all legacy formatters

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28398:
URL: https://github.com/apache/spark/pull/28398#issuecomment-620761779










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620761796










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28330:
URL: https://github.com/apache/spark/pull/28330#issuecomment-620761773










[GitHub] [spark] AmplabJenkins commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620761796










[GitHub] [spark] AmplabJenkins commented on pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28330:
URL: https://github.com/apache/spark/pull/28330#issuecomment-620761773










[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


dongjoon-hyun commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416810632



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
##
@@ -145,6 +156,11 @@ case class Randn(child: Expression) extends RDG with 
ExpressionWithRandomSeed {
   }
 
   override def freshCopy(): Randn = Randn(child)
+
+  override def flatArguments: Iterator[Any] = Iterator(child)
+  override def sql: String = {
+s"randn(${if (useRandSeed) "" else child.sql})"

Review comment:
   Currently, `Randn(child = expr, useRandSeed = true)` seems to be possible programmatically. That might look weird because it will not actually use a random seed.
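   
   As a rough, self-contained illustration of the combination in question (hypothetical stand-in types, not Spark's actual `Expression` hierarchy):
   
   ```scala
   // Hypothetical stand-ins, only to show how the flag can disagree with the child.
   final case class SeedExpr(value: Long) { def sql: String = value.toString }
   
   final case class RandnLike(child: SeedExpr, useRandSeed: Boolean = false) {
     // Mirrors the snippet above: the flag alone decides whether the seed is printed.
     def sql: String = s"randn(${if (useRandSeed) "" else child.sql})"
   }
   
   // Nothing prevents supplying an explicit seed together with the flag, and the
   // printed form then hides the seed that is actually used:
   // RandnLike(SeedExpr(42L), useRandSeed = true).sql == "randn()"
   ```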








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


dongjoon-hyun commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416810632



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
##
@@ -145,6 +156,11 @@ case class Randn(child: Expression) extends RDG with 
ExpressionWithRandomSeed {
   }
 
   override def freshCopy(): Randn = Randn(child)
+
+  override def flatArguments: Iterator[Any] = Iterator(child)
+  override def sql: String = {
+s"randn(${if (useRandSeed) "" else child.sql})"

Review comment:
   Currently, `Randn(child = expr, useRandSeed = true)` seems to be possible. That might look weird because it will not actually use a random seed.








[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


SparkQA commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620761399


   **[Test build #122009 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122009/testReport)**
 for PR 28379 at commit 
[`849bc25`](https://github.com/apache/spark/commit/849bc25fe8f15822ea7aff461ae5bc316b153ab2).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28398: [SPARK-31557][SQL][TESTS][FOLLOWUP] Check rebasing in all legacy formatters

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28398:
URL: https://github.com/apache/spark/pull/28398#issuecomment-620761779










[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620761418










[GitHub] [spark] SparkQA commented on pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


SparkQA commented on pull request #28330:
URL: https://github.com/apache/spark/pull/28330#issuecomment-620760923


   **[Test build #122012 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122012/testReport)**
 for PR 28330 at commit 
[`9f7f98e`](https://github.com/apache/spark/commit/9f7f98e4528881797ca1c57ca1944b63fb427d87).






[GitHub] [spark] SparkQA commented on pull request #28386: [SPARK-26199][SPARK-31517][R] fix strategy for handling ... names in mutate

2020-04-28 Thread GitBox


SparkQA commented on pull request #28386:
URL: https://github.com/apache/spark/pull/28386#issuecomment-620760984


   **[Test build #122011 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122011/testReport)**
 for PR 28386 at commit 
[`e9c2315`](https://github.com/apache/spark/commit/e9c2315c4931c54609b30ade9c1150ebc576794d).






[GitHub] [spark] SparkQA commented on pull request #28398: [SPARK-31557][SQL][TESTS][FOLLOWUP] Check rebasing in all legacy formatters

2020-04-28 Thread GitBox


SparkQA commented on pull request #28398:
URL: https://github.com/apache/spark/pull/28398#issuecomment-620761049


   **[Test build #122010 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122010/testReport)**
 for PR 28398 at commit 
[`d653213`](https://github.com/apache/spark/commit/d6532134e313acdd9d1b2f7515f189744ecf56d4).






[GitHub] [spark] sririshindra commented on a change in pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


sririshindra commented on a change in pull request #28330:
URL: https://github.com/apache/spark/pull/28330#discussion_r416809429



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##
@@ -325,34 +325,57 @@ class SQLMetricsSuite extends SharedSparkSession with 
SQLMetricsTestUtils
   // +- LocalTableScan(nodeId = 7)
   Seq((1L, 2L, 5L, false), (2L, 3L, 7L, true)).foreach {
 case (nodeId1, nodeId2, nodeId3, enableWholeStage) =>
-val df = df1.join(df2, "key")
-testSparkPlanMetrics(df, 1, Map(
-  nodeId1 -> (("ShuffledHashJoin", Map(
-"number of output rows" -> 2L))),
-  nodeId2 -> (("Exchange", Map(
-"shuffle records written" -> 2L,
-"records read" -> 2L))),
-  nodeId3 -> (("Exchange", Map(
-"shuffle records written" -> 10L,
-"records read" -> 10L,
-  enableWholeStage
-)
+  val df = df1.join(df2, "key")
+  testSparkPlanMetrics(df, 1, Map(
+nodeId1 -> (("ShuffledHashJoin", Map(
+  "number of output rows" -> 2L))),
+nodeId2 -> (("Exchange", Map(
+  "shuffle records written" -> 2L,
+  "records read" -> 2L))),
+nodeId3 -> (("Exchange", Map(
+  "shuffle records written" -> 10L,
+  "records read" -> 10L,
+enableWholeStage

Review comment:
   I think this indentation is correct. I know it is just a small cosmetic change and probably doesn't need to be included in this PR; I will remove it if you think it should not be there.








[GitHub] [spark] MaxGekk commented on pull request #28398: [SPARK-31557][SQL][TESTS][FOLLOWUP] Check rebasing in all legacy formatters

2020-04-28 Thread GitBox


MaxGekk commented on pull request #28398:
URL: https://github.com/apache/spark/pull/28398#issuecomment-620760433


   @bersprockets @cloud-fan While working on the fix for legacy timestamp formatters, I have found that the round-trip tests for dates are not enough; we need to test each direction separately. Also, the existing tests don't cover all the legacy parsers.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


dongjoon-hyun commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416809229



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
##
@@ -145,6 +156,11 @@ case class Randn(child: Expression) extends RDG with 
ExpressionWithRandomSeed {
   }
 
   override def freshCopy(): Randn = Randn(child)
+
+  override def flatArguments: Iterator[Any] = Iterator(child)
+  override def sql: String = {
+s"randn(${if (useRandSeed) "" else child.sql})"

Review comment:
   The naming `useRandSeed` might be a little mismatched: it is only used here to hide the random seed. Maybe something like `hideSeed` would be more direct?
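   
   For what it is worth, a minimal standalone sketch of how a renamed flag could read (hypothetical types, not the actual `Rand`/`Randn` classes):
   
   ```scala
   import scala.util.Random
   
   // `hideSeed` affects only the printed form; the seed itself is always concrete.
   final case class RandnSketch(seed: Long, hideSeed: Boolean = false) {
     def sql: String = if (hideSeed) "randn()" else s"randn($seed)"
   }
   
   object RandnSketch {
     // No-argument form: pick a fresh random seed but hide it in the output schema.
     def noArgs(): RandnSketch = RandnSketch(Random.nextLong(), hideSeed = true)
   }
   
   // RandnSketch.noArgs().sql == "randn()"
   // RandnSketch(42L).sql     == "randn(42)"
   ```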








[GitHub] [spark] MaxGekk opened a new pull request #28398: [SPARK-31557][SQL][TESTS][FOLLOWUP] Check rebasing in all legacy formatters

2020-04-28 Thread GitBox


MaxGekk opened a new pull request #28398:
URL: https://github.com/apache/spark/pull/28398


   ### What changes were proposed in this pull request?
   - Check all available legacy formats in the tests added by 
https://github.com/apache/spark/pull/28345
   - Check date rebasing in legacy parsers in only one direction at a time: either days -> string or string -> days.
   
   ### Why are the changes needed?
   Round-trip tests can hide issues in date rebasing. For example, if we remove rebasing from the legacy parsers (from both `parse()` and `format()`), the tests will still pass.
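   
   A short, self-contained sketch of that failure mode (hypothetical formatter, not the suite's actual code):
   
   ```scala
   import java.time.LocalDate
   import java.time.temporal.ChronoUnit
   
   // A formatter that "forgets" to rebase in BOTH directions.
   object NoRebaseFormatter {
     private val epoch = LocalDate.ofEpochDay(0)
     def format(days: Int): String = epoch.plusDays(days).toString
     def parse(s: String): Int = ChronoUnit.DAYS.between(epoch, LocalDate.parse(s)).toInt
   }
   
   // The round trip still passes because the two missing rebases cancel out ...
   val days = -141704  // an arbitrary ancient date, expressed as days since the epoch
   assert(NoRebaseFormatter.parse(NoRebaseFormatter.format(days)) == days)
   // ... whereas checking a single direction against a fixed expected string
   // (or a fixed expected day count) would expose the missing rebasing.
   ```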
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   By running `DateFormatterSuite`.






[GitHub] [spark] sririshindra commented on a change in pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


sririshindra commented on a change in pull request #28330:
URL: https://github.com/apache/spark/pull/28330#discussion_r416806910



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##
@@ -325,34 +325,57 @@ class SQLMetricsSuite extends SharedSparkSession with 
SQLMetricsTestUtils
   // +- LocalTableScan(nodeId = 7)
   Seq((1L, 2L, 5L, false), (2L, 3L, 7L, true)).foreach {
 case (nodeId1, nodeId2, nodeId3, enableWholeStage) =>
-val df = df1.join(df2, "key")
-testSparkPlanMetrics(df, 1, Map(
-  nodeId1 -> (("ShuffledHashJoin", Map(
-"number of output rows" -> 2L))),
-  nodeId2 -> (("Exchange", Map(
-"shuffle records written" -> 2L,
-"records read" -> 2L))),
-  nodeId3 -> (("Exchange", Map(
-"shuffle records written" -> 10L,
-"records read" -> 10L,
-  enableWholeStage
-)
+  val df = df1.join(df2, "key")
+  testSparkPlanMetrics(df, 1, Map(
+nodeId1 -> (("ShuffledHashJoin", Map(
+  "number of output rows" -> 2L))),
+nodeId2 -> (("Exchange", Map(
+  "shuffle records written" -> 2L,
+  "records read" -> 2L))),
+nodeId3 -> (("Exchange", Map(
+  "shuffle records written" -> 10L,
+  "records read" -> 10L,
+enableWholeStage
+  )
   }
 }
   }
 
+  test("ShuffledHashJoin(left,outer) metrics") {
+withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2",

Review comment:
   You are right. I should have removed these in the last commit itself. 
Fixed in the latest commit.








[GitHub] [spark] sririshindra commented on a change in pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


sririshindra commented on a change in pull request #28330:
URL: https://github.com/apache/spark/pull/28330#discussion_r416807097



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##
@@ -325,34 +325,57 @@ class SQLMetricsSuite extends SharedSparkSession with 
SQLMetricsTestUtils
   // +- LocalTableScan(nodeId = 7)
   Seq((1L, 2L, 5L, false), (2L, 3L, 7L, true)).foreach {
 case (nodeId1, nodeId2, nodeId3, enableWholeStage) =>
-val df = df1.join(df2, "key")
-testSparkPlanMetrics(df, 1, Map(
-  nodeId1 -> (("ShuffledHashJoin", Map(
-"number of output rows" -> 2L))),
-  nodeId2 -> (("Exchange", Map(
-"shuffle records written" -> 2L,
-"records read" -> 2L))),
-  nodeId3 -> (("Exchange", Map(
-"shuffle records written" -> 10L,
-"records read" -> 10L,
-  enableWholeStage
-)
+  val df = df1.join(df2, "key")
+  testSparkPlanMetrics(df, 1, Map(
+nodeId1 -> (("ShuffledHashJoin", Map(
+  "number of output rows" -> 2L))),
+nodeId2 -> (("Exchange", Map(
+  "shuffle records written" -> 2L,
+  "records read" -> 2L))),
+nodeId3 -> (("Exchange", Map(
+  "shuffle records written" -> 10L,
+  "records read" -> 10L,
+enableWholeStage
+  )
   }
 }
   }
 
+  test("ShuffledHashJoin(left,outer) metrics") {

Review comment:
   Fixed in the latest commit.








[GitHub] [spark] sririshindra commented on a change in pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


sririshindra commented on a change in pull request #28330:
URL: https://github.com/apache/spark/pull/28330#discussion_r416806448



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##
@@ -325,34 +325,57 @@ class SQLMetricsSuite extends SharedSparkSession with 
SQLMetricsTestUtils
   // +- LocalTableScan(nodeId = 7)
   Seq((1L, 2L, 5L, false), (2L, 3L, 7L, true)).foreach {
 case (nodeId1, nodeId2, nodeId3, enableWholeStage) =>
-val df = df1.join(df2, "key")
-testSparkPlanMetrics(df, 1, Map(
-  nodeId1 -> (("ShuffledHashJoin", Map(
-"number of output rows" -> 2L))),
-  nodeId2 -> (("Exchange", Map(
-"shuffle records written" -> 2L,
-"records read" -> 2L))),
-  nodeId3 -> (("Exchange", Map(
-"shuffle records written" -> 10L,
-"records read" -> 10L,
-  enableWholeStage
-)
+  val df = df1.join(df2, "key")
+  testSparkPlanMetrics(df, 1, Map(
+nodeId1 -> (("ShuffledHashJoin", Map(
+  "number of output rows" -> 2L))),
+nodeId2 -> (("Exchange", Map(
+  "shuffle records written" -> 2L,
+  "records read" -> 2L))),
+nodeId3 -> (("Exchange", Map(
+  "shuffle records written" -> 10L,
+  "records read" -> 10L,
+enableWholeStage
+  )
   }
 }
   }
 
+  test("ShuffledHashJoin(left,outer) metrics") {
+withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2",
+  SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
+  val leftDf = Seq((1, "1"), (2, "2")).toDF("key", "value")
+  val rightDf = (1 to 10).map(i => (i, i.toString)).toSeq.toDF("key2", 
"value")
+  Seq((0L, "right_outer", leftDf, rightDf, 10L, false),
+(0L, "left_outer", rightDf, leftDf, 10L, false),
+(0L, "right_outer", leftDf, rightDf, 10L, true),
+(0L, "left_outer", rightDf, leftDf, 10L, true),
+(2L, "left_anti", rightDf, leftDf, 8L, true),
+(2L, "left_semi", rightDf, leftDf, 2L, true),
+(1L, "left_anti", rightDf, leftDf, 8L, false),
+(1L, "left_semi", rightDf, leftDf, 2L, false))
+.foreach { case (nodeId, joinType, leftDf, rightDf, rows, 
enableWholeStage) =>
+  val df = leftDf.hint("shuffle_hash").join(
+rightDf.hint("shuffle_hash"), $"key" === $"key2", joinType)
+  testSparkPlanMetrics(df, 1, Map(
+nodeId -> (("ShuffledHashJoin", Map(
+  "number of output rows" -> rows,
+enableWholeStage
+  )
+}
+}
+  }
+
   test("BroadcastHashJoin(outer) metrics") {
 val df1 = Seq((1, "a"), (1, "b"), (4, "c")).toDF("key", "value")
 val df2 = Seq((1, "a"), (1, "b"), (2, "c"), (3, "d")).toDF("key2", "value")
 // Assume the execution plan is
 // ... -> BroadcastHashJoin(nodeId = 0)
-Seq(("left_outer", 0L, 5L, false), ("right_outer", 0L, 6L, false),
-  ("left_outer", 1L, 5L, true), ("right_outer", 1L, 6L, true)).foreach {
-  case (joinType, nodeId, numRows, enableWholeStage) =>
+Seq(("left_outer", 0L, 5L, false), ("right_outer", 0L, 6L, false), 
("left_outer", 1L, 5L, true),
+  ("right_outer", 1L, 6L, true)).foreach { case (joinType, nodeId, 
numRows, enableWholeStage) =>
   val df = df1.join(broadcast(df2), $"key" === $"key2", joinType)
   testSparkPlanMetrics(df, 2, Map(
-nodeId -> (("BroadcastHashJoin", Map(
-  "number of output rows" -> numRows,
+nodeId -> (("BroadcastHashJoin", Map("number of output rows" -> 
numRows,

Review comment:
   Fixed in the latest commit.








[GitHub] [spark] sririshindra commented on a change in pull request #28330: [SPARK-31377][SQL][TEST] Added unit tests to 'number of output rows metric' for some joins in SQLMetricSuite

2020-04-28 Thread GitBox


sririshindra commented on a change in pull request #28330:
URL: https://github.com/apache/spark/pull/28330#discussion_r416806570



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##
@@ -394,6 +420,21 @@ class SQLMetricsSuite extends SharedSparkSession with 
SQLMetricsTestUtils
 }
   }
 
+  test("BroadcastLeftAntiJoinHash metrics") {
+val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value")
+val df2 = Seq((1, "1"), (2, "2"), (3, "3"), (4, "4")).toDF("key2", "value")
+// Assume the execution plan is
+// ... -> BroadcastHashJoin(nodeId = 1)

Review comment:
   Fixed in the latest commit.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620757541










[GitHub] [spark] AmplabJenkins commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620757541










[GitHub] [spark] SparkQA commented on pull request #28379: [SPARK-28040][SPARK-28070][R] Write type object s3

2020-04-28 Thread GitBox


SparkQA commented on pull request #28379:
URL: https://github.com/apache/spark/pull/28379#issuecomment-620756951


   **[Test build #122009 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122009/testReport)**
 for PR 28379 at commit 
[`849bc25`](https://github.com/apache/spark/commit/849bc25fe8f15822ea7aff461ae5bc316b153ab2).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


dongjoon-hyun commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416802666



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3425,6 +3425,28 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   assert(SQLConf.get.getConf(SQLConf.CODEGEN_FALLBACK) === true)
 }
   }
+
+  test("Do not display the seed of rand/randn with no argument in output 
schema") {

Review comment:
   If you don't mind, could you add a `SPARK-31594: ` prefix, please?








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


dongjoon-hyun commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416801382



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
##
@@ -102,6 +105,11 @@ case class Rand(child: Expression) extends RDG with 
ExpressionWithRandomSeed {
   }
 
   override def freshCopy(): Rand = Rand(child)

Review comment:
   What happens when we do `freshCopy()`? Do we need to propagate the `useRandSeed` field?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28397: [SPARK-31519][SQL][2.4] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-620742059


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122001/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28397: [SPARK-31519][SQL][2.4] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-620742055


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28397: [SPARK-31519][SQL][2.4] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-620742055










[GitHub] [spark] SparkQA removed a comment on pull request #28397: [SPARK-31519][SQL][2.4] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-620628606


   **[Test build #122001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122001/testReport)**
 for PR 28397 at commit 
[`de8da06`](https://github.com/apache/spark/commit/de8da064ac7e708646f0c4aea3e8c0f486a5a13c).






[GitHub] [spark] SparkQA commented on pull request #28397: [SPARK-31519][SQL][2.4] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


SparkQA commented on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-620740881


   **[Test build #122001 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122001/testReport)**
 for PR 28397 at commit 
[`de8da06`](https://github.com/apache/spark/commit/de8da064ac7e708646f0c4aea3e8c0f486a5a13c).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class AggregateWithHaving(`






[GitHub] [spark] dbtsai edited a comment on pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath

2020-04-28 Thread GitBox


dbtsai edited a comment on pull request #28376:
URL: https://github.com/apache/spark/pull/28376#issuecomment-620738022


   @tgravescs the standard "with hadoop" Spark build with YARN support is sufficient to work in our CDH env. Thanks for the review. I'll update the docs and config.md.
   
   @viirya the idea of this PR is to isolate the classpath in the cluster from the Spark distribution, so that when Spark runs on YARN clusters the dependencies from YARN are not added. Thus, users don't need to worry about dependency conflicts, since the Spark distribution is the only source of the classpath.
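   
   A rough usage sketch for readers following along (the exact key name below is an assumption based on this PR; check the merged docs/config.md for the final name and default):
   
   ```scala
   import org.apache.spark.SparkConf
   
   // Normally passed at submit time, e.g.:
   //   spark-submit --master yarn --conf spark.yarn.populateHadoopClasspath=false ...
   // so that only the jars shipped with the Spark distribution end up on the classpath.
   val conf = new SparkConf()
     .setAppName("isolated-classpath-example")
     .set("spark.yarn.populateHadoopClasspath", "false")
   ```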






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620738507










[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620738507










[GitHub] [spark] prakharjain09 commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


prakharjain09 commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r416781804



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1829,7 +1895,52 @@ private[spark] class BlockManager(
 data.dispose()
   }
 
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {
+@volatile private var stopped = false
+private val blockReplicationThread = new Thread {
+  override def run(): Unit = {
+while (blockManagerDecommissioning && !stopped) {
+  try {
+logDebug("Attempting to replicate all cached RDD blocks")
+decommissionRddCacheBlocks()
+logInfo("Attempt to replicate all cached blocks done")
+val sleepInterval = conf.get(
+  config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+Thread.sleep(sleepInterval)
+  } catch {
+case _: InterruptedException =>
+  // no-op
+case NonFatal(e) =>
+  logError("Error occurred while trying to " +
+"replicate cached RDD blocks for block manager 
decommissioning", e)
+  }
+}
+  }
+}
+blockReplicationThread.setDaemon(true)
+blockReplicationThread.setName("block-replication-thread")
+
+def start(): Unit = {
+  logInfo("Starting block replication thread")
+  blockReplicationThread.start()
+}
+
+def stop(): Unit = {
+  if (!stopped) {
+stopped = true
+logInfo("Stopping block replication thread")
+blockReplicationThread.interrupt()
+blockReplicationThread.join()

Review comment:
   @holdenk Yeah - but all the tests that are failing in the Jenkins build are not the ones written in this PR, so that means all those tests must be running with the storage-decommissioning flag disabled?
   
   1. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121988/testReport/
   2. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121917/testReport/
   
   Is the same Spark application going to be used across all these tests? I was assuming that a new Spark application would be created and destroyed for my specific tests (as BlockManagerDecommissionSuite creates a new SparkContext as part of the test).
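   
   For context, a rough sketch of the per-suite isolation being described (the config key is assumed from this PR and may differ in the merged code):
   
   ```scala
   import org.apache.spark.{SparkConf, SparkContext}
   
   // A suite that needs decommissioning builds its own SparkContext, so other
   // suites never run with the flag enabled.
   val conf = new SparkConf()
     .setMaster("local-cluster[2, 1, 1024]")
     .setAppName("decommission-flag-isolation")
     .set("spark.storage.decommission.enabled", "true")
   
   val sc = new SparkContext(conf)
   try {
     // ... decommissioning assertions run against this context only ...
   } finally {
     sc.stop()  // tear down so the flag does not leak into other suites
   }
   ```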








[GitHub] [spark] prakharjain09 commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


prakharjain09 commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r416781804



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1829,7 +1895,52 @@ private[spark] class BlockManager(
 data.dispose()
   }
 
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {
+@volatile private var stopped = false
+private val blockReplicationThread = new Thread {
+  override def run(): Unit = {
+while (blockManagerDecommissioning && !stopped) {
+  try {
+logDebug("Attempting to replicate all cached RDD blocks")
+decommissionRddCacheBlocks()
+logInfo("Attempt to replicate all cached blocks done")
+val sleepInterval = conf.get(
+  config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+Thread.sleep(sleepInterval)
+  } catch {
+case _: InterruptedException =>
+  // no-op
+case NonFatal(e) =>
+  logError("Error occurred while trying to " +
+"replicate cached RDD blocks for block manager 
decommissioning", e)
+  }
+}
+  }
+}
+blockReplicationThread.setDaemon(true)
+blockReplicationThread.setName("block-replication-thread")
+
+def start(): Unit = {
+  logInfo("Starting block replication thread")
+  blockReplicationThread.start()
+}
+
+def stop(): Unit = {
+  if (!stopped) {
+stopped = true
+logInfo("Stopping block replication thread")
+blockReplicationThread.interrupt()
+blockReplicationThread.join()

Review comment:
   Yeah - but all the tests that are failing in the Jenkins build are not the ones written in this PR, so that means all those tests must be running with the storage-decommissioning flag disabled?
   
   1. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121988/testReport/
   2. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121917/testReport/
   
   Is the same Spark application going to be used across all these tests? I was assuming that a new Spark application would be created and destroyed for my specific tests (as BlockManagerDecommissionSuite creates a new SparkContext as part of the test).








[GitHub] [spark] dbtsai commented on pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath

2020-04-28 Thread GitBox


dbtsai commented on pull request #28376:
URL: https://github.com/apache/spark/pull/28376#issuecomment-620738022


   @tgravescs the standard "with hadoop" Spark build with YARN support is sufficient to work in our CDH env. Thanks for the review. I'll update the docs and config.md.






[GitHub] [spark] SparkQA removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620614897


   **[Test build #121999 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121999/testReport)**
 for PR 28395 at commit 
[`1ead01d`](https://github.com/apache/spark/commit/1ead01db37e214be60fef8f9c1056661e24ddd15).






[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox


SparkQA commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620736970


   **[Test build #121999 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121999/testReport)**
 for PR 28395 at commit 
[`1ead01d`](https://github.com/apache/spark/commit/1ead01db37e214be60fef8f9c1056661e24ddd15).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] srowen commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


srowen commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416773236



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
##
@@ -102,6 +102,8 @@ case class Rand(child: Expression) extends RDG with 
ExpressionWithRandomSeed {
   }
 
   override def freshCopy(): Rand = Rand(child)
+
+  override def sql: String = "rand()"

Review comment:
   Fair point, consistency is good. I don't feel strongly either way.








[GitHub] [spark] maropu commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


maropu commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416770553



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
##
@@ -102,6 +102,8 @@ case class Rand(child: Expression) extends RDG with 
ExpressionWithRandomSeed {
   }
 
   override def freshCopy(): Rand = Rand(child)
+
+  override def sql: String = "rand()"

Review comment:
   Yea, I also think it's important that we can check seeds, but isn't `df.explain` enough for checking that? Actually, the other two expressions with random seeds (`shuffle` and `uuid`) don't display the seed in their column names:
   ```
   scala> sql("select shuffle(array(1, 2))").show()
   ++
   |shuffle(array(1, 2))|
   ++
   |  [2, 1]|
   ++
   
   scala> sql("select shuffle(array(1, 2))").explain()
   == Physical Plan ==
   *(1) Project [shuffle([1,2], Some(894779230406706679)) AS shuffle(array(1, 
2))#14]
   +- *(1) Scan OneRowRelation[]
   
   scala> sql("select uuid()").show()
   ++
   |  uuid()|
   ++
   |dde93891-8a95-4e9...|
   ++
   
   scala> sql("select uuid()").explain()
   == Physical Plan ==
   *(1) Project [uuid(Some(4613707233104825008)) AS uuid()#23]
   +- *(1) Scan OneRowRelation[]
   ```








[GitHub] [spark] SparkQA commented on pull request #28375: [SPARK-30282][SQL][FOLLOWUP] SHOW TBLPROPERTIES should support views

2020-04-28 Thread GitBox


SparkQA commented on pull request #28375:
URL: https://github.com/apache/spark/pull/28375#issuecomment-620728738


   **[Test build #122008 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122008/testReport)**
 for PR 28375 at commit 
[`c4c02d7`](https://github.com/apache/spark/commit/c4c02d78adc79f02033fadf86272040f05165ad4).






[GitHub] [spark] srowen commented on a change in pull request #28389: [SPARK-31592]bufferPoolsBySize in HeapMemoryAllocator should be thread safe

2020-04-28 Thread GitBox


srowen commented on a change in pull request #28389:
URL: https://github.com/apache/spark/pull/28389#discussion_r416769598



##
File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java
##
@@ -50,7 +50,7 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 long alignedSize = numWords * 8L;
 assert (alignedSize >= size);
 if (shouldPool(alignedSize)) {
-  synchronized (this) {
+  synchronized (bufferPoolsBySize) {

Review comment:
   I don't see how it can happen. Is this on 2.4 or master? The only thing added is a new `WeakReference()`, which can't be null.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28375: [SPARK-30282][SQL][FOLLOWUP] SHOW TBLPROPERTIES should support views

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28375:
URL: https://github.com/apache/spark/pull/28375#issuecomment-620725295










[GitHub] [spark] AmplabJenkins commented on pull request #28375: [SPARK-30282][SQL][FOLLOWUP] SHOW TBLPROPERTIES should support views

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28375:
URL: https://github.com/apache/spark/pull/28375#issuecomment-620725295










[GitHub] [spark] AmplabJenkins removed a comment on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsException

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620722571










[GitHub] [spark] AmplabJenkins commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsException

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620722571










[GitHub] [spark] holdenk removed a comment on pull request #28331: [WIP][SPARK-20629][CORE] Copy shuffle data when nodes are being shutdown

2020-04-28 Thread GitBox


holdenk removed a comment on pull request #28331:
URL: https://github.com/apache/spark/pull/28331#issuecomment-619297810


   Jenkins retest this please
   
   On Fri, Apr 24, 2020 at 6:11 PM UCB AMPLab  wrote:
   
   > Test FAILed.
   > Refer to this link for build results (access rights to CI server needed):
   > https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121785/
   > Test FAILed.
   






[GitHub] [spark] AmplabJenkins commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620721030







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620721030







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsE

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620563639


   **[Test build #121993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121993/testReport)**
 for PR 26339 at commit 
[`cd80e57`](https://github.com/apache/spark/commit/cd80e577dd58890faa22e1d71aad60f59fdd69af).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsException

2020-04-28 Thread GitBox


SparkQA commented on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620721091


   **[Test build #121993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121993/testReport)**
 for PR 26339 at commit 
[`cd80e57`](https://github.com/apache/spark/commit/cd80e577dd58890faa22e1d71aad60f59fdd69af).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox


SparkQA commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620720359


   **[Test build #122007 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122007/testReport)**
 for PR 28392 at commit 
[`663fdc7`](https://github.com/apache/spark/commit/663fdc7266bd3ec6353db84d46cc80d2d1d7decf).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] fanyunbojerry commented on a change in pull request #28389: [SPARK-31592]bufferPoolsBySize in HeapMemoryAllocator should be thread safe

2020-04-28 Thread GitBox


fanyunbojerry commented on a change in pull request #28389:
URL: https://github.com/apache/spark/pull/28389#discussion_r416758545



##
File path: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java
##
@@ -50,7 +50,7 @@ public MemoryBlock allocate(long size) throws 
OutOfMemoryError {
 long alignedSize = numWords * 8L;
 assert (alignedSize >= size);
 if (shouldPool(alignedSize)) {
-  synchronized (this) {
+  synchronized (bufferPoolsBySize) {

Review comment:
   You're right. I checked my executor log again and found that the executor hit an NPE first:
   ```
   java.lang.NullPointerException
at 
org.apache.spark.unsafe.memory.HeapMemoryAllocator.allocate(HeapMemoryAllocator.java:58)
at 
org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:302)
at 
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:96)
at 
org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap:800)
...
   ```
   And later it got the NoSuchElementException:
   ```
   java.util.NoSuchElementException
   at java.util.LinkedList.removeFirst(LinkedList.java:270)
   at java.util.LinkedList.remove(LinkedList.java:685)
at 
org.apache.spark.unsafe.memory.HeapMemoryAllocator.allocate(HeapMemoryAllocator.java:57)
at 
org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:302)
at 
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:96)
at 
org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap:800)
...
   ```
   But I can't figure out why the NPE happens here. Maybe a null WeakReference was added?
   I updated the JIRA SPARK-31592.
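   To make the race concrete, here is a minimal, self-contained sketch of the pooling pattern under discussion (the class and names are stand-ins, not Spark's actual `HeapMemoryAllocator`): the emptiness check and the removal have to happen under the same lock the free path uses, otherwise two threads can both pass the check and the second removal hits an empty list.
   ```scala
   import java.lang.ref.WeakReference
   import scala.collection.mutable

   // Stand-in for a pooled allocator: allocate() and free() share one lock.
   class PooledAllocatorSketch {
     private val pools =
       mutable.HashMap.empty[Long, mutable.Queue[WeakReference[Array[Long]]]]

     def allocate(alignedSize: Long): Array[Long] = {
       val pooled: Option[Array[Long]] = pools.synchronized {
         pools.get(alignedSize).flatMap { pool =>
           var found: Array[Long] = null
           // The emptiness check and the dequeue stay under the same lock,
           // so no other thread can empty the pool between the two steps.
           while (found == null && pool.nonEmpty) {
             found = pool.dequeue().get() // may be null if already collected
           }
           Option(found)
         }
       }
       pooled.getOrElse(new Array[Long]((alignedSize / 8).toInt))
     }

     def free(alignedSize: Long, array: Array[Long]): Unit = pools.synchronized {
       pools.getOrElseUpdate(alignedSize, mutable.Queue.empty) += new WeakReference(array)
     }
   }
   ```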





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath

2020-04-28 Thread GitBox


viirya commented on pull request #28376:
URL: https://github.com/apache/spark/pull/28376#issuecomment-620717201


   @tgravescs I see. That makes sense. So users still need to pay attention to dependency issues, if any, when running multiple versions there.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya edited a comment on pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath

2020-04-28 Thread GitBox


viirya edited a comment on pull request #28376:
URL: https://github.com/apache/spark/pull/28376#issuecomment-620717201


   @tgravescs I see. Thanks. That makes sense. So users still need to pay attention to dependency issues, if any, when running multiple versions there.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #28290: [SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-28 Thread GitBox


huaxingao commented on pull request #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-620713716


   Thanks! @maropu @srowen 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620711811







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620711811







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox


SparkQA commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620710970


   **[Test build #122006 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122006/testReport)**
 for PR 28194 at commit 
[`a7bc72d`](https://github.com/apache/spark/commit/a7bc72d8c8d6bc1990de5558765c3feab537858e).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #28290: [SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-28 Thread GitBox


srowen commented on pull request #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-620708879


   Merged to master/3.0



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] attilapiros commented on a change in pull request #28094: [SPARK-29303][Web UI] Add UI support for stage level scheduling

2020-04-28 Thread GitBox


attilapiros commented on a change in pull request #28094:
URL: https://github.com/apache/spark/pull/28094#discussion_r416743570



##
File path: core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
##
@@ -38,6 +40,34 @@ private[ui] class EnvironmentPage(
   "Java Home" -> appEnv.runtime.javaHome,
   "Scala Version" -> appEnv.runtime.scalaVersion)
 
+def constructExecutorRequestString(ereqs: Map[String, 
ExecutorResourceRequest]): String = {
+  ereqs.map {
+case (_, ereq) =>
+  val execStr = new mutable.StringBuilder()
+  execStr ++= s"\t${ereq.resourceName}: [amount: ${ereq.amount}"
+  if (ereq.discoveryScript.nonEmpty) execStr ++= s", discovery: 
${ereq.discoveryScript}"
+  if (ereq.vendor.nonEmpty) execStr ++= s", vendor: ${ereq.vendor}"
+  execStr ++= "]"
+  execStr.toString()
+  }.mkString("\n")
+}
+
+def constructTaskRequestString(treqs: Map[String, TaskResourceRequest]): 
String = {
+  treqs.map {
+case (_, ereq) => s"\t${ereq.resourceName}: [amount: ${ereq.amount}]"

Review comment:
   Nit: `treqs`  => `taskReqs` and more importantly `ereq` => `taskReq`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #28397: [SPARK-31519][SQL][2.4] Cast in having aggregate expressions returns the wrong result

2020-04-28 Thread GitBox


dongjoon-hyun commented on pull request #28397:
URL: https://github.com/apache/spark/pull/28397#issuecomment-620704036


   Hi, @xuanyuanking and @cloud-fan. Could you update the PR description appropriately, because it's invalid for branch-2.4?
   ```
   scala> spark.version
   res7: String = 2.4.5
   
   scala> sql("SELECT SUM(a) AS b, CAST('2020-01-01' AS DATE) AS fake FROM 
VALUES (1, 10), (2, 20) AS T(a, b) GROUP BY b HAVING b > 10").show
   +---+----------+
   |  b|      fake|
   +---+----------+
   |  2|2020-01-01|
   +---+----------+
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28138: [SPARK-31366][DOCS][SQL] Add doc for the aggregation in SQL reference guide

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28138:
URL: https://github.com/apache/spark/pull/28138#issuecomment-610096272


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620702413







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620702413







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


SparkQA commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620701534


   **[Test build #122005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122005/testReport)**
 for PR 28370 at commit 
[`5847c1c`](https://github.com/apache/spark/commit/5847c1cb655f5a376547c438347d8334e964d454).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28138: [SPARK-31366][DOCS][SQL] Add doc for the aggregation in SQL reference guide

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28138:
URL: https://github.com/apache/spark/pull/28138#issuecomment-620701289


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #28332: [SPARK-31556][SQL][DOCS] Document LIKE clause in SQL Reference

2020-04-28 Thread GitBox


huaxingao commented on pull request #28332:
URL: https://github.com/apache/spark/pull/28332#issuecomment-620700495


   cc @srowen 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #28290: [SPARK-29458][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-28 Thread GitBox


huaxingao commented on pull request #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-620698960


   @srowen This PR is ready to be merged. @maropu suggested documenting more keywords later on.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox


beliefer commented on a change in pull request #28194:
URL: https://github.com/apache/spark/pull/28194#discussion_r416733719



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala
##
@@ -0,0 +1,230 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
+import org.apache.spark.sql.catalyst.util.{fileToString, stringToFile}
+import org.apache.spark.sql.execution.HiveResult.hiveResultString
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * End-to-end test cases for SQL schemas of expression examples.
+ * The golden result file is 
"spark/sql/core/src/test/resources/sql-functions/sql-expression-schema.md".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *ExpressionsSchemaSuite"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*ExpressionsSchemaSuite"
+ * }}}
+ *
+ * For example:
+ * {{{
+ *   ...
+ *   @ExpressionDescription(
+ * usage = "_FUNC_(str, n) - Returns the string which repeats the given 
string value n times.",
+ * examples = """
+ *   Examples:
+ * > SELECT _FUNC_('123', 2);
+ *  123123
+ * """,
+ * since = "1.5.0")
+ *   case class StringRepeat(str: Expression, times: Expression)
+ *   ...
+ * }}}
+ *
+ * The format for golden result files look roughly like:
+ * {{{
+ *   ...
+ *   | 238 | org.apache.spark.sql.catalyst.expressions.StringRepeat | repeat | 
SELECT repeat('123', 2) | struct |
+ *   ...
+ * }}}
+ */
+// scalastyle:on line.size.limit
+@ExtendedSQLTest
+class ExpressionsSchemaSuite extends QueryTest with SharedSparkSession {
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  private val baseResourcePath = {
+// We use a path based on Spark home for 2 reasons:
+//   1. Maven can't get correct resource directory when resources in other 
jars.
+//   2. We test subclasses in the hive-thriftserver module.
+val sparkHome = {
+  assert(sys.props.contains("spark.test.home") ||
+sys.env.contains("SPARK_HOME"), "spark.test.home or SPARK_HOME is not 
set.")
+  sys.props.getOrElse("spark.test.home", sys.env("SPARK_HOME"))
+}
+
+java.nio.file.Paths.get(sparkHome,
+  "sql", "core", "src", "test", "resources", "sql-functions").toFile
+  }
+
+  private val resultFile = new File(baseResourcePath, 
"sql-expression-schema.md")
+
+  val ignoreSet = Set(
+// One of examples shows getting the current timestamp
+"org.apache.spark.sql.catalyst.expressions.UnixTimestamp",
+// Random output without a seed
+"org.apache.spark.sql.catalyst.expressions.Rand",
+"org.apache.spark.sql.catalyst.expressions.Randn",
+"org.apache.spark.sql.catalyst.expressions.Shuffle",
+"org.apache.spark.sql.catalyst.expressions.Uuid",
+// The example calls methods that return unstable results.
+"org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection")

Review comment:
   Thanks. I will update this PR if #28392 is accepted.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


dongjoon-hyun commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620697820


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] attilapiros commented on a change in pull request #28094: [SPARK-29303][Web UI] Add UI support for stage level scheduling

2020-04-28 Thread GitBox


attilapiros commented on a change in pull request #28094:
URL: https://github.com/apache/spark/pull/28094#discussion_r416618745



##
File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
##
@@ -197,10 +217,16 @@ private[spark] class AppStatusListener(
 exec.host = event.executorInfo.executorHost
 exec.isActive = true
 exec.totalCores = event.executorInfo.totalCores
-exec.maxTasks = event.executorInfo.totalCores / coresPerTask
+val rpId = event.executorInfo.resourceProfileId
+val liveRP = liveResourceProfiles.get(rpId)
+val cpusPerTask = liveRP.map(_.taskResources.get(CPUS)
+  
.map(_.amount.toInt).getOrElse(defaultCoresPerTask)).getOrElse(defaultCoresPerTask)

Review comment:
   What about flatMap and saving the first 
`.getOrElse(defaultCoresPerTask)` ?
   
   ```scala
val cpusPerTask = liveRP.flatMap(_.taskResources.get(CPUS)
  .map(_.amount.toInt)).getOrElse(defaultCoresPerTask)
   ```
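   For what it's worth, a tiny self-contained check of the equivalence the snippet relies on (the case classes and field names below are stand-ins, not the PR's types):
   ```scala
   case class TaskReq(amount: Double)
   case class Profile(taskResources: Map[String, TaskReq])

   val default = 1
   val liveRP: Option[Profile] = Some(Profile(Map("cpus" -> TaskReq(4.0))))

   // With map, the default has to be supplied twice: once for a missing
   // "cpus" entry and once more for a missing profile.
   val nested = liveRP
     .map(_.taskResources.get("cpus").map(_.amount.toInt).getOrElse(default))
     .getOrElse(default)

   // flatMap flattens the nested Option, so a single getOrElse suffices.
   val flat = liveRP
     .flatMap(_.taskResources.get("cpus").map(_.amount.toInt))
     .getOrElse(default)

   assert(nested == flat) // both are 4 here, and both fall back to `default` the same way
   ```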

##
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##
@@ -44,6 +45,12 @@ private[spark] class AppStatusStore(
 store.read(klass, klass.getName()).info
   }
 
+  def resourceProfileInfo(): Seq[v1.ResourceProfileInfo] = {
+val klass = classOf[ResourceProfileWrapper]

Review comment:
   This line is not needed.

##
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##
@@ -44,6 +45,12 @@ private[spark] class AppStatusStore(
 store.read(klass, klass.getName()).info
   }
 
+  def resourceProfileInfo(): Seq[v1.ResourceProfileInfo] = {
+val klass = classOf[ResourceProfileWrapper]
+val it = store.view(classOf[ResourceProfileWrapper]).asScala.map(_.rpInfo)

Review comment:
   Nit: why is `it` introduced as a `val`? I mean, the last line can just be:
   ```scala
   store.view(classOf[ResourceProfileWrapper]).asScala.map(_.rpInfo).toSeq
   ```

##
File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
##
@@ -145,6 +147,23 @@ private[spark] class AppStatusListener(
 }
   }
 
+  override def onResourceProfileAdded(event: 
SparkListenerResourceProfileAdded): Unit = {
+val liveRP = new LiveResourceProfile(event.resourceProfile.id)
+liveResourceProfiles(event.resourceProfile.id) = liveRP
+liveRP.taskResources = event.resourceProfile.taskResources
+liveRP.executorResources = event.resourceProfile.executorResources
+val maxTasks = event.resourceProfile.maxTasksPerExecutor(conf)
+liveRP.maxTasksPerExecutor = if (event.resourceProfile.isCoresLimitKnown) {
+  Some(maxTasks)
+} else {
+  None
+}
+val rpInfo = new v1.ResourceProfileInfo(liveRP.resourceProfileId,
+  liveRP.executorResources, liveRP.taskResources)
+logWarning("Resource Profile added id " + liveRP.resourceProfileId)

Review comment:
   This logging was probably left over from a debugging session.

##
File path: core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
##
@@ -38,6 +40,34 @@ private[ui] class EnvironmentPage(
   "Java Home" -> appEnv.runtime.javaHome,
   "Scala Version" -> appEnv.runtime.scalaVersion)
 
+def constructExecutorRequestString(ereqs: Map[String, 
ExecutorResourceRequest]): String = {

Review comment:
   Nit: missing camel case in `ereqs`. What about `execReqs`? 

##
File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
##
@@ -159,10 +178,11 @@ private[spark] class AppStatusListener(
   details.getOrElse("Spark Properties", Nil),
   details.getOrElse("Hadoop Properties", Nil),
   details.getOrElse("System Properties", Nil),
-  details.getOrElse("Classpath Entries", Nil))
+  details.getOrElse("Classpath Entries", Nil),
+  Nil)
 
-coresPerTask = 
envInfo.sparkProperties.toMap.get(CPUS_PER_TASK.key).map(_.toInt)
-  .getOrElse(coresPerTask)
+defaultCoresPerTask = 
envInfo.sparkProperties.toMap.get(CPUS_PER_TASK.key).map(_.toInt)

Review comment:
   Nit: Sometimes the code talks about cores and sometimes about CPUs and 
within this line we have a conversion between the two. Is our terminology 
correct?  

##
File path: core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
##
@@ -38,6 +40,34 @@ private[ui] class EnvironmentPage(
   "Java Home" -> appEnv.runtime.javaHome,
   "Scala Version" -> appEnv.runtime.scalaVersion)
 
+def constructExecutorRequestString(ereqs: Map[String, 
ExecutorResourceRequest]): String = {
+  ereqs.map {
+case (_, ereq) =>
+  val execStr = new mutable.StringBuilder()
+  execStr ++= s"\t${ereq.resourceName}: [amount: ${ereq.amount}"
+  if (ereq.discoveryScript.nonEmpty) execStr ++= s", discovery: 
${ereq.discoveryScript}"

Review comment:
   Nit: use `{` and ` }`
   https://github.com/databricks/scala-style-guide#curly
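   For reference, a self-contained sketch of what that brace style looks like here (the values are stand-ins, not the PR's fields):
   ```scala
   // Conditionals whose body goes on its own line get explicit braces.
   val discoveryScript = "/opt/spark/getGpus.sh" // hypothetical path
   val execStr = new StringBuilder("gpu: [amount: 2")
   if (discoveryScript.nonEmpty) {
     execStr ++= s", discovery: $discoveryScript"
   }
   execStr ++= "]"
   println(execStr.toString())
   ```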
   





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath

2020-04-28 Thread GitBox


dongjoon-hyun commented on a change in pull request #28376:
URL: https://github.com/apache/spark/pull/28376#discussion_r416733376



##
File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
##
@@ -70,6 +70,13 @@ package object config {
 .booleanConf
 .createWithDefault(false)
 
+  private[spark] val POPULATE_HADOOP_CLASSPATH = 
ConfigBuilder("spark.yarn.populateHadoopClasspath")
+.doc("Whether to populate Hadoop classpath from 
`yarn.application.classpath` and " +

Review comment:
   +1 for the comment and the doc.
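   As a usage note, a minimal sketch of setting the flag programmatically; only the config key comes from the diff above, everything else is illustrative:
   ```scala
   import org.apache.spark.SparkConf

   // Opt out of populating the Hadoop classpath (with-hadoop builds).
   val conf = new SparkConf()
     .setAppName("no-hadoop-classpath-example") // hypothetical app name
     .set("spark.yarn.populateHadoopClasspath", "false") // key taken from the diff above
   ```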





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


holdenk commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r416733183



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1829,7 +1895,52 @@ private[spark] class BlockManager(
 data.dispose()
   }
 
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {
+@volatile private var stopped = false
+private val blockReplicationThread = new Thread {
+  override def run(): Unit = {
+while (blockManagerDecommissioning && !stopped) {
+  try {
+logDebug("Attempting to replicate all cached RDD blocks")
+decommissionRddCacheBlocks()
+logInfo("Attempt to replicate all cached blocks done")
+val sleepInterval = conf.get(
+  config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+Thread.sleep(sleepInterval)
+  } catch {
+case _: InterruptedException =>
+  // no-op
+case NonFatal(e) =>
+  logError("Error occurred while trying to " +
+"replicate cached RDD blocks for block manager 
decommissioning", e)
+  }
+}
+  }
+}
+blockReplicationThread.setDaemon(true)
+blockReplicationThread.setName("block-replication-thread")
+
+def start(): Unit = {
+  logInfo("Starting block replication thread")
+  blockReplicationThread.start()
+}
+
+def stop(): Unit = {
+  if (!stopped) {
+stopped = true
+logInfo("Stopping block replication thread")
+blockReplicationThread.interrupt()
+blockReplicationThread.join()

Review comment:
   Sure, but when it is turned on (as in your test case), might this keep the worker process from exiting when we ask it to stop?
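   To make the concern concrete, here is a hedged sketch (made-up class and names, not the PR's code) of a stoppable daemon loop whose `stop()` bounds the wait instead of calling a plain `join()`:
   ```scala
   // Sketch of a stoppable daemon loop whose stop() does not wait forever.
   class StoppableLoop(body: () => Unit) {
     @volatile private var stopped = false

     private val thread = new Thread("block-replication-thread-sketch") {
       override def run(): Unit = {
         while (!stopped) {
           try body()
           catch { case _: InterruptedException => /* fall through and re-check stopped */ }
         }
       }
     }
     thread.setDaemon(true) // lets the JVM exit even if the loop never stops

     def start(): Unit = thread.start()

     def stop(timeoutMs: Long): Unit = {
       stopped = true
       thread.interrupt()
       thread.join(timeoutMs) // bounded wait instead of a plain join()
       if (thread.isAlive) {
         println("replication loop still running after timeout; relying on daemon flag")
       }
     }
   }
   ```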





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #28359: [SPARK-31534][WEBUI][3.0] Text for tooltip should be escaped

2020-04-28 Thread GitBox


dongjoon-hyun commented on pull request #28359:
URL: https://github.com/apache/spark/pull/28359#issuecomment-620696335


   Thank you, @sarutak and @gengliangwang. Merged to branch-3.0.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #28388: [SPARK-31553][SQL] Revert "[SPARK-29048] Improve performance on Column.isInCollection() with a large size collection"

2020-04-28 Thread GitBox


dongjoon-hyun commented on pull request #28388:
URL: https://github.com/apache/spark/pull/28388#issuecomment-620692526


   +1, late LGTM. Thank you for reverting.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28390:
URL: https://github.com/apache/spark/pull/28390#issuecomment-620684515







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28390:
URL: https://github.com/apache/spark/pull/28390#issuecomment-620684515







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #28369: [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations

2020-04-28 Thread GitBox


cloud-fan commented on pull request #28369:
URL: https://github.com/apache/spark/pull/28369#issuecomment-620684485


   It's benchmark-only, so we don't need to wait for Jenkins.
   
   Thanks, merging to master/3.0!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28390:
URL: https://github.com/apache/spark/pull/28390#issuecomment-620499465


   **[Test build #121987 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121987/testReport)**
 for PR 28390 at commit 
[`6e98444`](https://github.com/apache/spark/commit/6e9844421a172d31bea4d28a8f5c9a8cd761c436).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28390: [SPARK-27340][SS][TESTS][FOLLOW-UP] Rephrase API comments and simplify tests

2020-04-28 Thread GitBox


SparkQA commented on pull request #28390:
URL: https://github.com/apache/spark/pull/28390#issuecomment-620683132


   **[Test build #121987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121987/testReport)**
 for PR 28390 at commit 
[`6e98444`](https://github.com/apache/spark/commit/6e9844421a172d31bea4d28a8f5c9a8cd761c436).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28369: [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28369:
URL: https://github.com/apache/spark/pull/28369#issuecomment-620674459







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28369: [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28369:
URL: https://github.com/apache/spark/pull/28369#issuecomment-620674459







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28369: [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations

2020-04-28 Thread GitBox


SparkQA commented on pull request #28369:
URL: https://github.com/apache/spark/pull/28369#issuecomment-620673509


   **[Test build #122004 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122004/testReport)**
 for PR 28369 at commit 
[`aa76f81`](https://github.com/apache/spark/commit/aa76f8104af7f34b379f1e656ecece8931de4d7a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620671101


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121988/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620671076


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620671076







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620507280


   **[Test build #121988 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121988/testReport)**
 for PR 28370 at commit 
[`5847c1c`](https://github.com/apache/spark/commit/5847c1cb655f5a376547c438347d8334e964d454).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-28 Thread GitBox


SparkQA commented on pull request #28370:
URL: https://github.com/apache/spark/pull/28370#issuecomment-620669591


   **[Test build #121988 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121988/testReport)**
 for PR 28370 at commit 
[`5847c1c`](https://github.com/apache/spark/commit/5847c1cb655f5a376547c438347d8334e964d454).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class AggregateWithHaving(`
 * `abstract class CurrentTimestampLike() extends LeafExpression with 
CodegenFallback `
 * `case class CurrentTimestamp() extends CurrentTimestampLike `
 * `case class Now() extends CurrentTimestampLike `
 * `case class DateAddInterval(`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display rand/randn seed numbers in schema

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620653944


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121990/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display rand/randn seed numbers in schema

2020-04-28 Thread GitBox


AmplabJenkins removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620653926


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28392: [SPARK-31594][SQL] Do not display rand/randn seed numbers in schema

2020-04-28 Thread GitBox


AmplabJenkins commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620653926







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display rand/randn seed numbers in schema

2020-04-28 Thread GitBox


SparkQA removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620518808


   **[Test build #121990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121990/testReport)**
 for PR 28392 at commit 
[`7216511`](https://github.com/apache/spark/commit/721651119caf695689581a8f0eed83f5313a4110).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more translation-friendly/canonical

2020-04-28 Thread GitBox


srowen commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620653088


   I doubt we will ever have translations. If that is the only upside to some of the changes, I might not do them. If there are other simple standardizations or simplifications in the error messages, though, those could be fine.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28392: [SPARK-31594][SQL] Do not display rand/randn seed numbers in schema

2020-04-28 Thread GitBox


SparkQA commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620653293


   **[Test build #121990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121990/testReport)**
 for PR 28392 at commit 
[`7216511`](https://github.com/apache/spark/commit/721651119caf695689581a8f0eed83f5313a4110).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display rand/randn seed numbers in schema

2020-04-28 Thread GitBox


srowen commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r416672687



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
##
@@ -102,6 +102,8 @@ case class Rand(child: Expression) extends RDG with 
ExpressionWithRandomSeed {
   }
 
   override def freshCopy(): Rand = Rand(child)
+
+  override def sql: String = "rand()"

Review comment:
   I think the current output may actually be useful. Yes, the seed was randomly chosen, but you might want to know what it was.
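   For anyone who wants to see the schema strings being discussed, a quick local sketch (assumes a local `SparkSession`; the names in the comments are illustrative, and the no-arg seed changes per run):
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.rand

   val spark = SparkSession.builder().master("local[1]").appName("rand-schema").getOrCreate()
   spark.range(1).select(rand()).printSchema()   // column name embeds the random seed, e.g. rand(<seed>)
   spark.range(1).select(rand(42)).printSchema() // explicit seed: rand(42)
   spark.stop()
   ```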





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


