[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673865720







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


SparkQA commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673865344


   **[Test build #127432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127432/testReport)**
 for PR 28841 at commit 
[`1ee4af4`](https://github.com/apache/spark/commit/1ee4af433229baa55b3b1d3c970ef362bb2525fa).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673864612


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127431/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673864606


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673861861


   **[Test build #127431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127431/testReport)**
 for PR 28841 at commit 
[`4329c8a`](https://github.com/apache/spark/commit/4329c8abeb64702c5b92880e11b76511087da841).






[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


SparkQA commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673864599


   **[Test build #127431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127431/testReport)**
 for PR 28841 at commit 
[`4329c8a`](https://github.com/apache/spark/commit/4329c8abeb64702c5b92880e11b76511087da841).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673864606










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673862283










[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673862283










[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-13 Thread GitBox


SparkQA commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-673861861


   **[Test build #127431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127431/testReport)**
 for PR 28841 at commit 
[`4329c8a`](https://github.com/apache/spark/commit/4329c8abeb64702c5b92880e11b76511087da841).






[GitHub] [spark] manuzhang closed pull request #28954: [SPARK-32083][SQL] Apply CoalesceShufflePartitions when input RDD has 0 partitions with AQE

2020-08-13 Thread GitBox


manuzhang closed pull request #28954:
URL: https://github.com/apache/spark/pull/28954


   






[GitHub] [spark] HyukjinKwon commented on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


HyukjinKwon commented on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673856142


   Looks fine. Can you address the style nits @Comonut?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29422: [SPARK-32613][CORE] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673855623










[GitHub] [spark] AmplabJenkins commented on pull request #29422: [SPARK-32613][CORE] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673855623










[GitHub] [spark] SparkQA removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673773346


   **[Test build #127428 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127428/testReport)**
 for PR 29422 at commit 
[`c051532`](https://github.com/apache/spark/commit/c051532a08f067ffa77b13e12207723d4ecbe27f).






[GitHub] [spark] SparkQA commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


SparkQA commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673855161


   **[Test build #127428 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127428/testReport)**
 for PR 29422 at commit 
[`c051532`](https://github.com/apache/spark/commit/c051532a08f067ffa77b13e12207723d4ecbe27f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-13 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r470379655



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode {
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", frame, e, f)
 case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, f)
-case f: OffsetWindowFunction => collect("OFFSET", frame, e, f)
+case f: OffsetWindowFunction => if (f.isWholeBased) {

Review comment:
   According to the plan, `first_value` and `last_value` need to be
implemented.








[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-13 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r470377487



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -474,6 +479,55 @@ case class Lag(input: Expression, offset: Expression, default: Expression)
   override val direction = Descending
 }
 
+/**
+ * The NthValue function returns the value of `input` at the row that is the `offset`th row of
+ * the window frame (counting from 1). Offsets start at 0, which is the current row. When the
+ * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null
+ * is returned.
+ *
+ * @param input expression to evaluate `offset`th row of the window frame.
+ * @param offset rows to jump ahead in the partition.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row
+  of the window frame (counting from 1). If the value of `input` at the `offset`th row is

Review comment:
   "`offset`th row of the window"?








[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-13 Thread GitBox


beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r470375984



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -474,6 +479,55 @@ case class Lag(input: Expression, offset: Expression, default: Expression)
   override val direction = Descending
 }
 
+/**
+ * The NthValue function returns the value of `input` at the row that is the `offset`th row of
+ * the window frame (counting from 1). Offsets start at 0, which is the current row. When the

Review comment:
   Sorry! I made a mistake.
   `Offsets start at 1`
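
To illustrate the corrected wording, here is a minimal sketch of the 1-based `offset` semantics the doc comment describes, written over a plain Scala `Seq` rather than Spark's window machinery. The `NthValueSketch` object and its `nthValue` helper are hypothetical names used only for illustration; they are not part of the PR or of Spark's API.

```scala
object NthValueSketch {
  // Hypothetical helper: models a window frame as a Seq of nullable values,
  // with None standing in for SQL NULL.
  def nthValue[A](frame: Seq[Option[A]], offset: Int): Option[A] = {
    // Offsets count from 1, matching the reviewer's correction.
    require(offset >= 1, "offset must be a positive integer (counting from 1)")
    // lift returns None when the frame has no such row; flatten folds a
    // null value at that row into the same None result.
    frame.lift(offset - 1).flatten
  }
}
```

Under this reading, `nthValue(frame, 1)` is the first row of the frame, and an offset past the end of the frame yields `None`, mirroring the null result described in the comment.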








[GitHub] [spark] HyukjinKwon edited a comment on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions

2020-08-13 Thread GitBox


HyukjinKwon edited a comment on pull request #29333:
URL: https://github.com/apache/spark/pull/29333#issuecomment-673844193


   Okay, the example [looks to be working
fine](https://github.com/HyukjinKwon/spark/runs/980622718). This PR should be
ready for review and merge. @srowen, @gengliangwang, @dongjoon-hyun,
@dbtsai, @viirya and @maropu, can you take a look please?






[GitHub] [spark] HyukjinKwon commented on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions

2020-08-13 Thread GitBox


HyukjinKwon commented on pull request #29333:
URL: https://github.com/apache/spark/pull/29333#issuecomment-673844193


   Okay, the example [looks to be working
fine](https://github.com/HyukjinKwon/spark/runs/980622718). This PR should be
ready for review and merge. cc @srowen, @gengliangwang, @dongjoon-hyun,
@dbtsai, @viirya, can you take a look please?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29428:
URL: https://github.com/apache/spark/pull/29428#issuecomment-673842847










[GitHub] [spark] AmplabJenkins commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29428:
URL: https://github.com/apache/spark/pull/29428#issuecomment-673842847










[GitHub] [spark] SparkQA commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value

2020-08-13 Thread GitBox


SparkQA commented on pull request #29428:
URL: https://github.com/apache/spark/pull/29428#issuecomment-673842527


   **[Test build #127430 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127430/testReport)**
 for PR 29428 at commit 
[`b4d816e`](https://github.com/apache/spark/commit/b4d816e26766923a40c42d2b3ae4356802b16886).






[GitHub] [spark] AngersZhuuuu opened a new pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value

2020-08-13 Thread GitBox


AngersZhuuuu opened a new pull request #29428:
URL: https://github.com/apache/spark/pull/29428


   ### What changes were proposed in this pull request?
   For SQL
   ```
   SELECT TRANSFORM(a, b, c)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 LINES TERMINATED BY '\n'
 NULL DEFINED AS 'null'
 USING 'cat' AS (a, b, c)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 LINES TERMINATED BY '\n'
 NULL DEFINED AS 'NULL'
   FROM testData
   ```
   The parsed delimiter values are incorrect:
   
   TOK_TABLEROWFORMATFIELD should be `,` but is actually `','`
   
   TOK_TABLEROWFORMATLINES should be `\n` but is actually `'\n'`
   
   
   ### Why are the changes needed?
   Fix string value format
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added UT
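
The mismatch described in the PR text above (a delimiter arriving as `','` instead of `,`) can be sketched as a quote-stripping step. This is only an illustration under stated assumptions: the `DelimiterSketch` object and its `stripQuotes` helper are hypothetical names, and the sketch does not claim to be how the PR or Spark's parser actually formats the value.

```scala
object DelimiterSketch {
  // Hypothetical helper: if the raw token is wrapped in single quotes,
  // return the text between them; otherwise return the token unchanged.
  def stripQuotes(raw: String): String =
    if (raw.length >= 2 && raw.head == '\'' && raw.last == '\'') {
      raw.substring(1, raw.length - 1)
    } else {
      raw
    }
}
```

For example, `DelimiterSketch.stripQuotes("','")` gives `,`, while an already unquoted `,` passes through unchanged. Handling of escape sequences such as `\n` is deliberately left out of this sketch.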






[GitHub] [spark] AngersZhuuuu commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value

2020-08-13 Thread GitBox


AngersZhuuuu commented on pull request #29428:
URL: https://github.com/apache/spark/pull/29428#issuecomment-673841449


   FYI @maropu 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29270:
URL: https://github.com/apache/spark/pull/29270#issuecomment-673840860










[GitHub] [spark] AmplabJenkins commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29270:
URL: https://github.com/apache/spark/pull/29270#issuecomment-673840860










[GitHub] [spark] SparkQA commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


SparkQA commented on pull request #29270:
URL: https://github.com/apache/spark/pull/29270#issuecomment-673840529


   **[Test build #127429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127429/testReport)**
 for PR 29270 at commit 
[`891346e`](https://github.com/apache/spark/commit/891346e6b541cc181f1aa5213d0540330bdf99ec).






[GitHub] [spark] Ngone51 commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


Ngone51 commented on pull request #29270:
URL: https://github.com/apache/spark/pull/29270#issuecomment-673840363


   retest this please.






[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


Ngone51 commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r470368611



##
File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
##
@@ -298,12 +302,22 @@ trait TPCDSBase extends SharedSparkSession {
 tableNames.foreach { tableName =>
   createTable(spark, tableName)
   if (injectStats) {
-// To simulate plan generation on actual TPCDS data, injects data stats here
+// To simulate plan generation on actual TPC-DS data, injects data stats here
 spark.sessionState.catalog.alterTableStats(
   TableIdentifier(tableName), Some(TPCDSTableStats.sf100TableStats(tableName)))
   }
 }
   }
 
+  override def afterAll(): Unit = {
+conf.setConf(SQLConf.CBO_ENABLED, originalCBCEnabled)
+conf.setConf(SQLConf.PLAN_STATS_ENABLED, originalPlanStatsEnabled)
+conf.setConf(SQLConf.JOIN_REORDER_ENABLED, originalJoinReorderEnabled)
+tableNames.foreach { tableName =>
+  spark.sessionState.catalog.alterTableStats(TableIdentifier(tableName), None)

Review comment:
   Yes sure.
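
The `afterAll` hunk quoted above restores settings that the suite changed. A minimal sketch of that save-and-restore pattern, assuming a plain mutable map as a stand-in for the session configuration, might look like the following. `ConfRestoreSketch`, `conf`, and `withConf` here are hypothetical names for illustration and are not the Spark test utilities themselves.

```scala
import scala.collection.mutable

object ConfRestoreSketch {
  // Hypothetical stand-in for a session configuration.
  val conf: mutable.Map[String, String] = mutable.Map("cbo.enabled" -> "false")

  // Apply the given settings, run the body, then restore the previous state.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember the prior value of each key (None if the key was unset).
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally saved.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None) => conf.remove(k)
    }
  }
}
```

Restoring in a `finally` block keeps later suites from seeing leftover settings even when the body throws, which is the same concern the `afterAll` change addresses.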








[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


Ngone51 commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r470368540



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+// use the same way as `SQLQueryTestSuite` to get the resource path
+java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile
+  }
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+val file = new File(dir, "simplified.txt")
+val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+approved == actualSimplifiedPlan
+  }
+
+  /**
+   * Serialize and save this SparkPlan.
+   * The resulting file is used by [[checkWithApproved]] to check stability.
+   *
+   * @param planthe SparkPlan
+   * @param namethe name of the query
+   * @param explain the full explain output; this is saved to help debug later 
as the simplified
+   *plan is not too useful for debugging
+   */
+  private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: 
String): Unit = {
+val dir = getDirForTest(name)
+val simplified = getSimplifiedPlan(plan)
+val foundMatch = dir.exists() && isApproved(dir, simplified)
+
+if (!foundMatch) {
+  FileUtils.deleteDirectory(dir)
+  assert(dir.mkdirs())
+
+  val file = new File(dir, "simplified.txt")
+  FileUtils.writeStringToFile(file, simplified, StandardCharsets.UTF_

[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


Ngone51 commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r470368356



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+    conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+    super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+    super.afterAll()
+    conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+    // use the same way as `SQLQueryTestSuite` to get the resource path
+    java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile
+  }
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+    new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+    val file = new File(dir, "simplified.txt")
+    val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+    approved == actualSimplifiedPlan
+  }
+
+  /**
+   * Serialize and save this SparkPlan.
+   * The resulting file is used by [[checkWithApproved]] to check stability.
+   *
+   * @param plan    the SparkPlan
+   * @param name    the name of the query
+   * @param explain the full explain output; this is saved to help debug later as the simplified
+   *                plan is not too useful for debugging
+   */
+  private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: String): Unit = {
+    val dir = getDirForTest(name)
+    val simplified = getSimplifiedPlan(plan)
+    val foundMatch = dir.exists() && isApproved(dir, simplified)
+
+    if (!foundMatch) {
+      FileUtils.deleteDirectory(dir)
+      assert(dir.mkdirs())
+
+      val file = new File(dir, "simplified.txt")
+      FileUtils.writeStringToFile(file, simplified, StandardCharsets.UTF_
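
The approve-or-regenerate flow described in the Scaladoc of this diff can be sketched without Spark. This is a minimal illustration under stated assumptions, not the code from the PR: the names `GoldenFileSketch` and `check`, and the use of `java.nio.file.Files` in place of commons-io `FileUtils`, are hypothetical. Only the file name `simplified.txt`, the plain string comparison, and the `SPARK_GENERATE_GOLDEN_FILES` switch are taken from the diff.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-in for the suite's golden-file handling; not Spark code.
object GoldenFileSketch {

  // True only if the approved file exists and equals the actual plan string.
  def isApproved(dir: Path, actual: String): Boolean = {
    val file = dir.resolve("simplified.txt")
    Files.exists(file) &&
      new String(Files.readAllBytes(file), StandardCharsets.UTF_8) == actual
  }

  // In regenerate mode, rewrite the approved file when it differs and succeed.
  // Otherwise, succeed only if the actual plan matches the approved one.
  def check(dir: Path, actual: String, regenerate: Boolean): Boolean = {
    if (regenerate) {
      if (!isApproved(dir, actual)) {
        Files.createDirectories(dir)
        Files.write(dir.resolve("simplified.txt"), actual.getBytes(StandardCharsets.UTF_8))
      }
      true
    } else {
      isApproved(dir, actual)
    }
  }

  def main(args: Array[String]): Unit = {
    // Same switch as in the diff: regenerate only when the variable equals "1".
    val regenerate = sys.env.get("SPARK_GENERATE_GOLDEN_FILES").contains("1")
    val dir = Files.createTempDirectory("plan-stability-sketch").resolve("q1")
    println(check(dir, "WholeStageCodegen\n  Scan", regenerate))
  }
}
```

Because the comparison is a plain string equality, any change in the simplified plan text fails the check until the golden file is regenerated, which matches the workflow the Scaladoc describes.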

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673839896







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673839896










[GitHub] [spark] SparkQA commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


SparkQA commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673838923


   **[Test build #127425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127425/testReport)** for PR 28939 at commit [`5d65caf`](https://github.com/apache/spark/commit/5d65caf55c7b87fc0035444b60847e4037ad0f40).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673754055


   **[Test build #127425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127425/testReport)** for PR 28939 at commit [`5d65caf`](https://github.com/apache/spark/commit/5d65caf55c7b87fc0035444b60847e4037ad0f40).






[GitHub] [spark] mridulm edited a comment on pull request #28618: [SPARK-31801][API][SHUFFLE] Register map output metadata

2020-08-13 Thread GitBox


mridulm edited a comment on pull request #28618:
URL: https://github.com/apache/spark/pull/28618#issuecomment-673827204


   @mccheah I will take a look at this later this week/early next week.
   +CC @squito, @holdenk who reviewed the design doc.
   Thanks for working on this!






[GitHub] [spark] mridulm commented on pull request #28618: [SPARK-31801][API][SHUFFLE] Register map output metadata

2020-08-13 Thread GitBox


mridulm commented on pull request #28618:
URL: https://github.com/apache/spark/pull/28618#issuecomment-673827204


   @mccheah I will take a look at this later this week/early next week.
   +CC @squito, @holdenk who reviewed the design doc.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29426:
URL: https://github.com/apache/spark/pull/29426#issuecomment-673824399










[GitHub] [spark] AmplabJenkins commented on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29426:
URL: https://github.com/apache/spark/pull/29426#issuecomment-673824399










[GitHub] [spark] mridulm commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


mridulm commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673823982


   This looks good to me - once Tom's comment is addressed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673822318










[GitHub] [spark] SparkQA commented on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

2020-08-13 Thread GitBox


SparkQA commented on pull request #29426:
URL: https://github.com/apache/spark/pull/29426#issuecomment-673821660


   **[Test build #127423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127423/testReport)** for PR 29426 at commit [`ddbf11f`](https://github.com/apache/spark/commit/ddbf11f5f4073ed378dd51654c1b085afb00128e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #29426:
URL: https://github.com/apache/spark/pull/29426#issuecomment-673751893


   **[Test build #127423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127423/testReport)** for PR 29426 at commit [`ddbf11f`](https://github.com/apache/spark/commit/ddbf11f5f4073ed378dd51654c1b085afb00128e).






[GitHub] [spark] AmplabJenkins commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673822318










[GitHub] [spark] SparkQA removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673751916


   **[Test build #127424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127424/testReport)** for PR 29422 at commit [`6334f80`](https://github.com/apache/spark/commit/6334f80e5d593d7e8a29bcd9598ec5e19d756162).






[GitHub] [spark] SparkQA commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


SparkQA commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673819205


   **[Test build #127424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127424/testReport)** for PR 29422 at commit [`6334f80`](https://github.com/apache/spark/commit/6334f80e5d593d7e8a29bcd9598ec5e19d756162).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

2020-08-13 Thread GitBox


AngersZhuuuu commented on a change in pull request #28490:
URL: https://github.com/apache/spark/pull/28490#discussion_r470362613



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1479,6 +1479,33 @@ class Analyzer(
       // Skip the having clause here, this will be handled in ResolveAggregateFunctions.
       case h: UnresolvedHaving => h
 
+      case agg @ (_: Aggregate | _: GroupingSets) =>
+        val resolved = agg.mapExpressions(resolveExpressionTopDown(_, agg))
+        val hasStructField = resolved.expressions.exists {
+          _.collectFirst { case gsf: GetStructField => gsf }.isDefined
+        }
+        if (hasStructField) {
+          // For struct field, it will be resolve as Alias(GetStructField, name),
+          // In Aggregate/GroupingSets this behavior will cause same struct field
+          // in aggExprs/groupExprs/selectedGroupByExprs will be resolved divided
+          // with different ExprId of Alias and replace failed when construct
+          // Aggregate in ResolveGroupingAnalytics, so we resolve duplicated struct
+          // field here with same ExprId

Review comment:
   > I don't get it. `CleanupAliases` will remove aliases from the grouping expressions. Why do we hit the bug?
   
   This error happens when `ResolveGroupingAnalytics` constructs the Grouping Analytics aggregation. When it expands the expressions, the match fails because the aliases have different ExprIds.
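
The mismatch described in this comment can be illustrated with a toy model. This is a minimal sketch, not Catalyst code: `GetField`, `Alias`, `resolveIndependently`, and `resolveShared` are hypothetical stand-ins, and the cache keyed by the field is only one assumed way of giving duplicated struct fields the same ExprId.

```scala
import scala.collection.mutable

// Toy stand-ins for Catalyst expressions; these are NOT Spark's classes.
object ExprIdSketch {
  sealed trait Expr
  final case class GetField(column: String, field: String) extends Expr
  final case class Alias(child: Expr, name: String, exprId: Long) extends Expr

  private var nextId = 0L
  private def freshId(): Long = { nextId += 1; nextId }

  // Resolving each occurrence on its own hands out a new id every time,
  // so two aliases of the same struct field do not compare equal.
  def resolveIndependently(e: GetField): Alias = Alias(e, e.field, freshId())

  // Resolving through a shared cache reuses the first alias for an equal field,
  // so later substitution by equality can find it.
  def resolveShared(cache: mutable.Map[GetField, Alias])(e: GetField): Alias =
    cache.getOrElseUpdate(e, Alias(e, e.field, freshId()))

  def main(args: Array[String]): Unit = {
    val field = GetField("a", "b")
    println(resolveIndependently(field) == resolveIndependently(field))
    val cache = mutable.Map.empty[GetField, Alias]
    println(resolveShared(cache)(field) == resolveShared(cache)(field))
  }
}
```

In this model, an equality-based replacement of a grouping expression inside an aggregate expression only succeeds in the shared case, which is the behavior the comment attributes to differing ExprIds.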








[GitHub] [spark] AmplabJenkins commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673796082










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673796082










[GitHub] [spark] SparkQA commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


SparkQA commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673793268


   **[Test build #127422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127422/testReport)** for PR 29422 at commit [`b890443`](https://github.com/apache/spark/commit/b8904432aee47f69a21a2fc864f9794ef60c1101).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673749927


   **[Test build #127422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127422/testReport)** for PR 29422 at commit [`b890443`](https://github.com/apache/spark/commit/b8904432aee47f69a21a2fc864f9794ef60c1101).






[GitHub] [spark] agrawaldevesh commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


agrawaldevesh commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673779207


   cc: @attilapiros @holdenk @HyukjinKwon






[GitHub] [spark] github-actions[bot] closed pull request #28197: [SPARK-31431][SQL] Add CalendarInterval encoder support

2020-08-13 Thread GitBox


github-actions[bot] closed pull request #28197:
URL: https://github.com/apache/spark/pull/28197


   






[GitHub] [spark] AmplabJenkins commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-673774426










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-673774426










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673773824










[GitHub] [spark] SparkQA removed a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-673733876


   **[Test build #127421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127421/testReport)** for PR 29342 at commit [`82ed0b4`](https://github.com/apache/spark/commit/82ed0b412e9a0bdb565fda07921805b92b631f0f).






[GitHub] [spark] SparkQA commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-13 Thread GitBox


SparkQA commented on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-673773892


   **[Test build #127421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127421/testReport)** for PR 29342 at commit [`82ed0b4`](https://github.com/apache/spark/commit/82ed0b412e9a0bdb565fda07921805b92b631f0f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673773824










[GitHub] [spark] maropu commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-13 Thread GitBox


maropu commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-673773652


   > @maropu The stats (specifically, the number of records from the aggregation map after a threshold) that we are looking for are available only at the operator level at runtime.
   
   I was pointing to the previous approach, not the current one: https://github.com/apache/spark/pull/28804/files#r446720097






[GitHub] [spark] SparkQA commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite

2020-08-13 Thread GitBox


SparkQA commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673773346


   **[Test build #127428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127428/testReport)** for PR 29422 at commit [`c051532`](https://github.com/apache/spark/commit/c051532a08f067ffa77b13e12207723d4ecbe27f).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673770514


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/32046/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673770509


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673770509










[GitHub] [spark] SparkQA commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


SparkQA commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673770501


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32046/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29423: [SPARK-20680][SQL][FOLLOW-UP] Add HiveVoidType in HiveClientImpl

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29423:
URL: https://github.com/apache/spark/pull/29423#issuecomment-673767918










[GitHub] [spark] AmplabJenkins commented on pull request #29423: [SPARK-20680][SQL][FOLLOW-UP] Add HiveVoidType in HiveClientImpl

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29423:
URL: https://github.com/apache/spark/pull/29423#issuecomment-673767918










[GitHub] [spark] SparkQA commented on pull request #29423: [SPARK-20680][SQL][FOLLOW-UP] Add HiveVoidType in HiveClientImpl

2020-08-13 Thread GitBox


SparkQA commented on pull request #29423:
URL: https://github.com/apache/spark/pull/29423#issuecomment-673767434


   **[Test build #127427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127427/testReport)**
 for PR 29423 at commit 
[`57d8fd8`](https://github.com/apache/spark/commit/57d8fd86c93caf34d1586175f96df173a6239946).






[GitHub] [spark] maropu commented on a change in pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-08-13 Thread GitBox


maropu commented on a change in pull request #29414:
URL: https://github.com/apache/spark/pull/29414#discussion_r470300313



##
File path: sql/core/src/test/resources/sql-tests/results/transform.sql.out
##
@@ -0,0 +1,224 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query
+CREATE OR REPLACE TEMPORARY VIEW t AS SELECT * FROM VALUES
+('1', true, unhex('537061726B2053514C'), tinyint(1), 1, smallint(100), 
bigint(1), float(1.0), 1.0, Decimal(1.0), timestamp('1997-01-02'), 
date('2000-04-01')),
+('2', false, unhex('537061726B2053514C'), tinyint(2), 2,  smallint(200), 
bigint(2), float(2.0), 2.0, Decimal(2.0), timestamp('1997-01-02 03:04:05'), 
date('2000-04-02')),
+('3', true, unhex('537061726B2053514C'), tinyint(3), 3, smallint(300), 
bigint(3), float(3.0), 3.0, Decimal(3.0), timestamp('1997-02-10 17:32:01-08'), 
date('2000-04-03'))
+AS t(a, b, c, d, e, f, g, h, i, j, k, l)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'cat' AS (a)
+FROM t
+-- !query schema
+struct
+-- !query output
+1
+2
+3
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'some_non_existent_command' AS (a)
+FROM t
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkException
+Subprocess exited with status 127. Error: /bin/bash: 
some_non_existent_command: command not found
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'python some_non_existent_file' AS (a)
+FROM t
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkException
+Subprocess exited with status 2. Error: python: can't open file 
'some_non_existent_file': [Errno 2] No such file or directory
+
+
+-- !query
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+  SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+  USING 'cat' AS (
+a string,
+b boolean,
+c binary,
+d tinyint,
+e int,
+f smallint,
+g long,
+h float,
+i double,
+j decimal(38, 18),
+k timestamp,
+l date)
+  FROM t
+) tmp
+-- !query schema
+struct
+-- !query output
+1  trueSpark SQL   1   1   100 1   1.0 1.0 
1.001997-01-02 00:00:00 2000-04-01
+2  false   Spark SQL   2   2   200 2   2.0 2.0 
2.001997-01-02 03:04:05 2000-04-02
+3  trueSpark SQL   3   3   300 3   3.0 3.0 
3.001997-02-10 17:32:01 2000-04-03
+
+
+-- !query
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+  SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+  USING 'cat' AS (
+a string,
+b string,
+c string,
+d string,
+e string,
+f string,
+g string,
+h string,
+i string,
+j string,
+k string,
+l string)
+  FROM t
+) tmp
+-- !query schema
+struct
+-- !query output
+1  trueSpark SQL   1   1   100 1   1.0 1.0 
1   1997-01-02 00:00:00 2000-04-01
+2  false   Spark SQL   2   2   200 2   2.0 2.0 
2   1997-01-02 03:04:05 2000-04-02
+3  trueSpark SQL   3   3   300 3   3.0 3.0 
3   1997-02-10 17:32:01 2000-04-03
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'cat'
+FROM t
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArrayIndexOutOfBoundsException
+1
+
+
+-- !query
+SELECT TRANSFORM(a, b)
+USING 'cat'
+FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c)
+USING 'cat'
+FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i)
+USING 'cat' AS (a int, b short, c long, d byte, e float, f double, g 
decimal(38, 18), h date, i timestamp)
+FROM VALUES
+('a','','1231a','a','213.21a','213.21a','0a.21d','2000-04-01123','1997-0102 
00:00:') tmp(a, b, c, d, e, f, g, h, i)
+-- !query schema
+struct
+-- !query output
+NULL   NULLNULLNULLNULLNULLNULLNULLNULL
+
+
+-- !query
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+
+mismatched input 'GROUP' expecting {<EOF>, ';'}(line 4, pos 0)
+
+== SQL ==
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b
+^^^
+
+
+-- !query
+MAP a, b USING 'cat' AS (a, b) FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+REDUCE a, b USING 'cat' AS (a, b) FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c, null)
+  ROW FORMAT DELIMITED
+  FIELDS TERMINATED BY '|'
+  LINES TERMINATED BY '\n'
+  NULL DEFINED AS 'NULL'
+USING 'cat' AS (a, b, c, d)

Review comment:
   Oh, I see...




---

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29427:
URL: https://github.com/apache/spark/pull/29427#issuecomment-673765512










[GitHub] [spark] AmplabJenkins commented on pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29427:
URL: https://github.com/apache/spark/pull/29427#issuecomment-673765512










[GitHub] [spark] SparkQA removed a comment on pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #29427:
URL: https://github.com/apache/spark/pull/29427#issuecomment-673673773


   **[Test build #127419 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127419/testReport)**
 for PR 29427 at commit 
[`d89eac6`](https://github.com/apache/spark/commit/d89eac65984836d64c7c456f197f350fd996e38a).






[GitHub] [spark] SparkQA commented on pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown

2020-08-13 Thread GitBox


SparkQA commented on pull request #29427:
URL: https://github.com/apache/spark/pull/29427#issuecomment-673765011


   **[Test build #127419 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127419/testReport)**
 for PR 29427 at commit 
[`d89eac6`](https://github.com/apache/spark/commit/d89eac65984836d64c7c456f197f350fd996e38a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] maropu commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-13 Thread GitBox


maropu commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r470303778



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only 
the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z 
(tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+// use the same way as `SQLQueryTestSuite` to get the resource path
+java.nio.file.Paths.get("src", "test", "resources", 
"tpcds-plan-stability").toFile
+  }
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+val file = new File(dir, "simplified.txt")
+val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+approved == actualSimplifiedPlan
+  }
+
+  /**
+   * Serialize and save this SparkPlan.
+   * The resulting file is used by [[checkWithApproved]] to check stability.
+   *
+   * @param plan    the SparkPlan
+   * @param name    the name of the query
+   * @param explain the full explain output; this is saved to help debug later 
as the simplified
+   *                plan is not too useful for debugging
+   */
+  private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: 
String): Unit = {
+val dir = getDirForTest(name)
+val simplified = getSimplifiedPlan(plan)
+val foundMatch = dir.exists() && isApproved(dir, simplified)
+
+if (!foundMatch) {
+  FileUtils.deleteDirectory(dir)
+  assert(dir.mkdirs())
+
+  val file = new File(dir, "simplified.txt")
+  FileUtils.writeStringToFile(file, simplified, StandardCharsets.UTF_8

[GitHub] [spark] SparkQA commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


SparkQA commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673763467


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32046/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673756572


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127426/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


SparkQA removed a comment on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673756078


   **[Test build #127426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127426/testReport)**
 for PR 29424 at commit 
[`1a405a8`](https://github.com/apache/spark/commit/1a405a871c0377feff0f865d71ec306cf4b138c9).






[GitHub] [spark] AmplabJenkins commented on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673756569










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673756454










[GitHub] [spark] SparkQA commented on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


SparkQA commented on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673756557


   **[Test build #127426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127426/testReport)**
 for PR 29424 at commit 
[`1a405a8`](https://github.com/apache/spark/commit/1a405a871c0377feff0f865d71ec306cf4b138c9).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673756454










[GitHub] [spark] SparkQA commented on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


SparkQA commented on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673756078


   **[Test build #127426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127426/testReport)**
 for PR 29424 at commit 
[`1a405a8`](https://github.com/apache/spark/commit/1a405a871c0377feff0f865d71ec306cf4b138c9).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673497518


   Can one of the admins verify this patch?






[GitHub] [spark] maropu commented on pull request #29424: [MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


maropu commented on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673755561


   Looks okay.






[GitHub] [spark] maropu commented on pull request #29424: [MINOR] Fixed approx_count_distinct rsd param description

2020-08-13 Thread GitBox


maropu commented on pull request #29424:
URL: https://github.com/apache/spark/pull/29424#issuecomment-673755517


   ok to test






[GitHub] [spark] sarutak commented on a change in pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page

2020-08-13 Thread GitBox


sarutak commented on a change in pull request #29082:
URL: https://github.com/apache/spark/pull/29082#discussion_r470298801



##
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##
@@ -421,6 +420,51 @@ private[spark] class AppStatusStore(
     constructTaskDataList(taskDataWrapperIter)
   }
 
+  def exceptionSummary(stageId: Int, attemptId: Int): Seq[v1.ExceptionSummary] = {
+    val (stageData, _) = stageAttempt(stageId, attemptId)
+    val key = Array(stageId, attemptId, stageData.numFailedTasks)
+    asOption(store.read(classOf[CachedExceptionSummary], key))
+      .map(_.exceptionSummary)
+      .getOrElse {
+        val exceptionSummary = computeExceptionSummary(stageId, attemptId)
+        val cachedExceptionSummary = new CachedExceptionSummary(
+          stageId,
+          attemptId,
+          stageData.numFailedTasks,
+          exceptionSummary
+        )
+        store.write(cachedExceptionSummary)

Review comment:
   `CachedExceptionSummary` never seems to be deleted. Could this cause a memory leak?
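
   One way to picture the concern, and a possible remedy, is the sketch below. It is not Spark's `KVStore` API: `CachedSummary`, `ExceptionSummaryCache`, and the in-memory map are hypothetical stand-ins chosen for illustration. The sketch keys the cache only by `(stageId, attemptId)` and overwrites the entry when the failed-task count changes, so at most one entry exists per stage attempt, and it exposes an explicit `remove` to call when the stage attempt is cleaned up.

```scala
import scala.collection.mutable

// Hypothetical simplified cache entry; this is not Spark's CachedExceptionSummary.
final case class CachedSummary(numFailedTasks: Int, summary: Seq[String])

// Hypothetical in-memory cache used only to illustrate bounded growth.
final class ExceptionSummaryCache {
  private val entries = mutable.Map.empty[(Int, Int), CachedSummary]

  // Reuse the cached summary only if it was computed for the same failed-task
  // count; otherwise recompute and overwrite the single entry for this attempt.
  def getOrCompute(stageId: Int, attemptId: Int, numFailedTasks: Int)(
      compute: => Seq[String]): Seq[String] = {
    val key = (stageId, attemptId)
    entries.get(key) match {
      case Some(cached) if cached.numFailedTasks == numFailedTasks =>
        cached.summary
      case _ =>
        val fresh = compute
        entries.update(key, CachedSummary(numFailedTasks, fresh))
        fresh
    }
  }

  // Drop the entry when the stage attempt itself is removed.
  def remove(stageId: Int, attemptId: Int): Unit = {
    entries.remove((stageId, attemptId))
    ()
  }

  def size: Int = entries.size
}
```

   With the key used in the diff above, which includes `stageData.numFailedTasks`, each new failed-task count would add another entry unless older ones are deleted somewhere else.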








[GitHub] [spark] sarutak commented on a change in pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page

2020-08-13 Thread GitBox


sarutak commented on a change in pull request #29082:
URL: https://github.com/apache/spark/pull/29082#discussion_r470297451



##
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##
@@ -421,6 +420,51 @@ private[spark] class AppStatusStore(
     constructTaskDataList(taskDataWrapperIter)
   }
 
+  def exceptionSummary(stageId: Int, attemptId: Int): Seq[v1.ExceptionSummary] = {
+    val (stageData, _) = stageAttempt(stageId, attemptId)
+    val key = Array(stageId, attemptId, stageData.numFailedTasks)
+    asOption(store.read(classOf[CachedExceptionSummary], key))
+      .map(_.exceptionSummary)
+      .getOrElse {
+        val exceptionSummary = computeExceptionSummary(stageId, attemptId)
+        val cachedExceptionSummary = new CachedExceptionSummary(
+          stageId,
+          attemptId,
+          stageData.numFailedTasks,
+          exceptionSummary
+        )
+        store.write(cachedExceptionSummary)
+        exceptionSummary
+      }
+  }
+
+  def computeExceptionSummary(stageId: Int, attemptId: Int): Seq[v1.ExceptionSummary] = {
+    val tasks = taskList(stageId, attemptId, Int.MaxValue)
+    tasks.filter(t => t.status.equalsIgnoreCase("failed"))
+      .flatMap(t => t.errorMessage)
+      .flatMap(parseErrorMessage)
+      .groupBy(e => (e.exceptionType, e.message))
+      .map(t => new v1.ExceptionSummary(t._2.head, t._2.length))
+      .toSeq
+      .sortBy(s => (s.count, s.exceptionFailure.exceptionType))(Ordering[(Int, String)].reverse)
+      .take(10)

Review comment:
   It would be better to have `10` as a named constant.
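
   A minimal sketch of what this suggestion could look like follows. The names `ExceptionCount`, `ExceptionSummaryLimits`, and `MaxExceptionSummaries` are hypothetical and do not come from the PR; `ExceptionCount` is a simplified stand-in for `v1.ExceptionSummary`, and the input is reduced to `(exceptionType, message)` pairs.

```scala
// Hypothetical stand-in for v1.ExceptionSummary; field names are assumptions.
final case class ExceptionCount(exceptionType: String, message: String, count: Int)

object ExceptionSummaryLimits {
  // Named constant replacing the literal 10 (the name is hypothetical).
  val MaxExceptionSummaries: Int = 10

  // Group failures by (type, message), sort by count and then type in
  // descending order, and keep at most MaxExceptionSummaries entries.
  def summarize(failures: Seq[(String, String)]): Seq[ExceptionCount] =
    failures
      .groupBy(identity)
      .map { case ((exceptionType, message), group) =>
        ExceptionCount(exceptionType, message, group.length)
      }
      .toSeq
      .sortBy(s => (s.count, s.exceptionType))(Ordering[(Int, String)].reverse)
      .take(MaxExceptionSummaries)
}
```

   Naming the limit keeps the cap in one place, so a later change to the number of summaries shown does not require hunting for a bare `10`.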

##
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##
@@ -421,6 +420,33 @@ private[spark] class AppStatusStore(
     constructTaskDataList(taskDataWrapperIter)
   }
 
+  def exceptionSummary(stageId: Int, attemptId: Int): Seq[v1.ExceptionSummary] = {
+    val tasks = taskList(stageId, attemptId, Int.MaxValue)

Review comment:
   `CachedExceptionSummary` never seems to be deleted. Could this cause a memory leak?








[GitHub] [spark] SparkQA commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


SparkQA commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673754055


   **[Test build #127425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127425/testReport)** for PR 28939 at commit [`5d65caf`](https://github.com/apache/spark/commit/5d65caf55c7b87fc0035444b60847e4037ad0f40).






[GitHub] [spark] maropu edited a comment on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis

2020-08-13 Thread GitBox


maropu edited a comment on pull request #29360:
URL: https://github.com/apache/spark/pull/29360#issuecomment-673753587


   > But the shuffle happens during the Aggregate here, right? Splitting does not change the total amount of shuffled data; it only divides it into several parts. Does it really result in a significant improvement?
   
   I agree with @viirya's point above. Why would this reduce the amount of shuffle writes (and improve performance)? In the case of `expand -> partial aggregates`, the aggregates seem to produce the same **total** output size.






[GitHub] [spark] maropu commented on pull request #29360: [SPARK-32542][SQL]Add a Batch in Optimizer to improve performance in multidimensional analysis

2020-08-13 Thread GitBox


maropu commented on pull request #29360:
URL: https://github.com/apache/spark/pull/29360#issuecomment-673753587


   > But the shuffle happens during the Aggregate here, right? Splitting does not change the total amount of shuffled data; it only divides it into several parts. Does it really result in a significant improvement?
   
   I agree with @viirya's point above. Why would this reduce the amount of shuffle writes? In the case of `expand -> partial aggregates`, the aggregates seem to produce the same **total** output size.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29422: [wip][testing][dnr] Decom fixes

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673752394










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

2020-08-13 Thread GitBox


AmplabJenkins removed a comment on pull request #29426:
URL: https://github.com/apache/spark/pull/29426#issuecomment-673752312










[GitHub] [spark] AmplabJenkins commented on pull request #29422: [wip][testing][dnr] Decom fixes

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673752394










[GitHub] [spark] AmplabJenkins commented on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

2020-08-13 Thread GitBox


AmplabJenkins commented on pull request #29426:
URL: https://github.com/apache/spark/pull/29426#issuecomment-673752312










[GitHub] [spark] sarutak commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars

2020-08-13 Thread GitBox


sarutak commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-673751979


   retest this please.






[GitHub] [spark] SparkQA commented on pull request #29422: [wip][testing][dnr] Decom fixes

2020-08-13 Thread GitBox


SparkQA commented on pull request #29422:
URL: https://github.com/apache/spark/pull/29422#issuecomment-673751916


   **[Test build #127424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127424/testReport)** for PR 29422 at commit [`6334f80`](https://github.com/apache/spark/commit/6334f80e5d593d7e8a29bcd9598ec5e19d756162).






[GitHub] [spark] SparkQA commented on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version

2020-08-13 Thread GitBox


SparkQA commented on pull request #29426:
URL: https://github.com/apache/spark/pull/29426#issuecomment-673751893


   **[Test build #127423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127423/testReport)** for PR 29426 at commit [`ddbf11f`](https://github.com/apache/spark/commit/ddbf11f5f4073ed378dd51654c1b085afb00128e).





