[GitHub] [spark] SparkQA commented on pull request #29473: [SPARK-32656][SQL] Repartition bucketed tables for sort merge join / shuffled hash join if applicable

2020-08-29 Thread GitBox


SparkQA commented on pull request #29473:
URL: https://github.com/apache/spark/pull/29473#issuecomment-683248360


   **[Test build #128012 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128012/testReport)**
 for PR 29473 at commit 
[`7481e36`](https://github.com/apache/spark/commit/7481e36d8781e869a0dc558e0af5d358a56ab150).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29473: [SPARK-32656][SQL] Repartition bucketed tables for sort merge join / shuffled hash join if applicable

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29473:
URL: https://github.com/apache/spark/pull/29473#issuecomment-683227251


   **[Test build #128012 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128012/testReport)**
 for PR 29473 at commit 
[`7481e36`](https://github.com/apache/spark/commit/7481e36d8781e869a0dc558e0af5d358a56ab150).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29473: [SPARK-32656][SQL] Repartition bucketed tables for sort merge join / shuffled hash join if applicable

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29473:
URL: https://github.com/apache/spark/pull/29473#issuecomment-683248506


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29473: [SPARK-32656][SQL] Repartition bucketed tables for sort merge join / shuffled hash join if applicable

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29473:
URL: https://github.com/apache/spark/pull/29473#issuecomment-683248506










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29473: [SPARK-32656][SQL] Repartition bucketed tables for sort merge join / shuffled hash join if applicable

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29473:
URL: https://github.com/apache/spark/pull/29473#issuecomment-683248507


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128012/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


SparkQA commented on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683252192


   **[Test build #128013 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128013/testReport)**
 for PR 28885 at commit 
[`cd064e4`](https://github.com/apache/spark/commit/cd064e460f90984d57397bac499058a261bc7205).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683252321










[GitHub] [spark] AmplabJenkins commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683252321










[GitHub] [spark] peter-toth commented on a change in pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


peter-toth commented on a change in pull request #29572:
URL: https://github.com/apache/spark/pull/29572#discussion_r479625983



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -749,6 +749,15 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
 )
   }
 
+  // LEFT SEMI JOIN without bound condition does not use [[ExternalAppendOnlyUnsafeRowArray]]
+  // so should not cause any spill
+  assertNotSpilled(sparkContext, "left semi join") {

Review comment:
   Without this fix this UT fails.
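
   To illustrate why the test expects no spill, here is a minimal sketch of a left semi merge join over key-sorted inputs with an equality-only condition. It is not Spark's implementation: the function name `left_semi_merge_join` and the `(key, payload)` row layout are hypothetical. The point is that, without an extra bound condition, a left row can be emitted as soon as one matching right key is seen, so no right-side rows need to be buffered.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, str]  # (join key, payload); hypothetical layout for illustration


def left_semi_merge_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Row]:
    """Yield each left row whose key appears on the right side.

    Both inputs must be sorted by key. Only the current right row is held,
    so nothing is buffered (and nothing could spill).
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        key = row[0]
        # Advance the right side until its key is >= the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is None:
            return  # right side exhausted: no further matches are possible
        if current[0] == key:
            yield row  # a single match is enough for a semi join


left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_rows = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
result = list(left_semi_merge_join(left_rows, right_rows))
print(result)
```

   A join with an additional non-equality condition would have to compare each left row against every right row sharing its key, which is where a buffer of right rows would be needed.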








[GitHub] [spark] SparkQA commented on pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


SparkQA commented on pull request #29572:
URL: https://github.com/apache/spark/pull/29572#issuecomment-683257761


   **[Test build #128014 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128014/testReport)**
 for PR 29572 at commit 
[`acc6646`](https://github.com/apache/spark/commit/acc6646e7e83fa9e3f082b1aaa8c6227e3d8a7cf).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29572:
URL: https://github.com/apache/spark/pull/29572#issuecomment-683257875










[GitHub] [spark] AmplabJenkins commented on pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29572:
URL: https://github.com/apache/spark/pull/29572#issuecomment-683257875










[GitHub] [spark] yaooqinn opened a new pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation

2020-08-29 Thread GitBox


yaooqinn opened a new pull request #29577:
URL: https://github.com/apache/spark/pull/29577


   
   
   ### What changes were proposed in this pull request?
   
   This PR adds extended information about a function, including its arguments, examples, notes, and the `since` field, to the SparkGetFunctionOperation.
   
   
   
   
   
   ### Why are the changes needed?
   
   This improves the user experience: it helps JDBC users better understand our built-in functions.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes. BI tools and JDBC users will get full information about a Spark function instead of only fragmentary usage info.
   
   
   e.g. date_part
   
   
    before
   
   ```
   date_part(field, source) - Extracts a part of the date/timestamp or interval source.
   ```
    after
   
   ```
   Usage:
     date_part(field, source) - Extracts a part of the date/timestamp or interval source.
   
   Arguments:
     * field - selects which part of the source should be extracted, and supported string values are as same as the fields of the equivalent function `EXTRACT`.
     * source - a date/timestamp or interval column from where `field` should be extracted
   
   Examples:
     > SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
      2019
     > SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456');
      33
     > SELECT date_part('doy', DATE'2019-08-12');
      224
     > SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.01');
      1.01
     > SELECT date_part('days', interval 1 year 10 months 5 days);
      5
     > SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
      30.001001
   
   Note:
     The date_part function is equivalent to the SQL-standard function `EXTRACT(field FROM source)`
   
   Since: 3.0.0
   
   ```
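
   The first three example results in the `date_part` output can be cross-checked outside Spark. The sketch below uses Python's standard `datetime` module and assumes that the `week` field follows ISO 8601 week numbering and that `doy` is the 1-based day of the year. It is only a sanity check of the arithmetic, not a reimplementation of `date_part`.

```python
from datetime import datetime

# Same timestamp as in the examples: 2019-08-12 01:00:00.123456
ts = datetime(2019, 8, 12, 1, 0, 0, 123456)

year = ts.year                        # compare with date_part('YEAR', ...)
week = ts.isocalendar()[1]            # ISO 8601 week number (assumed to match 'week')
day_of_year = ts.timetuple().tm_yday  # 1-based day of year (assumed to match 'doy')

print(year, week, day_of_year)
```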
   
   
   ### How was this patch tested?
   
   
   New tests
   






[GitHub] [spark] AmplabJenkins commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperati

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683258812










[GitHub] [spark] dbtsai commented on pull request #29567: [SPARK-32721][SQL] Simplify if clauses with null and boolean

2020-08-29 Thread GitBox


dbtsai commented on pull request #29567:
URL: https://github.com/apache/spark/pull/29567#issuecomment-683259420


   This will be a very useful optimization to turn `if` clauses into expressions 
that can be pushed down. For `CaseWhen`, if there is only one branch, we can 
convert it into `If`. An earlier PR, https://github.com/apache/spark/pull/21850, 
tried this optimization but was not merged into master because, at that 
time, `CaseWhen` and `If` showed little performance difference. Maybe we can 
revisit it.






[GitHub] [spark] SparkQA commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation

2020-08-29 Thread GitBox


SparkQA commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683259527


   **[Test build #128015 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128015/testReport)**
 for PR 29577 at commit 
[`fd94b09`](https://github.com/apache/spark/commit/fd94b090a2f1b0d9a816c6dfb6f7206db87b5633).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunction

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683258812










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunction

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683259678










[GitHub] [spark] AmplabJenkins commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperati

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683259678










[GitHub] [spark] yaooqinn commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation

2020-08-29 Thread GitBox


yaooqinn commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683259747


   cc @cloud-fan @juliuszsompolski @wangyum @bogdanghit @maropu thanks a lot, 
and truly sorry to bother you on the weekend






[GitHub] [spark] dbtsai edited a comment on pull request #29567: [SPARK-32721][SQL] Simplify if clauses with null and boolean

2020-08-29 Thread GitBox


dbtsai edited a comment on pull request #29567:
URL: https://github.com/apache/spark/pull/29567#issuecomment-683259420


   This will be a very useful optimization to turn `if` clauses into expressions 
that can be pushed down. For `CaseWhen`, if there is only one branch, we can 
convert it into `If`. A PR from a while ago, 
https://github.com/apache/spark/pull/21850, implemented this optimization but 
was not merged into master because, at that time, `CaseWhen` and `If` 
showed little performance difference. Maybe we can revisit that.






[GitHub] [spark] dbtsai opened a new pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when there is only one branch

2020-08-29 Thread GitBox


dbtsai opened a new pull request #21850:
URL: https://github.com/apache/spark/pull/21850


   ## What changes were proposed in this pull request?
   
   After the rule that removes unreachable branches has run, there may be only one 
branch left. In this situation, `CaseWhen` can be converted to `If` to enable 
further optimization.
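
   As a rough illustration of the rewrite described here, the sketch below models the two expressions as plain Python dataclasses. It is a toy model, not Catalyst code: the classes `If` and `CaseWhen` and the function `simplify_case_when` are hypothetical stand-ins, and conditions and values are represented as opaque strings.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class If:
    predicate: Any
    true_value: Any
    false_value: Any


@dataclass(frozen=True)
class CaseWhen:
    branches: Tuple[Tuple[Any, Any], ...]  # (condition, value) pairs
    else_value: Optional[Any] = None       # None models a missing ELSE (SQL NULL)


def simplify_case_when(expr: Any) -> Any:
    """Rewrite a single-branch CaseWhen as If; leave everything else untouched."""
    if isinstance(expr, CaseWhen) and len(expr.branches) == 1:
        condition, value = expr.branches[0]
        return If(condition, value, expr.else_value)
    return expr


single = CaseWhen(branches=(("a > 1", "x"),), else_value="y")
multi = CaseWhen(branches=(("a > 1", "x"), ("a > 0", "z")), else_value="y")

rewritten = simplify_case_when(single)  # becomes an If
unchanged = simplify_case_when(multi)   # more than one branch: returned as-is
print(rewritten)
```

   The rewrite preserves meaning because `CASE WHEN c THEN v ELSE e END` and `IF(c, v, e)` both return `e` whenever `c` is not true, including when `c` is NULL.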
   
   ## How was this patch tested?
   
   Tests added.






[GitHub] [spark] SparkQA commented on pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when there is only one branch

2020-08-29 Thread GitBox


SparkQA commented on pull request #21850:
URL: https://github.com/apache/spark/pull/21850#issuecomment-683260464


   **[Test build #128016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128016/testReport)**
 for PR 21850 at commit 
[`e2b0e96`](https://github.com/apache/spark/commit/e2b0e963bd48e3b9361be3d6291f7fcfca4afea7).






[GitHub] [spark] SparkQA removed a comment on pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when there is only one branch

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #21850:
URL: https://github.com/apache/spark/pull/21850#issuecomment-683260464


   **[Test build #128016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128016/testReport)**
 for PR 21850 at commit 
[`e2b0e96`](https://github.com/apache/spark/commit/e2b0e963bd48e3b9361be3d6291f7fcfca4afea7).






[GitHub] [spark] AmplabJenkins commented on pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when there is only one branch

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #21850:
URL: https://github.com/apache/spark/pull/21850#issuecomment-683260614










[GitHub] [spark] SparkQA commented on pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when there is only one branch

2020-08-29 Thread GitBox


SparkQA commented on pull request #21850:
URL: https://github.com/apache/spark/pull/21850#issuecomment-683260610


   **[Test build #128016 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128016/testReport)**
 for PR 21850 at commit 
[`e2b0e96`](https://github.com/apache/spark/commit/e2b0e963bd48e3b9361be3d6291f7fcfca4afea7).
* This patch **fails Scala style tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when there is only one branch

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #21850:
URL: https://github.com/apache/spark/pull/21850#issuecomment-683260614










[GitHub] [spark] AmplabJenkins removed a comment on pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when there is only one branch

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #21850:
URL: https://github.com/apache/spark/pull/21850#issuecomment-683260616


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128016/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation

2020-08-29 Thread GitBox


SparkQA commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683264041


   **[Test build #128017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128017/testReport)**
 for PR 29577 at commit 
[`980eaa2`](https://github.com/apache/spark/commit/980eaa2f88e953da85dfd32f0680d77da2a35270).






[GitHub] [spark] AmplabJenkins commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperati

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683264200










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunction

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683264200










[GitHub] [spark] tanelk closed pull request #29570: [SPARK-32727][SQL] Replace CaseWhen with If when there is only one case

2020-08-29 Thread GitBox


tanelk closed pull request #29570:
URL: https://github.com/apache/spark/pull/29570


   






[GitHub] [spark] tanelk commented on pull request #29570: [SPARK-32727][SQL] Replace CaseWhen with If when there is only one case

2020-08-29 Thread GitBox


tanelk commented on pull request #29570:
URL: https://github.com/apache/spark/pull/29570#issuecomment-683273844


   Duplicates #21850






[GitHub] [spark] WinkerDu commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-08-29 Thread GitBox


WinkerDu commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-683275178


   Gentle ping @Ngone51 for further review, or to involve another committer in 
reviewing this patch. Thanks :)






[GitHub] [spark] SparkQA commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


SparkQA commented on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683281617


   **[Test build #128013 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128013/testReport)**
 for PR 28885 at commit 
[`cd064e4`](https://github.com/apache/spark/commit/cd064e460f90984d57397bac499058a261bc7205).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683252192


   **[Test build #128013 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128013/testReport)**
 for PR 28885 at commit 
[`cd064e4`](https://github.com/apache/spark/commit/cd064e460f90984d57397bac499058a261bc7205).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683281814










[GitHub] [spark] AmplabJenkins commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683281814










[GitHub] [spark] peter-toth commented on pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-29 Thread GitBox


peter-toth commented on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-683283412


   @cloud-fan, @maropu, @viirya, could you please help me move this PR 
forward?
   The latest commit updates the expected plans of the PlanStability suites, 
where you can see the new reuse nodes this PR adds to TPCDS queries. My 
benchmarks showed that this PR brings a ~30% improvement to some of the 
queries. Please let me know if you have any concerns about this PR.






[GitHub] [spark] SparkQA commented on pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


SparkQA commented on pull request #29572:
URL: https://github.com/apache/spark/pull/29572#issuecomment-683286225


   **[Test build #128014 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128014/testReport)**
 for PR 29572 at commit 
[`acc6646`](https://github.com/apache/spark/commit/acc6646e7e83fa9e3f082b1aaa8c6227e3d8a7cf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29572:
URL: https://github.com/apache/spark/pull/29572#issuecomment-683257761


   **[Test build #128014 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128014/testReport)**
 for PR 29572 at commit 
[`acc6646`](https://github.com/apache/spark/commit/acc6646e7e83fa9e3f082b1aaa8c6227e3d8a7cf).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29572:
URL: https://github.com/apache/spark/pull/29572#issuecomment-683286448










[GitHub] [spark] AmplabJenkins commented on pull request #29572: [SPARK-32730][SQL] Improve LeftSemi SortMergeJoin right side buffering

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29572:
URL: https://github.com/apache/spark/pull/29572#issuecomment-683286448










[GitHub] [spark] tooptoop4 commented on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-29 Thread GitBox


tooptoop4 commented on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-683287962


   @srowen I would have, but my last PR 
(https://github.com/apache/spark/pull/27697) got shot down for no reason






[GitHub] [spark] SparkQA removed a comment on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOpera

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683259527


   **[Test build #128015 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128015/testReport)**
 for PR 29577 at commit 
[`fd94b09`](https://github.com/apache/spark/commit/fd94b090a2f1b0d9a816c6dfb6f7206db87b5633).






[GitHub] [spark] SparkQA commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation

2020-08-29 Thread GitBox


SparkQA commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683288320


   **[Test build #128015 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128015/testReport)**
 for PR 29577 at commit 
[`fd94b09`](https://github.com/apache/spark/commit/fd94b090a2f1b0d9a816c6dfb6f7206db87b5633).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperati

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683288538










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunction

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683288538










[GitHub] [spark] SparkQA commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation

2020-08-29 Thread GitBox


SparkQA commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683292221


   **[Test build #128017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128017/testReport)**
 for PR 29577 at commit 
[`980eaa2`](https://github.com/apache/spark/commit/980eaa2f88e953da85dfd32f0680d77da2a35270).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOpera

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683264041


   **[Test build #128017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128017/testReport)**
 for PR 29577 at commit 
[`980eaa2`](https://github.com/apache/spark/commit/980eaa2f88e953da85dfd32f0680d77da2a35270).






[GitHub] [spark] AmplabJenkins commented on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperati

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683292455










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunction

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29577:
URL: https://github.com/apache/spark/pull/29577#issuecomment-683292455










[GitHub] [spark] srowen commented on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-29 Thread GitBox


srowen commented on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-683297203


   @tooptoop4 that change looks unrelated? There was also quite a bit of reason 
given.
   I don't think it's relevant to creating a back-port, which I am saying is 
worth evaluating.






[GitHub] [spark] Olwn opened a new pull request #29578: Fix batch submission delay caused by actions in dstream transform

2020-08-29 Thread GitBox


Olwn opened a new pull request #29578:
URL: https://github.com/apache/spark/pull/29578


   ### What changes were proposed in this pull request?
   Currently dstream.getOrCompute runs in JobGenerator, which has a 
single-threaded event loop.
   This pull request moves that call to JobScheduler.
   
   
   ### Why are the changes needed?
   Some of our Spark applications show a batch creation delay after running 
for some time. For instance, the batch for 10:03 is submitted at 10:06. In the 
Spark UI, the latest batch doesn't match the current time.
   We observed that such applications have one thing in common: RDD actions 
inside dstream.transform. Those actions are executed in dstream.compute, which 
is called by JobGenerator. JobGenerator runs a single-threaded event loop, so 
any synchronous (blocking) operation blocks event processing.
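
The blocking effect described above can be illustrated with a minimal sketch. It does not use Spark's actual `JobGenerator` or `JobScheduler` code: `EventLoopSketch`, `delayOfSecondEvent`, and the two executors are hypothetical stand-ins built on `java.util.concurrent`, and it assumes Scala 2.12 or later for the lambda-to-`Runnable` conversion.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical illustration only; none of these names come from Spark.
object EventLoopSketch {
  /** Milliseconds between submitting the events and the second one running. */
  def delayOfSecondEvent(offload: Boolean, blockMillis: Long): Long = {
    // Stand-in for a single-threaded event loop.
    val eventLoop = Executors.newSingleThreadExecutor()
    // Stand-in for a separate pool that runs the blocking work.
    val workers = Executors.newFixedThreadPool(2)
    val delay = new AtomicLong(-1L)
    val done = new CountDownLatch(1)
    // Simulates an RDD action that blocks the thread calling it.
    val blockingAction: Runnable = () => Thread.sleep(blockMillis)
    try {
      val submitted = System.nanoTime()
      // First event: run the action inline, or hand it to the worker pool.
      eventLoop.execute { () =>
        if (offload) workers.execute(blockingAction) else blockingAction.run()
      }
      // Second event: record how long it waited behind the first one.
      eventLoop.execute { () =>
        delay.set(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submitted))
        done.countDown()
      }
      done.await()
      delay.get()
    } finally {
      eventLoop.shutdown()
      workers.shutdown()
    }
  }
}
```

With `offload = false` the second event waits for the whole sleep, which mirrors a batch being submitted late. With `offload = true` the loop thread only enqueues the work, so the second event runs almost immediately.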
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added two tests
   
   1. ForEachDStreamSuite to make sure batch execution doesn't block batch 
submission
   2. JobSchedulerSuite to make sure all jobs in a batch can be associated with 
the BatchTime and are displayed in the Spark UI
   
   ### JIRAs
   https://issues.apache.org/jira/browse/SPARK-32734
   https://issues.apache.org/jira/browse/SPARK-32735
   






[GitHub] [spark] beliefer commented on a change in pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


beliefer commented on a change in pull request #29228:
URL: https://github.com/apache/spark/pull/29228#discussion_r479660053



##
File path: core/src/test/scala/org/apache/spark/LocalSC.scala
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import _root_.io.netty.util.internal.logging.{InternalLoggerFactory, 
Slf4JLoggerFactory}
+import org.scalatest.BeforeAndAfterAll
+import org.scalatest.BeforeAndAfterEach
+import org.scalatest.Suite
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.resource.ResourceProfile
+
+/**
+ * Manages a local `sc` `SparkContext` variable, correctly stopping it after 
each test.
+ *
+ * Note: this class is a copy of [[LocalSparkContext]]. Why copy it? Reduce 
conflict. Because
+ * many test suites use [[LocalSparkContext]] and overwrite some variable or 
function (e.g.
+ * sc of LocalSparkContext), there occurs conflict when we refactor the `sc` 
as a new function.
+ * After migrating all test suites that use [[LocalSparkContext]] to use 
[[LocalSC]], we will
+ * delete the original [[LocalSparkContext]] and rename [[LocalSC]] to 
[[LocalSparkContext]].
+ */
+trait LocalSC extends BeforeAndAfterEach

Review comment:
   OK








[GitHub] [spark] AmplabJenkins commented on pull request #29578: Fix batch submission delay caused by actions in dstream transform

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29578:
URL: https://github.com/apache/spark/pull/29578#issuecomment-683305035


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29578: Fix batch submission delay caused by actions in dstream transform

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29578:
URL: https://github.com/apache/spark/pull/29578#issuecomment-683305035


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29578: Fix batch submission delay caused by actions in dstream transform

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29578:
URL: https://github.com/apache/spark/pull/29578#issuecomment-683305195


   Can one of the admins verify this patch?






[GitHub] [spark] SparkQA commented on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


SparkQA commented on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-683305936


   **[Test build #128018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128018/testReport)**
 for PR 29228 at commit 
[`1029d26`](https://github.com/apache/spark/commit/1029d2658fa18cfa773fef5ede283a3e7184f438).






[GitHub] [spark] AmplabJenkins commented on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-683306071










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-683306071










[GitHub] [spark] Ngone51 opened a new pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


Ngone51 opened a new pull request #29579:
URL: https://github.com/apache/spark/pull/29579


   
   
   ### What changes were proposed in this pull request?
   
   
   The motivation of this PR is to avoid caching removed decommissioned 
executors in `TaskSchedulerImpl`. The cache was introduced in 
https://github.com/apache/spark/pull/29422. It holds the 
`isHostDecommissioned` info for a while, so if the task `FetchFailure` event 
comes after the executor loss event, `DAGScheduler` can still get 
`isHostDecommissioned` from the cache and unregister the host's shuffle map 
status when the host is decommissioned too.
   
   This PR tries to achieve the same goal without the cache. Instead of saving 
the `workerLost` in `ExecutorUpdated` / `ExecutorDecommissionInfo` / 
`ExecutorDecommissionState`, we could save the `hostOpt` directly. When the 
host is decommissioned or lost too, the `hostOpt` can be a specific host 
address. Otherwise, it's `None` to indicate that only the executor is 
decommissioned or lost.
   
   Now that we have the host info, we can also unregister the host shuffle map 
status when `executorLost` is triggered for the decommissioned executor.
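
A minimal sketch of the `hostOpt` idea follows. It reuses the `ExecutorDecommissionInfo(message, hostOpt)` signature that the build report for this PR lists, but everything else is a hypothetical illustration and not the PR's code: `HostOptSketch`, `UnregisterTarget`, `WholeHost`, `SingleExecutor`, and `targetFor` are names invented here.

```scala
// Signature as listed in the build report for this PR.
case class ExecutorDecommissionInfo(message: String, hostOpt: Option[String] = None)

// Hypothetical illustration only; these names are not Spark types.
object HostOptSketch {
  sealed trait UnregisterTarget
  // The host is gone too, so every shuffle output on it is affected.
  case class WholeHost(host: String) extends UnregisterTarget
  // Only this executor is gone.
  case class SingleExecutor(execId: String) extends UnregisterTarget

  /** Decide the unregister scope from `hostOpt` alone, with no cache lookup. */
  def targetFor(execId: String, info: ExecutorDecommissionInfo): UnregisterTarget =
    info.hostOpt match {
      case Some(host) => WholeHost(host)
      case None => SingleExecutor(execId)
    }
}
```

Compared with a boolean such as `workerLost`, the `Option[String]` carries the host address itself, so the caller does not need a second lookup to learn which host to unregister.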
   
   Besides, this PR also includes a few cleanups around the touched code.
   
   
   ### Why are the changes needed?
   
   
   It helps to unregister the shuffle map status earlier in both the 
decommission and the normal executor-loss cases.
   
   It also saves memory in  `TaskSchedulerImpl` and simplifies the code a 
little bit.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   This PR only refactors the code. The original behaviour should be covered by 
`DecommissionWorkerSuite`. 
   






[GitHub] [spark] SparkQA commented on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


SparkQA commented on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683307623


   **[Test build #128019 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128019/testReport)**
 for PR 29579 at commit 
[`d5bc756`](https://github.com/apache/spark/commit/d5bc7560f32d59861885c6f40ec2597d680e4612).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683307753










[GitHub] [spark] AmplabJenkins commented on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683307753










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683308453


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683307623


   **[Test build #128019 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128019/testReport)**
 for PR 29579 at commit 
[`d5bc756`](https://github.com/apache/spark/commit/d5bc7560f32d59861885c6f40ec2597d680e4612).






[GitHub] [spark] AmplabJenkins commented on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683308453










[GitHub] [spark] SparkQA commented on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


SparkQA commented on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683308444


   **[Test build #128019 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128019/testReport)**
 for PR 29579 at commit 
[`d5bc756`](https://github.com/apache/spark/commit/d5bc7560f32d59861885c6f40ec2597d680e4612).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class ExecutorDecommissionInfo(message: String, hostOpt: 
Option[String] = None)`






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29579: [SPARK-32736][CORE] Avoid caching the removed decommissioned executors in TaskSchedulerImpl

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29579:
URL: https://github.com/apache/spark/pull/29579#issuecomment-683308454


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128019/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


SparkQA commented on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683310056


   **[Test build #128020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128020/testReport)** for PR 29556 at commit [`4c214a9`](https://github.com/apache/spark/commit/4c214a9cd7d5baf74cbb463a4e33d321ba19826e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683310240










[GitHub] [spark] AmplabJenkins commented on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683310240










[GitHub] [spark] wangyum commented on a change in pull request #29577: [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsO

2020-08-29 Thread GitBox


wangyum commented on a change in pull request #29577:
URL: https://github.com/apache/spark/pull/29577#discussion_r479665506



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
##
@@ -91,7 +88,7 @@ private[hive] class SparkGetFunctionsOperation(
   DEFAULT_HIVE_CATALOG, // FUNCTION_CAT
   db, // FUNCTION_SCHEM
   funcIdentifier.funcName, // FUNCTION_NAME
-  info.getUsage, // REMARKS
+  "Usage:\n  " + info.getUsage.trim + "\n" + info.getExtended, // REMARKS

Review comment:
   How about ```s"Usage: ${info.getUsage}\nExtended Usage:${info.getExtended}"```? That would match `DescribeFunctionCommand`:
https://github.com/apache/spark/blob/7048fff2304bd44c3c6b7e57a485200a8959203d/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala#L144-L148








[GitHub] [spark] viirya commented on a change in pull request #29567: [SPARK-32721][SQL] Simplify if clauses with null and boolean

2020-08-29 Thread GitBox


viirya commented on a change in pull request #29567:
URL: https://github.com/apache/spark/pull/29567#discussion_r479665954



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -463,6 +463,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
   case If(Literal(null, _), _, falseValue) => falseValue
   case If(cond, trueValue, falseValue)
 if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue
+  case If(p, l @ Literal(null, _), FalseLiteral) if !p.nullable => And(p, l)
+  case If(p, l @ Literal(null, _), TrueLiteral) if !p.nullable => Or(Not(p), l)

Review comment:
   Hm, I think that as an expression rewrite, this won't make things much faster. As @HyukjinKwon said, this should be a rare case. The most likely use, I think, is to turn a predicate into a form that can be pushed down to a data source.
   
   Adding a new rule is too much. Making it part of the existing rule looks more appropriate.








[GitHub] [spark] Udbhav30 commented on pull request #29552: [SPARK-32481][CORE][SQL][test-hadoop2.7][test-hive1.2] Support truncate table to move data to trash

2020-08-29 Thread GitBox


Udbhav30 commented on pull request #29552:
URL: https://github.com/apache/spark/pull/29552#issuecomment-683312518


   cc @viirya @dongjoon-hyun 






[GitHub] [spark] viirya commented on a change in pull request #29567: [SPARK-32721][SQL] Simplify if clauses with null and boolean

2020-08-29 Thread GitBox


viirya commented on a change in pull request #29567:
URL: https://github.com/apache/spark/pull/29567#discussion_r479668552



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -463,6 +463,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
   case If(Literal(null, _), _, falseValue) => falseValue
   case If(cond, trueValue, falseValue)
 if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue
+  case If(p, l @ Literal(null, _), FalseLiteral) if !p.nullable => And(p, l)
+  case If(p, l @ Literal(null, _), TrueLiteral) if !p.nullable => Or(Not(p), l)

Review comment:
   Hmm, I'm rethinking this. Can a predicate like `And(p, null)` be pushed down? Predicate pushdown won't rewrite null literals.
   
   








[GitHub] [spark] tanelk commented on a change in pull request #29565: [SPARK-24994][SQL] Simplify casts for literal types

2020-08-29 Thread GitBox


tanelk commented on a change in pull request #29565:
URL: https://github.com/apache/spark/pull/29565#discussion_r479672380



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCast.scala
##
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types._
+
+/**
+ * Unwrap casts in binary comparison operations with patterns like following:
+ *
+ * `BinaryComparison(Cast(fromExp, toType), Literal(value, toType))`
+ *   or
+ * `BinaryComparison(Literal(value, toType), Cast(fromExp, toType))`
+ *
+ * This rule optimizes expressions with the above pattern by either replacing the cast with simpler
+ * constructs, or moving the cast from the expression side to the literal side, which enables them
+ * to be optimized away later and pushed down to data sources.
+ *
+ * Currently this only handles cases where `fromType` (of `fromExp`) and `toType` are of integral
+ * types (i.e., byte, short, int and long). The rule checks to see if the literal `value` is
+ * within range `(min, max)`, where `min` and `max` are the minimum and maximum value of
+ * `fromType`, respectively. If this is true then it means we can safely cast `value` to `fromType`
+ * and thus able to move the cast to the literal side.
+ *
+ * If the `value` is not within range `(min, max)`, the rule breaks the scenario into different
+ * cases and try to replace each with simpler constructs.
+ *
+ * if `value > max`, the cases are of following:
+ *  - `cast(exp, ty) > value` ==> if(isnull(exp), null, false)
+ *  - `cast(exp, ty) >= value` ==> if(isnull(exp), null, false)
+ *  - `cast(exp, ty) === value` ==> if(isnull(exp), null, false)
+ *  - `cast(exp, ty) <=> value` ==> false
+ *  - `cast(exp, ty) <= value` ==> if(isnull(exp), null, true)
+ *  - `cast(exp, ty) < value` ==> if(isnull(exp), null, true)
+ *
+ * if `value == max`, the cases are of following:
+ *  - `cast(exp, ty) > value` ==> if(isnull(exp), null, false)
+ *  - `cast(exp, ty) >= value` ==> exp == max
+ *  - `cast(exp, ty) === value` ==> exp == max
+ *  - `cast(exp, ty) <=> value` ==> exp == max
+ *  - `cast(exp, ty) <= value` ==> if(isnull(exp), null, true)
+ *  - `cast(exp, ty) < value` ==> exp =!= max
+ *
+ * Similarly for the cases when `value == min` and `value < min`.
+ *
+ * Further, the above `if(isnull(exp), null, false)` is represented using conjunction
+ * `and(isnull(exp), null)`, to enable further optimization and filter pushdown to data sources.
+ * Similarly, `if(isnull(exp), null, true)` is represented with `or(isnotnull(exp), null)`.
+ */
+object UnwrapCast extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case l: LogicalPlan => l transformExpressionsUp {
+  case e @ BinaryComparison(_, _) => unwrapCast(e)
+}
+  }
+
+  private def unwrapCast(exp: Expression): Expression = exp match {
+case BinaryComparison(Literal(_, _), Cast(_, _, _)) =>
+  // Not a canonical form. In this case we first canonicalize the expression by swapping the
+  // literal and cast side, then process the result and swap the literal and cast again to
+  // restore the original order.
+  def swap(e: Expression): Expression = e match {
+case GreaterThan(left, right) => LessThan(right, left)
+case GreaterThanOrEqual(left, right) => LessThanOrEqual(right, left)
+case EqualTo(left, right) => EqualTo(right, left)
+case EqualNullSafe(left, right) => EqualNullSafe(right, left)
+case LessThanOrEqual(left, right) => GreaterThanOrEqual(right, left)
+case LessThan(left, right) => GreaterThan(right, left)
+case _ => e
+  }
+
+  swap(unwrapCast(swap(exp)))
+
+case BinaryComparison(Cast(fromExp, _, _), Literal(value, toType))
+  if canImplicitlyCast(fromExp, toType) =>
+
+  // In case both sides have integral type, op

[GitHub] [spark] tanelk commented on pull request #29565: [SPARK-24994][SQL] Simplify casts for literal types

2020-08-29 Thread GitBox


tanelk commented on pull request #29565:
URL: https://github.com/apache/spark/pull/29565#issuecomment-683320988


   I wondered how it handles a null literal. You might want to add a test case like this:
   
   ```
 test("unwrap casts when literal is null") {
   val intLit = Literal.create(null, IntegerType)
   val shortLit = Literal.create(null, ShortType)
   assertEquivalent('a > intLit, 'a > shortLit)
   assertEquivalent('a >= intLit, 'a >= shortLit)
   assertEquivalent('a === intLit, 'a === shortLit)
   assertEquivalent('a <=> intLit, 'a <=> shortLit)
   assertEquivalent('a <= intLit, 'a <= shortLit)
   assertEquivalent('a < intLit, 'a < shortLit)
 }
   ```
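
The null-literal test above targets the `UnwrapCast` rule, whose Scaladoc describes a range check on the literal. As a rough sketch of that check for a single operator (`>`), assuming `exp` is a short column compared against an int literal: the object, trait, and case names below are hypothetical and are not Catalyst classes, and only the cases spelled out in the Scaladoc are modeled.

```scala
// Illustrative sketch only: these types are hypothetical and are not Catalyst classes.
object UnwrapCastSketch {
  sealed trait Simplified
  // Stands for if(isnull(exp), null, false).
  case object NullOrFalse extends Simplified
  // Stands for `exp > value` with the literal narrowed to a short.
  final case class GreaterThanShort(value: Short) extends Simplified

  // Simplify `cast(exp as int) > value`, where `exp` is assumed to be a short column.
  // Returns None for value <= min, which the Scaladoc only describes as "similar".
  def simplifyGreaterThan(value: Int): Option[Simplified] =
    if (value >= Short.MaxValue) Some(NullOrFalse) // value > max and value == max
    else if (value > Short.MinValue) Some(GreaterThanShort(value.toShort)) // within (min, max)
    else None
}
```

A null literal does not fit any of these branches, which is presumably why a dedicated test case is worth having.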






[GitHub] [spark] SparkQA commented on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


SparkQA commented on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-683323329


   **[Test build #128018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128018/testReport)** for PR 29228 at commit [`1029d26`](https://github.com/apache/spark/commit/1029d2658fa18cfa773fef5ede283a3e7184f438).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with TimeLimits `






[GitHub] [spark] SparkQA removed a comment on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-683305936


   **[Test build #128018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128018/testReport)** for PR 29228 at commit [`1029d26`](https://github.com/apache/spark/commit/1029d2658fa18cfa773fef5ede283a3e7184f438).






[GitHub] [spark] AmplabJenkins commented on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-683323517










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-683323517










[GitHub] [spark] SparkQA commented on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


SparkQA commented on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683323573


   **[Test build #128020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128020/testReport)** for PR 29556 at commit [`4c214a9`](https://github.com/apache/spark/commit/4c214a9cd7d5baf74cbb463a4e33d321ba19826e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683310056


   **[Test build #128020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128020/testReport)** for PR 29556 at commit [`4c214a9`](https://github.com/apache/spark/commit/4c214a9cd7d5baf74cbb463a4e33d321ba19826e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683323619


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683323619










[GitHub] [spark] sunchao commented on a change in pull request #29567: [SPARK-32721][SQL] Simplify if clauses with null and boolean

2020-08-29 Thread GitBox


sunchao commented on a change in pull request #29567:
URL: https://github.com/apache/spark/pull/29567#discussion_r479674614



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -463,6 +463,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
   case If(Literal(null, _), _, falseValue) => falseValue
   case If(cond, trueValue, falseValue)
 if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue
+  case If(p, l @ Literal(null, _), FalseLiteral) if !p.nullable => And(p, l)
+  case If(p, l @ Literal(null, _), TrueLiteral) if !p.nullable => Or(Not(p), l)

Review comment:
   I have yet to find a case where `If(p, null, false)` gets propagated to a data source. I think most of the time it just gets replaced by `false`, so transforming it to `And(p, null)` doesn't seem to help much.
   
   On the other hand, `Or(Not(p), null)` does get pushed down to data sources.
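
For reference, the two rewrites in the diff above can be checked with a small model of SQL three-valued logic. This is only a sketch: `None` stands in for NULL, and the object and helper names are made up for illustration, not Catalyst APIs. It assumes, as the existing `If(Literal(null, _), _, falseValue)` case in the hunk suggests, that a NULL condition selects the else branch.

```scala
// A sketch of SQL three-valued logic, with None standing in for NULL.
// These helpers are illustrative only; they are not Catalyst APIs.
object NullableIfRewrite {
  type Tri = Option[Boolean]

  // SQL AND: false dominates, otherwise NULL is contagious.
  def and(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true)) => Some(true)
    case _ => None
  }

  // SQL OR: true dominates, otherwise NULL is contagious.
  def or(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false)) => Some(false)
    case _ => None
  }

  def not(a: Tri): Tri = a.map(!_)

  // SQL IF: anything other than a true condition selects the else branch.
  def ifElse(cond: Tri, t: Tri, f: Tri): Tri =
    if (cond.contains(true)) t else f

  // Both rewrites agree with the original IF for every non-null value of p.
  def rewritesHold: Boolean = Seq(true, false).forall { b =>
    val p: Tri = Some(b)
    ifElse(p, None, Some(false)) == and(p, None) &&
    ifElse(p, None, Some(true)) == or(not(p), None)
  }

  // With a NULL predicate the first rewrite breaks: IF yields false, AND yields NULL.
  def nullCounterexample: Boolean =
    ifElse(None, None, Some(false)) != and(None, None)
}
```

The counterexample is the reason the rule is guarded by `!p.nullable`.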








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29556: [WIP][SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29556:
URL: https://github.com/apache/spark/pull/29556#issuecomment-683323625


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128020/
   Test FAILed.






[GitHub] [spark] Fokko commented on a change in pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


Fokko commented on a change in pull request #29563:
URL: https://github.com/apache/spark/pull/29563#discussion_r479676930



##
File path: python/pyspark/sql/readwriter.py
##
@@ -1225,7 +1226,6 @@ def overwrite(self, condition):
 Overwrite rows matching the given filter condition with the contents of the data frame in
 the output table.
 """
-condition = _to_java_column(column)

Review comment:
   Thanks for letting me know








[GitHub] [spark] Fokko commented on a change in pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


Fokko commented on a change in pull request #29563:
URL: https://github.com/apache/spark/pull/29563#discussion_r479677025



##
File path: dev/tox.ini
##
@@ -19,6 +19,6 @@ max-line-length=100
 
exclude=python/pyspark/cloudpickle/*.py,shared.py,python/docs/source/conf.py,work/*/*.py,python/.eggs/*,dist/*,.git/*
 
 [flake8]
-select = E901,E999,F821,F822,F823,F401
-exclude = python/pyspark/cloudpickle/*.py,shared.py,python/docs/source/conf.py,work/*/*.py,python/.eggs/*,dist/*,.git/*
+select = E901,E999,F821,F822,F823,F401,F405
+exclude = python/pyspark/cloudpickle/*.py,shared.py,python/docs/source/conf.py,work/*/*.py,python/.eggs/*,dist/*,.git/*,dev/*

Review comment:
   I've reverted the change and fixed the violations in `dev/*` as well 








[GitHub] [spark] Fokko commented on a change in pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


Fokko commented on a change in pull request #29563:
URL: https://github.com/apache/spark/pull/29563#discussion_r479677282



##
File path: dev/create-release/translate-contributors.py
##
@@ -31,7 +31,10 @@
 import os
 import sys
 
-from releaseutils import *
+import unidecode

Review comment:
   Good one! Updated it in the PR :)








[GitHub] [spark] SparkQA commented on pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


SparkQA commented on pull request #29563:
URL: https://github.com/apache/spark/pull/29563#issuecomment-683327186


   **[Test build #128021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128021/testReport)** for PR 29563 at commit [`06480a7`](https://github.com/apache/spark/commit/06480a7b4e8c5106ac7f7ea0fa9cecd6ea09e0bb).






[GitHub] [spark] AmplabJenkins commented on pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


AmplabJenkins commented on pull request #29563:
URL: https://github.com/apache/spark/pull/29563#issuecomment-683327341










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29563:
URL: https://github.com/apache/spark/pull/29563#issuecomment-683327341










[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-29 Thread GitBox


cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-683336845


   @maropu @HeartSaVioR @cloud-fan @gengliangwang @dongjoon-hyun 
   Gentle ping to confirm whether all looks good.  Thanks! 






[GitHub] [spark] viirya commented on a change in pull request #29567: [SPARK-32721][SQL] Simplify if clauses with null and boolean

2020-08-29 Thread GitBox


viirya commented on a change in pull request #29567:
URL: https://github.com/apache/spark/pull/29567#discussion_r479687170



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##
@@ -463,6 +463,8 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
   case If(Literal(null, _), _, falseValue) => falseValue
   case If(cond, trueValue, falseValue)
 if cond.deterministic && trueValue.semanticEquals(falseValue) => trueValue
+  case If(p, l @ Literal(null, _), FalseLiteral) if !p.nullable => And(p, l)
+  case If(p, l @ Literal(null, _), TrueLiteral) if !p.nullable => Or(Not(p), l)

Review comment:
   A pushdown predicate is somewhat different from a general predicate. For example, IIUC we won't push down `Or(Not(p), null)` because of the null literal. Predicate pushdown only rewrites predicates applied to a field column, e.g. `col > 1`.
   
   Just for predicate pushdown, maybe we can transform `Or(Not(p), null)` into just `Not(p)`? If `p` is true, the predicate evaluates to null, so we filter out the row. If `p` is false, the predicate evaluates to true. So it is semantically equal to `Not(p)` for a predicate used in pushdown.
   
   I'm not sure whether I'm missing anything or some edge case, and this also doesn't work for nested predicates. Besides, I think this is an unlikely case too, so we should consider whether it's worth the change.
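
The filter-semantics argument above can be sketched with a tiny model in which `None` stands in for NULL and a filter keeps a row only when its predicate is exactly true. The object and helper names are hypothetical and are not Spark APIs; the sketch only illustrates the reasoning, including the caveat about nested predicates.

```scala
// A sketch of filter semantics under SQL three-valued logic (None = NULL).
// These helpers are illustrative only; they are not Spark APIs.
object PushdownFilterSketch {
  type Tri = Option[Boolean]

  // SQL OR: true dominates, otherwise NULL is contagious.
  def or(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false)) => Some(false)
    case _ => None
  }

  def not(a: Tri): Tri = a.map(!_)

  // A filter keeps a row only when the predicate is exactly true; NULL drops it like false.
  def keeps(pred: Tri): Boolean = pred.contains(true)

  // At the top level of a filter, Or(Not(p), null) and Not(p) keep the same rows
  // for every value of p, including NULL.
  def sameRowsKept: Boolean =
    Seq[Tri](Some(true), Some(false), None).forall { p =>
      keeps(or(not(p), None)) == keeps(not(p))
    }

  // Nested under another Not, the two forms diverge when p is true:
  // Not(Or(Not(p), null)) is NULL (row dropped), while Not(Not(p)) is true (row kept).
  def nestedDiffers: Boolean = {
    val p: Tri = Some(true)
    keeps(not(or(not(p), None))) != keeps(not(not(p)))
  }
}
```

So the simplification looks safe only for a predicate sitting directly under a filter, which matches the hesitation about nested predicates.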
   
   
   








[GitHub] [spark] SparkQA commented on pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


SparkQA commented on pull request #29563:
URL: https://github.com/apache/spark/pull/29563#issuecomment-683341600


   **[Test build #128021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128021/testReport)** for PR 29563 at commit [`06480a7`](https://github.com/apache/spark/commit/06480a7b4e8c5106ac7f7ea0fa9cecd6ea09e0bb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class PlanChangeLogger[TreeType <: TreeNode[_]] extends Logging `
 * `class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] `






[GitHub] [spark] SparkQA removed a comment on pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


SparkQA removed a comment on pull request #29563:
URL: https://github.com/apache/spark/pull/29563#issuecomment-683327186


   **[Test build #128021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128021/testReport)** for PR 29563 at commit [`06480a7`](https://github.com/apache/spark/commit/06480a7b4e8c5106ac7f7ea0fa9cecd6ea09e0bb).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29563: [SPARK-32719][PYTHON] Add Flake8 check missing imports

2020-08-29 Thread GitBox


AmplabJenkins removed a comment on pull request #29563:
URL: https://github.com/apache/spark/pull/29563#issuecomment-683341799









