[GitHub] [spark] dongjoon-hyun commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-14 Thread GitBox


dongjoon-hyun commented on pull request #29755:
URL: https://github.com/apache/spark/pull/29755#issuecomment-692478640


   Thank you, @HyukjinKwon !



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29743:
URL: https://github.com/apache/spark/pull/29743#issuecomment-692478417










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29743:
URL: https://github.com/apache/spark/pull/29743#issuecomment-692478417










[GitHub] [spark] HyukjinKwon commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-14 Thread GitBox


HyukjinKwon commented on pull request #29755:
URL: https://github.com/apache/spark/pull/29755#issuecomment-692476778


   This is nice. We should mark more tests. Merged to master.
   
   I manually checked the logs and see it's properly excluded/included.






[GitHub] [spark] HyukjinKwon closed pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-14 Thread GitBox


HyukjinKwon closed pull request #29755:
URL: https://github.com/apache/spark/pull/29755


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29734:
URL: https://github.com/apache/spark/pull/29734#issuecomment-692475975


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128675/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29734:
URL: https://github.com/apache/spark/pull/29734#issuecomment-692475971


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-14 Thread GitBox


SparkQA commented on pull request #29755:
URL: https://github.com/apache/spark/pull/29755#issuecomment-692476130


   **[Test build #128695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128695/testReport)** for PR 29755 at commit [`0b2483d`](https://github.com/apache/spark/commit/0b2483d37c8f74ab76d2bad2deb2ac992ee1d1a3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29755:
URL: https://github.com/apache/spark/pull/29755#issuecomment-692455851










[GitHub] [spark] AmplabJenkins commented on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29734:
URL: https://github.com/apache/spark/pull/29734#issuecomment-692475971










[GitHub] [spark] SparkQA removed a comment on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering

2020-09-14 Thread GitBox


SparkQA removed a comment on pull request #29734:
URL: https://github.com/apache/spark/pull/29734#issuecomment-692377261


   **[Test build #128675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128675/testReport)** for PR 29734 at commit [`06ca9c1`](https://github.com/apache/spark/commit/06ca9c17da310c88c67204366bf15c1e80057719).






[GitHub] [spark] SparkQA commented on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering

2020-09-14 Thread GitBox


SparkQA commented on pull request #29734:
URL: https://github.com/apache/spark/pull/29734#issuecomment-692474596


   **[Test build #128675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128675/testReport)** for PR 29734 at commit [`06ca9c1`](https://github.com/apache/spark/commit/06ca9c17da310c88c67204366bf15c1e80057719).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692469923


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128689/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692469917


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the ge

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-692451753










[GitHub] [spark] SparkQA commented on pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2020-09-14 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-692470147


   **[Test build #128694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128694/testReport)** for PR 29754 at commit [`4240990`](https://github.com/apache/spark/commit/42409905619a5035d962a207f5702e1e2a63c739).






[GitHub] [spark] AmplabJenkins commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692469917










[GitHub] [spark] AmplabJenkins commented on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29749:
URL: https://github.com/apache/spark/pull/29749#issuecomment-692469529










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29749:
URL: https://github.com/apache/spark/pull/29749#issuecomment-692469529










[GitHub] [spark] SparkQA removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


SparkQA removed a comment on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692418147


   **[Test build #128689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128689/testReport)** for PR 29750 at commit [`718590d`](https://github.com/apache/spark/commit/718590dc2beafcafd570ff13b9a09d06b7cc3326).






[GitHub] [spark] SparkQA commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


SparkQA commented on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692469274


   **[Test build #128689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128689/testReport)** for PR 29750 at commit [`718590d`](https://github.com/apache/spark/commit/718590dc2beafcafd570ff13b9a09d06b7cc3326).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission

2020-09-14 Thread GitBox


SparkQA commented on pull request #29722:
URL: https://github.com/apache/spark/pull/29722#issuecomment-692455812


   **[Test build #128693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128693/testReport)** for PR 29722 at commit [`f64063e`](https://github.com/apache/spark/commit/f64063ed7f3ac2851ddfbdf92b4961fbd1a192c6).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-692455714










[GitHub] [spark] AmplabJenkins commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29755:
URL: https://github.com/apache/spark/pull/29755#issuecomment-692455851










[GitHub] [spark] AmplabJenkins commented on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-692455714










[GitHub] [spark] SparkQA commented on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

2020-09-14 Thread GitBox


SparkQA commented on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-692454729


   **[Test build #128673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128673/testReport)** for PR 29695 at commit [`c45a2b6`](https://github.com/apache/spark/commit/c45a2b643f2e83af14f6b0584c5c76cf1e5af4c0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

2020-09-14 Thread GitBox


SparkQA removed a comment on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-692343374


   **[Test build #128673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128673/testReport)** for PR 29695 at commit [`c45a2b6`](https://github.com/apache/spark/commit/c45a2b643f2e83af14f6b0584c5c76cf1e5af4c0).






[GitHub] [spark] dongjoon-hyun opened a new pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest

2020-09-14 Thread GitBox


dongjoon-hyun opened a new pull request #29755:
URL: https://github.com/apache/spark/pull/29755


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [spark] dongjoon-hyun commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


dongjoon-hyun commented on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692452696


   Thank you, @Ngone51 and all!






[GitHub] [spark] dongjoon-hyun closed pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


dongjoon-hyun closed pull request #29750:
URL: https://github.com/apache/spark/pull/29750


   






[GitHub] [spark] dongjoon-hyun commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


dongjoon-hyun commented on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692452395


   Merged to master for the Apache Spark 3.1.0 release in December 2020.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29558:
URL: https://github.com/apache/spark/pull/29558#issuecomment-692451876










[GitHub] [spark] SparkQA commented on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled

2020-09-14 Thread GitBox


SparkQA commented on pull request #29743:
URL: https://github.com/apache/spark/pull/29743#issuecomment-692452000


   **[Test build #128692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128692/testReport)** for PR 29743 at commit [`499c32c`](https://github.com/apache/spark/commit/499c32c16bdbf6e08b792f027c6e8c943c68eacb).






[GitHub] [spark] AmplabJenkins commented on pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general me

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-692451753










[GitHub] [spark] AmplabJenkins commented on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29558:
URL: https://github.com/apache/spark/pull/29558#issuecomment-692451876










[GitHub] [spark] SparkQA commented on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast

2020-09-14 Thread GitBox


SparkQA commented on pull request #29558:
URL: https://github.com/apache/spark/pull/29558#issuecomment-692451008


   **[Test build #128684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128684/testReport)** for PR 29558 at commit [`b565c33`](https://github.com/apache/spark/commit/b565c338390b9c54404bf55572b13e6a9c656e7d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast

2020-09-14 Thread GitBox


SparkQA removed a comment on pull request #29558:
URL: https://github.com/apache/spark/pull/29558#issuecomment-692391473


   **[Test build #128684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128684/testReport)** for PR 29558 at commit [`b565c33`](https://github.com/apache/spark/commit/b565c338390b9c54404bf55572b13e6a9c656e7d).






[GitHub] [spark] beliefer opened a new pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2020-09-14 Thread GitBox


beliefer opened a new pull request #29754:
URL: https://github.com/apache/spark/pull/29754


   ### What changes were proposed in this pull request?
   TaskSchedulerImplSuite repeatedly checks results using the pattern shown below:
   ```
   val zeroCoreWorkerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 0),
     new WorkerOffer("executor1", "host1", 0))
   val taskSet = FakeTask.createTaskSet(1)
   taskScheduler.submitTasks(taskSet)
   var taskDescriptions = taskScheduler.resourceOffers(zeroCoreWorkerOffers).flatten
   assert(0 === taskDescriptions.length)
   ```
   We can extract it as a generic method.
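
As an illustration only, the submitTasks + resourceOffers + assert pattern above could be folded into one helper along the following lines. This is a minimal sketch, not the code from the pull request: `ToyTaskScheduler`, `submitAndCheckLaunched`, and the simplified `WorkerOffer`, `TaskSet`, and `TaskDescription` case classes are hypothetical stand-ins so the example runs without Spark on the classpath.

```scala
object SchedulerTestSketch {
  // Hypothetical, simplified stand-ins for the Spark test types (not the real classes).
  final case class WorkerOffer(executorId: String, host: String, cores: Int)
  final case class TaskDescription(taskId: Long, executorId: String)
  final case class TaskSet(numTasks: Int)

  // Toy scheduler: gives one pending task to each offer that has at least one core.
  final class ToyTaskScheduler {
    private var pending: Int = 0
    private var nextTaskId: Long = 0L

    def submitTasks(taskSet: TaskSet): Unit = pending += taskSet.numTasks

    def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] =
      offers.map { offer =>
        if (offer.cores > 0 && pending > 0) {
          pending -= 1
          nextTaskId += 1
          Seq(TaskDescription(nextTaskId, offer.executorId))
        } else {
          Seq.empty[TaskDescription]
        }
      }
  }

  // Hypothetical helper: submit the task set, make the offers, and assert on
  // how many tasks were launched. Returns the descriptions for further checks.
  def submitAndCheckLaunched(
      scheduler: ToyTaskScheduler,
      taskSet: TaskSet,
      offers: IndexedSeq[WorkerOffer],
      expectedLaunched: Int): Seq[TaskDescription] = {
    scheduler.submitTasks(taskSet)
    val taskDescriptions = scheduler.resourceOffers(offers).flatten
    assert(taskDescriptions.length == expectedLaunched,
      s"expected $expectedLaunched launched tasks, got ${taskDescriptions.length}")
    taskDescriptions
  }
}
```

With a helper of this shape, the zero-core case quoted above would reduce to a single call that passes the two zero-core offers and an expected count of 0. Returning the task descriptions lets a test make further assertions on them when needed.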
   
   ### Why are the changes needed?
   Extract a generic method.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   
   
   ### How was this patch tested?
   Jenkins test
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29722:
URL: https://github.com/apache/spark/pull/29722#issuecomment-692446530










[GitHub] [spark] AmplabJenkins commented on pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29722:
URL: https://github.com/apache/spark/pull/29722#issuecomment-692446530










[GitHub] [spark] AmplabJenkins commented on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29743:
URL: https://github.com/apache/spark/pull/29743#issuecomment-692444719










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29743:
URL: https://github.com/apache/spark/pull/29743#issuecomment-692444719










[GitHub] [spark] viirya commented on a change in pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering

2020-09-14 Thread GitBox


viirya commented on a change in pull request #29734:
URL: https://github.com/apache/spark/pull/29734#discussion_r488359058



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantProjectsSuite.scala
##
@@ -115,9 +119,40 @@ class RemoveRedundantProjectsSuite extends QueryTest with SharedSparkSession wit
     assertProjectExec(query, 1, 2)
   }
 
-  test("generate") {
-    val query = "select a, key, explode(d) from testView where a > 10"
-    assertProjectExec(query, 0, 1)
+  test("generate should require column ordering") {
+    withTempView("testData") {
+      spark.range(0, 10, 1)
+        .selectExpr("id as key", "id * 2 as a", "id * 3 as b")
+        .createOrReplaceTempView("testData")
+
+      val data = sql("select key, a, b, count(*) from testData group by key, a, b limit 2")
+      val df = data.selectExpr("a", "b", "key", "explode(array(key, a, b)) as d").filter("d > 0")
+      df.collect()
+      val plan = df.queryExecution.executedPlan
+      val numProjects = collectWithSubqueries(plan) { case p: ProjectExec => p }.length
+
+      // Create a new plan that reverse the GenerateExec output and add a new ProjectExec between
+      // GenerateExec and its child. This is to test if the ProjectExec is removed, the output of
+      // the query will be incorrect.
+      val newPlan = stripAQEPlan(plan) transform {
+        case g @ GenerateExec(_, requiredChildOutput, _, _, child) =>
+          g.copy(requiredChildOutput = requiredChildOutput.reverse,
+            child = ProjectExec(requiredChildOutput.reverse, child))
+      }
+
+      // Re-apply remove redundant project rule.
+      val rule = RemoveRedundantProjects(spark.sessionState.conf)
+      val newExecutedPlan = rule.apply(newPlan)
+      // The manually added ProjectExec node shouldn't be removed.
+      assert(collectWithSubqueries(newExecutedPlan) {
+        case p: ProjectExec => p }.size == numProjects + 1)

Review comment:
   The style looks weird.
   
   ```scala
   assert(collectWithSubqueries(newExecutedPlan) {
     case p: ProjectExec => p
   }.size == numProjects + 1)
   ```








[GitHub] [spark] tanelk commented on a change in pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled

2020-09-14 Thread GitBox


tanelk commented on a change in pull request #29743:
URL: https://github.com/apache/spark/pull/29743#discussion_r488356430



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala
##
@@ -105,11 +105,34 @@ class ExpressionInfoSuite extends SparkFunSuite with SharedSparkSession {
     }
   }
 
+  test("SPARK-32870: Default expressions in FunctionRegistry should have their " +
+    "usage, examples and since filled") {
+    val ignoreSet = Set(
+      "org.apache.spark.sql.catalyst.expressions.TimeWindow")
+
+    spark.sessionState.functionRegistry.listFunction().foreach { funcId =>
+      val info = spark.sessionState.catalog.lookupFunctionInfo(funcId)
+      if (!ignoreSet.contains(info.getClassName)) {
+        withClue(s"Function '${info.getName}', Expression class '${info.getClassName}'") {
+          assert(info.getUsage.nonEmpty)
+          assert(info.getExamples.startsWith("\nExamples:\n"))
+          assert(info.getExamples.endsWith("\n  "))
+          assert(info.getSince.matches("[0-9]+\\.[0-9]+\\.[0-9]+"))

Review comment:
   These checks are a bit stricter. 
   For example, this extra space kept the example for `typeof` from appearing in http://spark.apache.org/docs/latest/api/sql/#typeof
   
   https://github.com/apache/spark/pull/29743/files#diff-25282ab1377a3d87999d1d0d7a8ec270R208-R214
   
   I didn't want to add these checks to the constructor, because I'm afraid they might break some UDFs, and I would have no way to exclude some of them from the checks.
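   
   As a rough illustration of how strict the format is, the three string conditions from the test above can be pulled into a standalone predicate. This is only a sketch: `ExampleFormatCheck` and `isWellFormed` are hypothetical names that do not exist in Spark, and the sample strings are made up.
   
   ```scala
   // Hypothetical helper for illustration only; not part of Spark.
   // The three conditions mirror the assertions in the test above.
   object ExampleFormatCheck {
     // Matches versions of the form major.minor.patch, e.g. "3.0.0".
     private val SincePattern = "[0-9]+\\.[0-9]+\\.[0-9]+"
   
     def isWellFormed(examples: String, since: String): Boolean =
       examples.startsWith("\nExamples:\n") &&
         examples.endsWith("\n  ") &&
         since.matches(SincePattern)
   }
   ```
   
   With a made-up examples string such as `"\nExamples:\n    > SELECT 1;\n     1\n  "`, the predicate holds, but appending a single extra trailing space makes `endsWith("\n  ")` fail, which is the kind of mistake these assertions are meant to catch.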








[GitHub] [spark] tanelk commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double

2020-09-14 Thread GitBox


tanelk commented on pull request #29515:
URL: https://github.com/apache/spark/pull/29515#issuecomment-692432165


   > > @cloud-fan, now that #29647 is merged, can this be merged also?
   > 
   > Are all the bugs that this PR found already fixed now?
   
   I believe they were manifestations of the `0.0 != -0.0` issue.
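   
   For context, here is a minimal sketch of why the two zeros can disagree on the JVM. It is plain Scala with no Spark involved, and it does not claim to reproduce the exact failures this PR found; `SignedZero` is just an illustrative name.
   
   ```scala
   // Plain JVM behaviour, independent of Spark; `SignedZero` is an illustrative name.
   object SignedZero {
     val posZero: Double = 0.0
     val negZero: Double = -0.0
   
     // Primitive comparison follows IEEE 754: the two zeros compare equal.
     val primitiveEqual: Boolean = posZero == negZero
   
     // Boxed equality compares bit patterns, so the two zeros differ.
     val boxedEqual: Boolean =
       java.lang.Double.valueOf(posZero).equals(java.lang.Double.valueOf(negZero))
   
     // The raw bit patterns differ in the sign bit.
     val sameBits: Boolean =
       java.lang.Double.doubleToLongBits(posZero) == java.lang.Double.doubleToLongBits(negZero)
   }
   ```
   
   Here `primitiveEqual` is true while `boxedEqual` and `sameBits` are false, so code paths that mix primitive comparison with boxed or bitwise comparison can reach different answers for the same pair of values.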






[GitHub] [spark] SparkQA commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


SparkQA commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692428920


   **[Test build #128691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128691/testReport)** for PR 29739 at commit [`099c232`](https://github.com/apache/spark/commit/099c232c0cf3904654589b4a7b2c51c52c5e000b).






[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AngersZhuuuu commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692428424


   > Please update the PR description about how-to-fix.
   
   Done






[GitHub] [spark] AngersZhuuuu commented on pull request #29688: [SPARK-32827][SQL] Add spark.sql.maxMetadataStringLength config

2020-09-14 Thread GitBox


AngersZhuuuu commented on pull request #29688:
URL: https://github.com/apache/spark/pull/29688#issuecomment-692427476


   > @ulysses-you thanks for the work.
   > It seems that the PR only changes the file source related metadata. Is there other places we can use the new config as well?
   
   I will use this config in my PR https://github.com/apache/spark/pull/29739.
   I had similar work in my PR and reverted it while waiting for this one.
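   
   To make the intended effect concrete, here is a rough sketch of capping each metadata value at a configurable length before it is rendered into a plan string. `MetadataTruncation`, `abbreviate` and `render` are hypothetical names for illustration; Spark's actual truncation code may differ.
   
   ```scala
   // Hypothetical helper for illustration only; Spark's real implementation may differ.
   object MetadataTruncation {
     /** Shortens `value` to at most `maxLength` characters, marking the cut with "...". */
     def abbreviate(value: String, maxLength: Int): String = {
       require(maxLength >= 4, "maxLength must leave room for the ellipsis")
       if (value.length <= maxLength) value
       else value.take(maxLength - 3) + "..."
     }
   
     /** Renders entries as "key: value" pairs, sorted by key, with each value capped. */
     def render(metadata: Map[String, String], maxLength: Int): String =
       metadata.toSeq.sorted
         .map { case (k, v) => s"$k: ${abbreviate(v, maxLength)}" }
         .mkString(", ")
   }
   ```
   
   Passing the limit as a parameter, rather than hard-coding it, is what lets both the file source scan and other plan nodes share one setting.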






[GitHub] [spark] AmplabJenkins commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692427206










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692426663










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692426652


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


SparkQA removed a comment on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692420232


   **[Test build #128690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128690/testReport)** for PR 29739 at commit [`543b853`](https://github.com/apache/spark/commit/543b85347d09bf1d254486c40eba6809df9881f6).






[GitHub] [spark] AmplabJenkins commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692426652










[GitHub] [spark] SparkQA commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


SparkQA commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692426576


   **[Test build #128690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128690/testReport)** for PR 29739 at commit [`543b853`](https://github.com/apache/spark/commit/543b85347d09bf1d254486c40eba6809df9881f6).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] Ngone51 commented on a change in pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission

2020-09-14 Thread GitBox


Ngone51 commented on a change in pull request #29722:
URL: https://github.com/apache/spark/pull/29722#discussion_r488343168



##
File path: 
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
##
@@ -79,12 +79,23 @@ private[spark] class CoarseGrainedExecutorBackend(
    */
   private[executor] val taskResources = new mutable.HashMap[Long, Map[String, ResourceInformation]]
 
-  @volatile private var decommissioned = false
+  private var decommissioned = false
 
   override def onStart(): Unit = {
-    logInfo("Registering PWR handler.")
-    SignalUtils.register("PWR", "Failed to register SIGPWR handler - " +
-      "disabling decommission feature.")(decommissionSelf)
+    if (env.conf.get(DECOMMISSION_ENABLED)) {
+      logInfo("Registering PWR handler to trigger decommissioning.")
+      SignalUtils.register("PWR", "Failed to register SIGPWR handler - " +
+        "disabling executor decommission feature.") {
+        self.send(DecommissionExecutor)

Review comment:
   I'm ok with adding a new message.
   
   WDYT? @cloud-fan 








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692420551










[GitHub] [spark] AmplabJenkins commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692420551










[GitHub] [spark] SparkQA commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


SparkQA commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692420232


   **[Test build #128690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128690/testReport)** for PR 29739 at commit [`543b853`](https://github.com/apache/spark/commit/543b85347d09bf1d254486c40eba6809df9881f6).






[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AngersZhuuuu commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692420213


   > Could you add tests, too?
   
   Sure. Would it be enough to just check HiveTableScanExec's simpleString?






[GitHub] [spark] maropu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


maropu commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692419605


   Could you add tests, too?






[GitHub] [spark] maropu commented on a change in pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


maropu commented on a change in pull request #29739:
URL: https://github.com/apache/spark/pull/29739#discussion_r488340537



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -55,7 +55,7 @@ trait DataSourceScanExec extends LeafExecNode {
   // Metadata that describes more details of this scan.
   protected def metadata: Map[String, String]
 
-  protected val maxMetadataValueLength = 100
+  protected val maxMetadataValueLength = conf.maxMetadataValueLength

Review comment:
   link: https://github.com/apache/spark/pull/29688








[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AngersZhuuuu commented on a change in pull request #29739:
URL: https://github.com/apache/spark/pull/29739#discussion_r488340103



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -55,7 +55,7 @@ trait DataSourceScanExec extends LeafExecNode {
   // Metadata that describes more details of this scan.
   protected def metadata: Map[String, String]
 
-  protected val maxMetadataValueLength = 100
+  protected val maxMetadataValueLength = conf.maxMetadataValueLength

Review comment:
   > Have you checked the related PR? 
https://github.com/apache/spark/pull/29688/files#diff-2a91a9a59953aa82fa132aaf45bd731bR58
   
   Sorry, I didn't notice that PR. I had a similar idea to that PR. I'm going to revert this change and resolve the conflicts between the two PRs at the end.








[GitHub] [spark] SparkQA commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


SparkQA commented on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692418147


   **[Test build #128689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128689/testReport)** for PR 29750 at commit [`718590d`](https://github.com/apache/spark/commit/718590dc2beafcafd570ff13b9a09d06b7cc3326).






[GitHub] [spark] maropu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


maropu commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692417507


   > As mentioned above, I use similar way to construct HiveTableRelation's simpleString.
   
   Please update the PR description about how-to-fix.






[GitHub] [spark] maropu commented on a change in pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


maropu commented on a change in pull request #29739:
URL: https://github.com/apache/spark/pull/29739#discussion_r488338816



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -55,7 +55,7 @@ trait DataSourceScanExec extends LeafExecNode {
   // Metadata that describes more details of this scan.
   protected def metadata: Map[String, String]
 
-  protected val maxMetadataValueLength = 100
+  protected val maxMetadataValueLength = conf.maxMetadataValueLength

Review comment:
   Have you checked the related PR? 
https://github.com/apache/spark/pull/29688/files#diff-2a91a9a59953aa82fa132aaf45bd731bR58








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692416281










[GitHub] [spark] AmplabJenkins commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692416281










[GitHub] [spark] Ngone51 commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


Ngone51 commented on pull request #29750:
URL: https://github.com/apache/spark/pull/29750#issuecomment-692416287


   Thanks all! I've addressed the comment.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29515:
URL: https://github.com/apache/spark/pull/29515#issuecomment-692411971










[GitHub] [spark] HyukjinKwon commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType

2020-09-14 Thread GitBox


HyukjinKwon commented on a change in pull request #29720:
URL: https://github.com/apache/spark/pull/29720#discussion_r488334221



##
File path: python/pyspark/sql/types.py
##
@@ -305,7 +305,7 @@ def jsonValue(self):
 @classmethod
 def fromJson(cls, json):

Review comment:
   I don't think the format is meant to be exposed to end users. It's just for internal ser/de purposes, and it is supposed to round-trip between `jsonValue` and `fromJson` to work around a Py4J limitation.
   
   BTW, this isn't the only place where you should handle JSON ser/de. For example, you should also change the Scala side at `DataType.parseDataType`. 








[GitHub] [spark] AmplabJenkins commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29515:
URL: https://github.com/apache/spark/pull/29515#issuecomment-692411971










[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AngersZhuuuu commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692411754


   @cloud-fan As mentioned above, I use a similar approach to construct HiveTableRelation's simpleString. Compared to what we had before, I dropped the detailed metadata of each partition and kept only the partSpec, to show which partitions were pruned.
   For detailed information, we usually don't look at the plan anyway; we use the `DESC EXTENDED` statement instead.






[GitHub] [spark] SparkQA commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double

2020-09-14 Thread GitBox


SparkQA commented on pull request #29515:
URL: https://github.com/apache/spark/pull/29515#issuecomment-692411732


   **[Test build #128688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128688/testReport)** for PR 29515 at commit [`529aba8`](https://github.com/apache/spark/commit/529aba8e7207a22039aebbe379d9d27035a143c1).






[GitHub] [spark] maropu commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double

2020-09-14 Thread GitBox


maropu commented on pull request #29515:
URL: https://github.com/apache/spark/pull/29515#issuecomment-692410536


   retest this please






[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message

2020-09-14 Thread GitBox


AngersZhuuuu commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692410347


   For a native file source scan, such as FileSourceScanExec, the node collects 
the information it needs into `metadata`:
   ```
   override lazy val metadata: Map[String, String] = {
     def seqToString(seq: Seq[Any]) = seq.mkString("[", ", ", "]")
     val location = relation.location
     val locationDesc =
       location.getClass.getSimpleName +
         Utils.buildLocationMetadata(location.rootPaths, maxMetadataValueLength)
     val metadata =
       Map(
         "Format" -> relation.fileFormat.toString,
         "ReadSchema" -> requiredSchema.catalogString,
         "Batched" -> supportsColumnar.toString,
         "PartitionFilters" -> seqToString(partitionFilters),
         "PushedFilters" -> seqToString(pushedDownFilters),
         "DataFilters" -> seqToString(dataFilters),
         "Location" -> locationDesc)
   
     val withSelectedBucketsCount = relation.bucketSpec.map { spec =>
       val numSelectedBuckets = optionalBucketSet.map { b =>
         b.cardinality()
       } getOrElse {
         spec.numBuckets
       }
       metadata + ("SelectedBucketsCount" ->
         (s"$numSelectedBuckets out of ${spec.numBuckets}" +
           optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)"}.getOrElse("")))
     } getOrElse {
       metadata
     }
   
     withSelectedBucketsCount
   }
   ```
   Then its parent class, `DataSourceScanExec`, constructs `simpleString` 
from `metadata`:
   ```
   override def simpleString(maxFields: Int): String = {
     val metadataEntries = metadata.toSeq.sorted.map {
       case (key, value) =>
         key + ": " + StringUtils.abbreviate(redact(value), maxMetadataValueLength)
     }
     val metadataStr = truncatedString(metadataEntries, " ", ", ", "", maxFields)
     redact(
       s"$nodeNamePrefix$nodeName${truncatedString(output, "[", ",", "]", maxFields)}$metadataStr")
   }
   ```
   Each entry is abbreviated to a length of 100, so FileSourceScan's 
FileIndex message is cut off once it exceeds 100 characters.
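
The truncation described above can be sketched in a few lines of Python. This is a minimal illustration, not Spark's code: `abbreviate` is a hand-written stand-in for the ellipsis-style truncation that `StringUtils.abbreviate` performs, `simple_string` and the sample metadata are hypothetical, and the limit of 100 is taken from the comment above.

```python
MAX_METADATA_VALUE_LENGTH = 100  # limit quoted in the comment above


def abbreviate(value: str, max_width: int) -> str:
    """Stand-in for ellipsis truncation: keep at most max_width characters."""
    if len(value) <= max_width:
        return value
    return value[: max_width - 3] + "..."


def simple_string(node_name: str, metadata: dict) -> str:
    """Render 'key: value' pairs sorted by key, each value abbreviated."""
    entries = [
        f"{key}: {abbreviate(value, MAX_METADATA_VALUE_LENGTH)}"
        for key, value in sorted(metadata.items())
    ]
    return node_name + " " + ", ".join(entries)


# Hypothetical metadata: a location string much longer than 100 characters.
paths = ", ".join(f"/warehouse/t/part={i}" for i in range(50))
metadata = {"Location": f"InMemoryFileIndex[{paths}]", "Format": "Parquet"}
rendered = simple_string("FileScan parquet", metadata)
print(rendered)
```

Only the first partition paths survive in the rendered string, which is why a long file index is effectively hidden in the plan output.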






[GitHub] [spark] maropu commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double

2020-09-14 Thread GitBox


maropu commented on pull request #29515:
URL: https://github.com/apache/spark/pull/29515#issuecomment-692410205


   > @cloud-fan, now that #29647 is merged, can this be merged also?
   
   Are all the bugs that this PR found already fixed now?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29749:
URL: https://github.com/apache/spark/pull/29749#issuecomment-692409521


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128674/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29749:
URL: https://github.com/apache/spark/pull/29749#issuecomment-692409518


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29749:
URL: https://github.com/apache/spark/pull/29749#issuecomment-692409518










[GitHub] [spark] maropu commented on pull request #29655: [SPARK-32806][SQL] SortMergeJoin with partial hash distribution can be optimized to remove shuffle

2020-09-14 Thread GitBox


maropu commented on pull request #29655:
URL: https://github.com/apache/spark/pull/29655#issuecomment-692409151


   > we added #19054 in our internal fork and don't see much OOM issues.
   
   Even so, I think removing shuffles in the middle of stages (e.g., in many join 
cases) can, in theory, raise the probability of OOM when the data is skewed. Since 
we can control input distributions to some extent, e.g., by bucketing, 
it might be worth trying the restrictive approach that @imback82 suggested 
above.
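
The skew concern can be illustrated with a toy partitioner. The sketch below is an assumption-laden model, not Spark's `HashPartitioning`: `partition_sizes`, the two hash functions, and the sample rows are hypothetical. It compares partitioning on both join keys `(a, b)` with partitioning on the partial key `a` alone when `a` is heavily skewed.

```python
from collections import Counter

NUM_PARTITIONS = 8

# Hypothetical join input: 900 rows share a == 0, 100 rows have distinct a.
rows = [(0, b) for b in range(900)] + [(a, 0) for a in range(1, 101)]


def partition_sizes(rows, key_hash):
    """Count how many rows land in each partition under a given key hash."""
    counts = Counter(key_hash(row) % NUM_PARTITIONS for row in rows)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# Toy hash over both keys (a, b), as a fresh shuffle on the join keys might use.
full_key_sizes = partition_sizes(rows, lambda r: 31 * r[0] + r[1])
# Toy hash over the partial key a only, i.e. the existing distribution is reused.
partial_key_sizes = partition_sizes(rows, lambda r: r[0])

print("largest partition, full key:   ", max(full_key_sizes))
print("largest partition, partial key:", max(partial_key_sizes))
```

With the partial key, every row carrying the hot value of `a` lands in a single partition, which is the situation where one task is more likely to run out of memory.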






[GitHub] [spark] SparkQA removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type

2020-09-14 Thread GitBox


SparkQA removed a comment on pull request #29749:
URL: https://github.com/apache/spark/pull/29749#issuecomment-692373281


   **[Test build #128674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128674/testReport)**
 for PR 29749 at commit 
[`e73ccbf`](https://github.com/apache/spark/commit/e73ccbf3be4b29714c3da1cd0ddefe2b51095b59).






[GitHub] [spark] SparkQA commented on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type

2020-09-14 Thread GitBox


SparkQA commented on pull request #29749:
URL: https://github.com/apache/spark/pull/29749#issuecomment-692408690


   **[Test build #128674 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128674/testReport)**
 for PR 29749 at commit 
[`e73ccbf`](https://github.com/apache/spark/commit/e73ccbf3be4b29714c3da1cd0ddefe2b51095b59).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks

2020-09-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #29750:
URL: https://github.com/apache/spark/pull/29750#discussion_r488330538



##
File path: core/src/main/scala/org/apache/spark/scheduler/Pool.scala
##
@@ -107,7 +109,7 @@ private[spark] class Pool(
 for (schedulable <- sortedSchedulableQueue) {
   sortedTaskSetQueue ++= schedulable.getSortedTaskSetQueue
 }
-sortedTaskSetQueue
+sortedTaskSetQueue.filter(_.isSchedulable)

Review comment:
   +1








[GitHub] [spark] jroof88 commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType

2020-09-14 Thread GitBox


jroof88 commented on a change in pull request #29720:
URL: https://github.com/apache/spark/pull/29720#discussion_r488327546



##
File path: python/pyspark/sql/types.py
##
@@ -305,7 +305,7 @@ def jsonValue(self):
 @classmethod
 def fromJson(cls, json):

Review comment:
   Correct, we have a use case where we build up the JSON elsewhere and we 
don't want to have to require the default keys. It reduces complexity when 
defining schemas in external JSON files.








[GitHub] [spark] jroof88 commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType

2020-09-14 Thread GitBox


jroof88 commented on a change in pull request #29720:
URL: https://github.com/apache/spark/pull/29720#discussion_r488327546



##
File path: python/pyspark/sql/types.py
##
@@ -305,7 +305,7 @@ def jsonValue(self):
 @classmethod
 def fromJson(cls, json):

Review comment:
   Correct, we have a use case where we build up the JSON elsewhere and we 
don't want to have to require the default keys.








[GitHub] [spark] dongjoon-hyun closed pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast

2020-09-14 Thread GitBox


dongjoon-hyun closed pull request #29558:
URL: https://github.com/apache/spark/pull/29558


   






[GitHub] [spark] jroof88 commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType

2020-09-14 Thread GitBox


jroof88 commented on a change in pull request #29720:
URL: https://github.com/apache/spark/pull/29720#discussion_r488326118



##
File path: python/pyspark/sql/types.py
##
@@ -305,7 +305,7 @@ def jsonValue(self):
 @classmethod
 def fromJson(cls, json):

Review comment:
   Added!








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29587:
URL: https://github.com/apache/spark/pull/29587#issuecomment-692403734










[GitHub] [spark] HyukjinKwon commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType

2020-09-14 Thread GitBox


HyukjinKwon commented on a change in pull request #29720:
URL: https://github.com/apache/spark/pull/29720#discussion_r488326022



##
File path: python/pyspark/sql/types.py
##
@@ -305,7 +305,7 @@ def jsonValue(self):
 @classmethod
 def fromJson(cls, json):

Review comment:
   I mean I am trying to understand why you want this change. For example, 
does it affect anything in the roundtrip between `jsonValue` and `fromJson`, or 
are you trying to build up the JSON by yourself somewhere?








[GitHub] [spark] AmplabJenkins commented on pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29587:
URL: https://github.com/apache/spark/pull/29587#issuecomment-692403734










[GitHub] [spark] SparkQA commented on pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns

2020-09-14 Thread GitBox


SparkQA commented on pull request #29587:
URL: https://github.com/apache/spark/pull/29587#issuecomment-692403386


   **[Test build #128687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128687/testReport)**
 for PR 29587 at commit 
[`b4270f4`](https://github.com/apache/spark/commit/b4270f4f8879f3225399c93456beb30e2a2c78e9).






[GitHub] [spark] jroof88 commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType

2020-09-14 Thread GitBox


jroof88 commented on a change in pull request #29720:
URL: https://github.com/apache/spark/pull/29720#discussion_r488324986



##
File path: python/pyspark/sql/types.py
##
@@ -305,7 +305,7 @@ def jsonValue(self):
 @classmethod
 def fromJson(cls, json):

Review comment:
   Right, the default value of `containsNull` for `ArrayType` is `True`, 
so this test shows that you get the same result without supplying it in the 
JSON or the constructor. I will add another `assert` for the resulting JSON.
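
The behavior under discussion can be sketched without PySpark. The class below is a hypothetical stand-in, not the `pyspark.sql.types.ArrayType` implementation: `ArrayTypeSketch` and its methods are assumptions that only mirror the `jsonValue`/`fromJson` naming, with `containsNull` falling back to `True` when the key is absent.

```python
class ArrayTypeSketch:
    """Hypothetical stand-in for an array type with a nullable-elements flag."""

    def __init__(self, element_type: str, contains_null: bool = True):
        self.element_type = element_type
        self.contains_null = contains_null

    def json_value(self) -> dict:
        # Always emit every key, including the ones that have defaults.
        return {
            "type": "array",
            "elementType": self.element_type,
            "containsNull": self.contains_null,
        }

    @classmethod
    def from_json(cls, json: dict) -> "ArrayTypeSketch":
        # Fall back to the constructor default when the key is missing.
        return cls(json["elementType"], json.get("containsNull", True))


minimal = {"type": "array", "elementType": "string"}  # no containsNull key
parsed = ArrayTypeSketch.from_json(minimal)
round_tripped = ArrayTypeSketch.from_json(parsed.json_value())
print(parsed.json_value())
```

Under this sketch, hand-written JSON may omit the defaulted key, while JSON produced by `json_value` still round-trips unchanged.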








[GitHub] [spark] dongjoon-hyun commented on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast

2020-09-14 Thread GitBox


dongjoon-hyun commented on pull request #29558:
URL: https://github.com/apache/spark/pull/29558#issuecomment-692402763


   Thank you for the update, @LantaoJin.






[GitHub] [spark] viirya commented on a change in pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns

2020-09-14 Thread GitBox


viirya commented on a change in pull request #29587:
URL: https://github.com/apache/spark/pull/29587#discussion_r488324577



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala
##
@@ -507,33 +507,156 @@ class DataFrameSetOperationsSuite extends QueryTest with SharedSparkSession {
   }
 
   test("SPARK-29358: Make unionByName optionally fill missing columns with nulls") {
-var df1 = Seq(1, 2, 3).toDF("a")
-var df2 = Seq(3, 1, 2).toDF("b")
-val df3 = Seq(2, 3, 1).toDF("c")
-val unionDf = df1.unionByName(df2.unionByName(df3, true), true)
-checkAnswer(unionDf,
-  Row(1, null, null) :: Row(2, null, null) :: Row(3, null, null) :: // df1
-Row(null, 3, null) :: Row(null, 1, null) :: Row(null, 2, null) :: // df2
-Row(null, null, 2) :: Row(null, null, 3) :: Row(null, null, 1) :: Nil // df3
-)
+Seq("true", "false").foreach { config =>
+  withSQLConf(SQLConf.UNION_BYNAME_STRUCT_SUPPORT_ENABLED.key -> config) {
+var df1 = Seq(1, 2, 3).toDF("a")
+var df2 = Seq(3, 1, 2).toDF("b")
+val df3 = Seq(2, 3, 1).toDF("c")
+val unionDf = df1.unionByName(df2.unionByName(df3, true), true)
+checkAnswer(unionDf,
+  Row(1, null, null) :: Row(2, null, null) :: Row(3, null, null) :: // df1
+Row(null, 3, null) :: Row(null, 1, null) :: Row(null, 2, null) :: // df2
+Row(null, null, 2) :: Row(null, null, 3) :: Row(null, null, 1) :: Nil // df3
+)
+
+df1 = Seq((1, 2)).toDF("a", "c")
+df2 = Seq((3, 4, 5)).toDF("a", "b", "c")
+checkAnswer(df1.unionByName(df2, true),
+  Row(1, 2, null) :: Row(3, 5, 4) :: Nil)
+checkAnswer(df2.unionByName(df1, true),
+  Row(3, 4, 5) :: Row(1, null, 2) :: Nil)
+
+withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
+  df2 = Seq((3, 4, 5)).toDF("a", "B", "C")
+  val union1 = df1.unionByName(df2, true)
+  val union2 = df2.unionByName(df1, true)
+
+  checkAnswer(union1, Row(1, 2, null, null) :: Row(3, null, 4, 5) :: Nil)
+  checkAnswer(union2, Row(3, 4, 5, null) :: Row(1, null, null, 2) :: Nil)
+
+  assert(union1.schema.fieldNames === Array("a", "c", "B", "C"))
+  assert(union2.schema.fieldNames === Array("a", "B", "C", "c"))
+}
+  }
+}
+  }
 
-df1 = Seq((1, 2)).toDF("a", "c")
-df2 = Seq((3, 4, 5)).toDF("a", "b", "c")
-checkAnswer(df1.unionByName(df2, true),
-  Row(1, 2, null) :: Row(3, 5, 4) :: Nil)
-checkAnswer(df2.unionByName(df1, true),
-  Row(3, 4, 5) :: Row(1, null, 2) :: Nil)
+  test("SPARK-32376: Make unionByName null-filling behavior work with struct columns - simple") {
+withSQLConf(SQLConf.UNION_BYNAME_STRUCT_SUPPORT_ENABLED.key -> "true") {
+  val df1 = Seq(((1, 2, 3), 0), ((2, 3, 4), 1), ((3, 4, 5), 2)).toDF("a", "idx")
+  val df2 = Seq(((3, 4), 0), ((1, 2), 1), ((2, 3), 2)).toDF("a", "idx")
+  val df3 = Seq(((100, 101, 102, 103), 0), ((110, 111, 112, 113), 1), ((120, 121, 122, 123), 2))
+.toDF("a", "idx")
+
+  var unionDf = df1.unionByName(df2, true)
+
+  checkAnswer(unionDf,
+Row(Row(1, 2, 3), 0) :: Row(Row(2, 3, 4), 1) :: Row(Row(3, 4, 5), 2) ::
+  Row(Row(3, 4, null), 0) :: Row(Row(1, 2, null), 1) :: Row(Row(2, 3, null), 2) :: Nil
+  )
+
+  assert(unionDf.schema.toDDL == "`a` STRUCT<`_1`: INT, `_2`: INT, `_3`: INT>,`idx` INT")
+
+  unionDf = df1.unionByName(df2, true).unionByName(df3, true)
+
+  checkAnswer(unionDf,
+Row(Row(1, 2, 3, null), 0) ::
+  Row(Row(2, 3, 4, null), 1) ::
+  Row(Row(3, 4, 5, null), 2) :: // df1
+  Row(Row(3, 4, null, null), 0) ::
+  Row(Row(1, 2, null, null), 1) ::
+  Row(Row(2, 3, null, null), 2) :: // df2
+  Row(Row(100, 101, 102, 103), 0) ::
+  Row(Row(110, 111, 112, 113), 1) ::
+  Row(Row(120, 121, 122, 123), 2) :: Nil // df3
+  )
+  assert(unionDf.schema.toDDL ==
+"`a` STRUCT<`_1`: INT, `_2`: INT, `_3`: INT, `_4`: INT>,`idx` INT")
+}
+  }
 
-withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
-  df2 = Seq((3, 4, 5)).toDF("a", "B", "C")
-  val union1 = df1.unionByName(df2, true)
-  val union2 = df2.unionByName(df1, true)
+  test("SPARK-32376: Make unionByName null-filling behavior work with struct columns - nested") {
+withSQLConf(SQLConf.UNION_BYNAME_STRUCT_SUPPORT_ENABLED.key -> "true") {
+  val df1 = Seq((0, UnionClass1a(0, 1L, UnionClass2(1, "2")))).toDF("id", "a")
+  val df2 = Seq((1, UnionClass1b(1, 2L, UnionClass3(2, 3L)))).toDF("id", "a")
+
+  val expectedSchema = "`id` INT,`a` STRUCT<`a`: INT, `b`: BIGINT, " +
+"`nested`: STRUCT<`a`: INT, `b`: BIGINT, `c`: STRING>>"
+
+  var unionDf = df1.unionByName(df2, true)
+  checkAnswer(unionDf,
+Row(0, Row(0, 1, Row(1, null, "2"))) ::
+  Row(1, Row(1, 2, 

[GitHub] [spark] AmplabJenkins commented on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold

2020-09-14 Thread GitBox


AmplabJenkins commented on pull request #29753:
URL: https://github.com/apache/spark/pull/29753#issuecomment-692401227










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold

2020-09-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29753:
URL: https://github.com/apache/spark/pull/29753#issuecomment-692401227










[GitHub] [spark] dongjoon-hyun commented on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold

2020-09-14 Thread GitBox


dongjoon-hyun commented on pull request #29753:
URL: https://github.com/apache/spark/pull/29753#issuecomment-692401041


   Merged to branch-2.4. Thanks, @ankurdave !






[GitHub] [spark] SparkQA removed a comment on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold

2020-09-14 Thread GitBox


SparkQA removed a comment on pull request #29753:
URL: https://github.com/apache/spark/pull/29753#issuecomment-692338389


   **[Test build #128672 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128672/testReport)**
 for PR 29753 at commit 
[`56b7bca`](https://github.com/apache/spark/commit/56b7bca38d9952484ef1030a2a2058d97169c223).






[GitHub] [spark] dongjoon-hyun closed pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold

2020-09-14 Thread GitBox


dongjoon-hyun closed pull request #29753:
URL: https://github.com/apache/spark/pull/29753


   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType

2020-09-14 Thread GitBox


HyukjinKwon commented on a change in pull request #29720:
URL: https://github.com/apache/spark/pull/29720#discussion_r488323152



##
File path: python/pyspark/sql/types.py
##
@@ -305,7 +305,7 @@ def jsonValue(self):
 @classmethod
 def fromJson(cls, json):

Review comment:
   It has:
   
   ```python
   >>> ArrayType(StringType()).jsonValue()
   {'type': 'array', 'elementType': 'string', 'containsNull': True}
   ```







