[GitHub] [spark] Udbhav30 commented on pull request #29387: [SPARK-32481] Support truncate table to move data to trash

2020-08-07 Thread GitBox


Udbhav30 commented on pull request #29387:
URL: https://github.com/apache/spark/pull/29387#issuecomment-670832233


   cc @dongjoon-hyun please review



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Udbhav30 commented on pull request #29319: [SPARK-32480] Support insert overwrite to move data to trash

2020-08-07 Thread GitBox


Udbhav30 commented on pull request #29319:
URL: https://github.com/apache/spark/pull/29319#issuecomment-670832164


   cc @dongjoon-hyun 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29387: [SPARK-32481] Support truncate table to move data to trash

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29387:
URL: https://github.com/apache/spark/pull/29387#issuecomment-670831142


   Can one of the admins verify this patch?






[GitHub] [spark] dongjoon-hyun commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-670831201


   cc @ScrapCodes 






[GitHub] [spark] AmplabJenkins commented on pull request #29387: [SPARK-32481] Support truncate table to move data to trash

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29387:
URL: https://github.com/apache/spark/pull/29387#issuecomment-670831244


   Can one of the admins verify this patch?






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-08-07 Thread GitBox


dongjoon-hyun edited a comment on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-670831201


   cc @ScrapCodes since he is a release manager for Apache Spark 2.4.7.






[GitHub] [spark] AmplabJenkins commented on pull request #29387: [SPARK-32481] Support truncate table to move data to trash

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29387:
URL: https://github.com/apache/spark/pull/29387#issuecomment-670831142


   Can one of the admins verify this patch?






[GitHub] [spark] Udbhav30 opened a new pull request #29387: [SPARK-32481] Support truncate table to move data to trash

2020-08-07 Thread GitBox


Udbhav30 opened a new pull request #29387:
URL: https://github.com/apache/spark/pull/29387


   ### What changes were proposed in this pull request?
   Instead of deleting the data, move it to the trash.
   It is then deleted permanently from the trash according to the
   configuration provided by the user.
   
   
   ### Why are the changes needed?
   Instead of directly deleting the data, we can provide flexibility to move 
data to the trash and then delete it permanently.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. After truncate table, the data is no longer deleted permanently right away.
   It is first moved to the trash and deleted permanently only after the given time.
   
   ### How was this patch tested?
   Manually
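
The "move to trash instead of delete" idea described above can be sketched as follows. This is only an illustration on a local filesystem using `java.nio`; it is not the implementation in the PR, which this message does not show. The object `TrashSketch`, the helper `moveToTrash`, and the `.Trash` directory layout are all hypothetical names chosen for the example, and the later permanent deletion after a retention period is left out.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object TrashSketch {
  // Hypothetical helper: instead of deleting `data`, move it under `trashDir`.
  // A timestamp suffix reduces the chance of clobbering an earlier trashed copy.
  def moveToTrash(data: Path, trashDir: Path): Path = {
    Files.createDirectories(trashDir)
    val target = trashDir.resolve(s"${data.getFileName}.${System.currentTimeMillis()}")
    Files.move(data, target, StandardCopyOption.ATOMIC_MOVE)
  }

  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("trash-sketch")
    // Stand-in for a table's data file.
    val table = Files.write(root.resolve("part-00000"), "row".getBytes("UTF-8"))
    val trashed = moveToTrash(table, root.resolve(".Trash"))
    println(s"exists at original location: ${Files.exists(table)}")
    println(s"exists in trash: ${Files.exists(trashed)}")
  }
}
```

The point of the design is that the data stays recoverable until a separate purge step removes it from the trash.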






[GitHub] [spark] dongjoon-hyun commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-670830535


   Thank you, @HyukjinKwon .






[GitHub] [spark] dongjoon-hyun commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29386:
URL: https://github.com/apache/spark/pull/29386#issuecomment-670830282


   Thank you for reviewing and merging, @HyukjinKwon !






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670827006










[GitHub] [spark] AmplabJenkins commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670827006










[GitHub] [spark] SparkQA removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670801837


   **[Test build #127219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127219/testReport)** for PR 29339 at commit [`800c51a`](https://github.com/apache/spark/commit/800c51a486845647b22af6e7beb91f2d279fca18).






[GitHub] [spark] SparkQA commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


SparkQA commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670826865


   **[Test build #127219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127219/testReport)** for PR 29339 at commit [`800c51a`](https://github.com/apache/spark/commit/800c51a486845647b22af6e7beb91f2d279fca18).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on a change in pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox


cloud-fan commented on a change in pull request #29031:
URL: https://github.com/apache/spark/pull/29031#discussion_r467349934



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantProjects.scala
##
@@ -78,9 +86,11 @@ case class RemoveRedundantProjects(conf: SQLConf) extends Rule[SparkPlan] {
   case d: DataSourceV2ScanExecBase if !d.supportsColumnar => false
   case _ =>
 if (requireOrdering) {
-  project.output.map(_.exprId.id) == child.output.map(_.exprId.id)
+  project.output.map(_.exprId.id) == child.output.map(_.exprId.id) &&
+checkNullability(project.output, child.output)
 } else {
-  project.output.map(_.exprId.id).sorted == child.output.map(_.exprId.id).sorted
+  project.output.map(_.exprId.id).sorted == child.output.map(_.exprId.id).sorted &&
+checkNullability(project.output, child.output)

Review comment:
   it should be
   ```
   val orderedProjectOutput = project.output.sortBy(_.exprId.id)
   val orderedChildOutput = child.output.sortBy(_.exprId.id)
   orderedProjectOutput.map(_.exprId.id) == orderedChildOutput.map(_.exprId.id) &&
     checkNullability(orderedProjectOutput, orderedChildOutput)
   ```
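
To see why the suggestion sorts both outputs before checking nullability, here is a minimal, self-contained sketch. `Attr`, `NullabilityCheckSketch`, and this version of `checkNullability` are hypothetical stand-ins, not Spark's `Attribute` or the helper in the PR; the sketch assumes the helper compares nullability position by position.

```scala
// Hypothetical stand-in for an output attribute: an expression id plus a nullability flag.
final case class Attr(exprId: Long, nullable: Boolean)

object NullabilityCheckSketch {
  // Assumed behaviour: compare nullability position by position.
  def checkNullability(a: Seq[Attr], b: Seq[Attr]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => x.nullable == y.nullable }

  // Order-insensitive comparison: sort both sides by id first so that the
  // positional nullability check pairs up the same attributes.
  def sameOutputIgnoringOrder(project: Seq[Attr], child: Seq[Attr]): Boolean = {
    val orderedProject = project.sortBy(_.exprId)
    val orderedChild = child.sortBy(_.exprId)
    orderedProject.map(_.exprId) == orderedChild.map(_.exprId) &&
      checkNullability(orderedProject, orderedChild)
  }

  def main(args: Array[String]): Unit = {
    // Same attributes, listed in a different order.
    val project = Seq(Attr(1L, nullable = true), Attr(2L, nullable = false))
    val child = Seq(Attr(2L, nullable = false), Attr(1L, nullable = true))
    // Unsorted inputs pair attribute 1 with attribute 2, so the check is misleading.
    println(checkNullability(project, child))
    // Sorted inputs pair each attribute with itself.
    println(sameOutputIgnoringOrder(project, child))
  }
}
```

Under these assumptions, passing the unsorted outputs to a positional check compares unrelated attributes, which is the problem the review comment points out for the `else` branch.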








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670807750










[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670807750










[GitHub] [spark] SparkQA removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670725493


   **[Test build #127218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127218/testReport)** for PR 28804 at commit [`0a186f0`](https://github.com/apache/spark/commit/0a186f0eb5d71732ec3abc6b42a12dae6594277f).






[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670807546


   **[Test build #127218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127218/testReport)** for PR 28804 at commit [`0a186f0`](https://github.com/apache/spark/commit/0a186f0eb5d71732ec3abc6b42a12dae6594277f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] HyukjinKwon closed pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


HyukjinKwon closed pull request #29386:
URL: https://github.com/apache/spark/pull/29386


   






[GitHub] [spark] HyukjinKwon commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


HyukjinKwon commented on pull request #29386:
URL: https://github.com/apache/spark/pull/29386#issuecomment-670804169


   Merged to master.






[GitHub] [spark] HyukjinKwon commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox


HyukjinKwon commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-670803903


   cc @tgravescs too






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670802009










[GitHub] [spark] AmplabJenkins commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670802009










[GitHub] [spark] SparkQA commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


SparkQA commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670801837


   **[Test build #127219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127219/testReport)** for PR 29339 at commit [`800c51a`](https://github.com/apache/spark/commit/800c51a486845647b22af6e7beb91f2d279fca18).






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29385: [SPARK-32191][PySpark][DOC] Migration Guide for PySpark docs

2020-08-07 Thread GitBox


HyukjinKwon commented on a change in pull request #29385:
URL: https://github.com/apache/spark/pull/29385#discussion_r467342735



##
File path: python/docs/source/migration_guide/index.rst
##
@@ -20,3 +20,14 @@
 Migration Guide
 ===
 
+Migration Guide: PySpark (Python on Spark)
+
+Note that this migration guide describes the items specific to PySpark.
+Many items of SQL migration can be applied when migrating PySpark to higher 
versions.
+Please refer `Migration Guide: SQL, Datasets and DataFrame 
`_.
+
+.. toctree::
+   :maxdepth: 2
+
+   pyspark_2.4_to_3.0

Review comment:
   Oh yeah, I was thinking of just moving everything and leaving a single
link to redirect. I haven't checked how difficult it is to add a link back. If
that's technically difficult, we can just guide users to read the PySpark
documentation without a link.








[GitHub] [spark] HyukjinKwon commented on pull request #29385: [SPARK-32191][PySpark][DOC] Migration Guide for PySpark docs

2020-08-07 Thread GitBox


HyukjinKwon commented on pull request #29385:
URL: https://github.com/apache/spark/pull/29385#issuecomment-670801380


   Nice, thank you @viirya! I will take a look next Monday!






[GitHub] [spark] stczwd commented on pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-07 Thread GitBox


stczwd commented on pull request #29339:
URL: https://github.com/apache/spark/pull/29339#issuecomment-670801145


   retest this please






[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670799920


   Got it. I updated my previous comment by marking my misunderstanding.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794507


   Hi, @Fokko . ~It seems that @srowen clearly gave -1 on this approach. 
Although I approved this, we cannot merge your PR if there is a -1.~
   
   ~I must admit that I didn't notice that Sean has been such a strong -1 for 
two weeks while you are working in this way.~ As the one who proposed the 
original direction strongly, sorry about that.
   
   Could you adjust your PR according to @srowen 's comment?






[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670797950


   No -1 here. I don't object to the current change.






[GitHub] [spark] github-actions[bot] closed pull request #27026: [SPARK-29596][Web UI] Task duration not updating for running tasks

2020-08-07 Thread GitBox


github-actions[bot] closed pull request #27026:
URL: https://github.com/apache/spark/pull/27026


   






[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794888


   While I was writing my comment above, I didn't notice the last comment from
@srowen . However, I believe it's better to adjust now.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794888


   Oh, while I was writing my comment above, I didn't notice the last comment
from @srowen . However, I believe it's better to adjust now.






[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794507


   Hi, @Fokko . It seems that @srowen clearly gave -1 on this approach. 
Although I approved this, we cannot merge your PR if there is a -1.
   
   I must admit that I didn't notice that Sean has been such a strong -1 for 
two weeks while you are working in this way. As the one who proposed the 
original direction strongly, sorry about that.
   
   Could you adjust your PR according to @srowen 's comment?






[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670794000


   No, I don't feel that way. If others would like to merge, go ahead. I would 
merge the narrower change myself, but do not object to the broader one.






[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670791274


   Got it. So you decided to give a -1 to the enforcement for that reason.






[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670786844


   I did. If most of the changes are exceptions, that makes me wonder how much
this rule will just trigger false positives. I am not sure it is worth
enforcing this if it mostly turns up false positives.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670783658










[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670783658


   Hey, @srowen . Did you see my first comment, "Without the rule enforcement, 
unused import always happens again"?
   - https://github.com/apache/spark/pull/29121#issuecomment-658818917
   
   I'm wondering if you disagreed with my first comment or just didn't see it.






[GitHub] [spark] maropu edited a comment on pull request #29384: [SPARK-32564][SQL][TEST] Inject data statistics to simulate plan generation on actual TPCDS data

2020-08-07 Thread GitBox


maropu edited a comment on pull request #29384:
URL: https://github.com/apache/spark/pull/29384#issuecomment-670779886


   Thanks, all!
   
   @dongjoon-hyun Sure!
   @cloud-fan okay, I'll do a follow-up.






[GitHub] [spark] maropu commented on pull request #29384: [SPARK-32564][SQL][TEST] Inject data statistics to simulate plan generation on actual TPCDS data

2020-08-07 Thread GitBox


maropu commented on pull request #29384:
URL: https://github.com/apache/spark/pull/29384#issuecomment-670779886


   @dongjoon-hyun Sure!
   @cloud-fan okay, I'll do a follow-up.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670778245










[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670778245










[GitHub] [spark] SparkQA removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670707796


   **[Test build #127217 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127217/testReport)**
 for PR 29121 at commit 
[`9f37113`](https://github.com/apache/spark/commit/9f3711335716e5fdd2b528c258fef00b27d4c670).






[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-67074


   **[Test build #127217 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127217/testReport)**
 for PR 29121 at commit 
[`9f37113`](https://github.com/apache/spark/commit/9f3711335716e5fdd2b528c258fef00b27d4c670).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-670772420










[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-670772420










[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox


SparkQA commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-670771944


   **[Test build #127212 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127212/testReport)**
 for PR 29031 at commit 
[`feabc1f`](https://github.com/apache/spark/commit/feabc1fb0406c739602ee0683bdd03e29dc7ea71).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-670642619


   **[Test build #127212 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127212/testReport)**
 for PR 29031 at commit 
[`feabc1f`](https://github.com/apache/spark/commit/feabc1fb0406c739602ee0683bdd03e29dc7ea71).






[GitHub] [spark] AmplabJenkins commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29074:
URL: https://github.com/apache/spark/pull/29074#issuecomment-670766308










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29074:
URL: https://github.com/apache/spark/pull/29074#issuecomment-670766308










[GitHub] [spark] SparkQA removed a comment on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #29074:
URL: https://github.com/apache/spark/pull/29074#issuecomment-670645783


   **[Test build #127213 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127213/testReport)**
 for PR 29074 at commit 
[`89ad6ef`](https://github.com/apache/spark/commit/89ad6ef4d9ca67fb4dc06bd66b7db9d7430f9fd0).






[GitHub] [spark] SparkQA commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-08-07 Thread GitBox


SparkQA commented on pull request #29074:
URL: https://github.com/apache/spark/pull/29074#issuecomment-670763604


   **[Test build #127213 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127213/testReport)**
 for PR 29074 at commit 
[`89ad6ef`](https://github.com/apache/spark/commit/89ad6ef4d9ca67fb4dc06bd66b7db9d7430f9fd0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-670751473










[GitHub] [spark] AmplabJenkins commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-670751473










[GitHub] [spark] SparkQA removed a comment on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-670632896


   **[Test build #127211 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127211/testReport)**
 for PR 29342 at commit 
[`d4e0084`](https://github.com/apache/spark/commit/d4e0084f904ec99294f91bb7bcfb68b4ad13ccfc).






[GitHub] [spark] SparkQA commented on pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


SparkQA commented on pull request #29342:
URL: https://github.com/apache/spark/pull/29342#issuecomment-670748063


   **[Test build #127211 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127211/testReport)**
 for PR 29342 at commit 
[`d4e0084`](https://github.com/apache/spark/commit/d4e0084f904ec99294f91bb7bcfb68b4ad13ccfc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670725860










[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670725860










[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670725493


   **[Test build #127218 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127218/testReport)**
 for PR 28804 at commit 
[`0a186f0`](https://github.com/apache/spark/commit/0a186f0eb5d71732ec3abc6b42a12dae6594277f).






[GitHub] [spark] AmplabJenkins commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29386:
URL: https://github.com/apache/spark/pull/29386#issuecomment-670715442










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29386:
URL: https://github.com/apache/spark/pull/29386#issuecomment-670715442










[GitHub] [spark] SparkQA removed a comment on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #29386:
URL: https://github.com/apache/spark/pull/29386#issuecomment-670654894


   **[Test build #127215 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127215/testReport)**
 for PR 29386 at commit 
[`3001355`](https://github.com/apache/spark/commit/3001355177ff581e25b8cded79d4ef79a0e48200).






[GitHub] [spark] SparkQA commented on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


SparkQA commented on pull request #29386:
URL: https://github.com/apache/spark/pull/29386#issuecomment-670714635


   **[Test build #127215 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127215/testReport)**
 for PR 29386 at commit 
[`3001355`](https://github.com/apache/spark/commit/3001355177ff581e25b8cded79d4ef79a0e48200).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] srowen commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


srowen commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670711092


   My last comment was: why do we need to add the rule and then a ton of 
exclusions? Just remove the unused imports. That's a much narrower change.
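
For readers following the thread, the trade-off between "just remove the unused imports" and "add a rule plus exclusions" can be illustrated with a minimal sketch. This is not the linter configuration from the PR: the helper name `find_unused_imports` is hypothetical, the check is a simplified approximation built on Python's standard `ast` module, and the handling of `# noqa` is a stand-in for a real linter's per-line suppression.

```python
import ast
from typing import List


def find_unused_imports(source: str) -> List[str]:
    """Return imported names that are never referenced in `source`.

    Import statements whose first line ends in `# noqa` are treated as
    explicit exclusions (a simplified stand-in for linter suppression).
    """
    tree = ast.parse(source)
    lines = source.splitlines()

    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if lines[node.lineno - 1].rstrip().endswith("# noqa"):
                continue  # explicitly excluded from the check
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skip them
                # `import a.b` binds `a`; `import a as x` binds `x`.
                bound = alias.asname or alias.name.split(".")[0]
                imported[bound] = node.lineno

    # Every bare identifier in the module, including the base of `os.path`.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(name for name in imported if name not in used)


sample = """\
import os
import sys
import json  # noqa

print(sys.argv)
"""
print(find_unused_imports(sample))  # ['os']
```

In this sketch `json` is unused but excluded, which is the kind of exception the comment above is questioning: each exclusion silences the rule for an import that could otherwise simply be deleted, unless it is kept deliberately (for example as a re-export).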






[GitHub] [spark] mridulm commented on pull request #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host

2020-08-07 Thread GitBox


mridulm commented on pull request #24554:
URL: https://github.com/apache/spark/pull/24554#issuecomment-670710590


   Catching up on PRs ... this essentially means all executors on the same host 
effectively have the same preferred locality (modulo concurrent block removal). 
Did we update the preferred locality for the block with this in mind (here or in 
a follow-up PR)? Thx.
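
The point about locality can be made concrete with a small sketch. It only illustrates the idea in the comment above and is not Spark's scheduler or block manager API: the function `host_local_candidates` and both of its arguments are hypothetical, and concurrent block removal is ignored.

```python
from typing import Dict, List, Tuple


def host_local_candidates(
    block_locations: List[Tuple[str, str]],
    executors_by_host: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Expand a block's locations to every executor on the same hosts.

    block_locations: (host, executor_id) pairs that hold the block on disk.
    executors_by_host: live executors per host (hypothetical bookkeeping).
    """
    candidates = []
    seen = set()
    for host, _ in block_locations:
        # If a disk-persisted block can be read by any executor on the
        # host without using the network, all of them are equally local.
        for executor_id in executors_by_host.get(host, []):
            if (host, executor_id) not in seen:
                seen.add((host, executor_id))
                candidates.append((host, executor_id))
    return candidates


locations = [("host-a", "1")]
executors = {"host-a": ["1", "2"], "host-b": ["3"]}
print(host_local_candidates(locations, executors))
# [('host-a', '1'), ('host-a', '2')]
```

Under this assumption, a task that wants the block would prefer either executor on `host-a`, not only the one that originally wrote it.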






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670707738


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127216/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670707728


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670694343


   **[Test build #127216 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127216/testReport)**
 for PR 28804 at commit 
[`2ae5525`](https://github.com/apache/spark/commit/2ae5525294718ea43cee419bc095a3d382b6c085).






[GitHub] [spark] SparkQA commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


SparkQA commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670707796


   **[Test build #127217 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127217/testReport)**
 for PR 29121 at commit 
[`9f37113`](https://github.com/apache/spark/commit/9f3711335716e5fdd2b528c258fef00b27d4c670).






[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670707640


   **[Test build #127216 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127216/testReport)**
 for PR 28804 at commit 
[`2ae5525`](https://github.com/apache/spark/commit/2ae5525294718ea43cee419bc095a3d382b6c085).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670707728










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670705663










[GitHub] [spark] AmplabJenkins commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670705663










[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670705302


   Retest this please.






[GitHub] [spark] allisonwang-db commented on pull request #29137: [SPARK-32337][SQL] Show initial plan in AQE plan tree string

2020-08-07 Thread GitBox


allisonwang-db commented on pull request #29137:
URL: https://github.com/apache/spark/pull/29137#issuecomment-670699341


   I've updated the PR description. Please let me know if it makes sense.






[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


viirya commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r467252553



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
##
@@ -71,8 +88,122 @@ case class ShuffledHashJoinExec(
     val numOutputRows = longMetric("numOutputRows")
     streamedPlan.execute().zipPartitions(buildPlan.execute()) { (streamIter, buildIter) =>
       val hashed = buildHashedRelation(buildIter)
-      join(streamIter, hashed, numOutputRows)
+      joinType match {
+        case FullOuter => fullOuterJoin(streamIter, hashed, numOutputRows)
+        case _ => join(streamIter, hashed, numOutputRows)
+      }
+    }
+  }
+
+  /**
+   * Full outer shuffled hash join has three steps:
+   * 1. Construct hash relation from build side,
+   *    with extra boolean value at the end of row to track look up information
+   *    (done in `buildHashedRelation`).
+   * 2. Process rows from stream side by looking up hash relation,
+   *    and mark the matched rows from build side be looked up.
+   * 3. Process rows from build side by iterating hash relation,
+   *    and filter out rows from build side being looked up already.
+   */
+  private def fullOuterJoin(
+      streamIter: Iterator[InternalRow],
+      hashedRelation: HashedRelation,
+      numOutputRows: SQLMetric): Iterator[InternalRow] = {
+    val joinRow = new JoinedRow
+    val (joinRowWithStream, joinRowWithBuild) = {
+      buildSide match {
+        case BuildLeft => (joinRow.withRight _, joinRow.withLeft _)
+        case BuildRight => (joinRow.withLeft _, joinRow.withRight _)
+      }
+    }
+    val joinKeys = streamSideKeyGenerator()
+    val buildRowGenerator = UnsafeProjection.create(buildOutput, buildOutput)
+    val buildNullRow = new GenericInternalRow(buildOutput.length)
+    val streamNullRow = new GenericInternalRow(streamedOutput.length)
+
+    def markRowLookedUp(row: UnsafeRow): Unit =
+      row.setBoolean(row.numFields() - 1, true)
+
+    // Process stream side with looking up hash relation
+    val streamResultIter =
+      if (hashedRelation.keyIsUnique) {
+        streamIter.map { srow =>
+          joinRowWithStream(srow)
+          val keys = joinKeys(srow)
+          if (keys.anyNull) {
+            joinRowWithBuild(buildNullRow)
+          } else {
+            val matched = hashedRelation.getValue(keys)
+            if (matched != null) {
+              val buildRow = buildRowGenerator(matched)
+              if (boundCondition(joinRowWithBuild(buildRow))) {
+                markRowLookedUp(matched.asInstanceOf[UnsafeRow])
+                joinRow
+              } else {
+                joinRowWithBuild(buildNullRow)
+              }
+            } else {
+              joinRowWithBuild(buildNullRow)
+            }
+          }
+        }
+      } else {
+        streamIter.flatMap { srow =>
+          joinRowWithStream(srow)
+          val keys = joinKeys(srow)
+          if (keys.anyNull) {
+            Iterator.single(joinRowWithBuild(buildNullRow))
+          } else {
+            val buildIter = hashedRelation.get(keys)
+            new RowIterator {
+              private var found = false
+              override def advanceNext(): Boolean = {
+                while (buildIter != null && buildIter.hasNext) {
+                  val matched = buildIter.next()
+                  val buildRow = buildRowGenerator(matched)
+                  if (boundCondition(joinRowWithBuild(buildRow))) {
+                    markRowLookedUp(matched.asInstanceOf[UnsafeRow])
+                    found = true
+                    return true
+                  }
+                }
+                if (!found) {
+                  joinRowWithBuild(buildNullRow)
+                  found = true
+                  return true
+                }
+                false
+              }
+              override def getRow: InternalRow = joinRow
+            }.toScala
+          }
+        }
+      }
+
+    // Process build side with filtering out rows looked up already
+val buildResultIter = hashedRelation.values().flatMap { brow =>
+  val unsafebrow = brow.asInstanceOf[UnsafeRow]
+  val isLookup = unsafebrow.getBoolean(unsafebrow.numFields() - 1)
+  if (!isLookup) {
+val buildRow = buildRowGenerator(unsafebrow)
+joinRowWithBuild(buildRow)
+joinRowWithStream(streamNullRow)

Review comment:
   When we reach `buildResultIter`, I think we only need to call 
`joinRowWithStream` once? The stream row does not change.
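
A minimal sketch of the suggestion, assuming a reusable two-slot row. `JoinedPair` and `HoistInvariantSide` are hypothetical stand-ins written for this note and are not Spark's `JoinedRow`. The stream slot is set once before the loop, and only the build slot is updated per row:

```scala
// Hypothetical stand-in for a reusable joined row: two mutable slots.
final class JoinedPair[A, B](var left: A, var right: B) {
  def withLeft(a: A): this.type = { left = a; this }
  def withRight(b: B): this.type = { right = b; this }
  // Copy the current contents out, since the object itself is reused.
  def snapshot: (A, B) = (left, right)
}

object HoistInvariantSide {
  // Emits one output per unmatched build row; the stream slot stays empty (None).
  def unmatchedBuildRows(build: Seq[String]): Seq[(Option[String], String)] = {
    val joined = new JoinedPair[Option[String], String](None, "")
    // The stream side is identical for every output row, so set it once up front.
    joined.withLeft(None)
    build.map { b =>
      // Only the build slot changes per row.
      joined.withRight(b).snapshot
    }
  }
}
```

Because the object is mutated in place, the sketch copies the slots out with `snapshot` before moving on to the next row.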






[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


viirya commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r467251404



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
##
@@ -71,8 +88,122 @@ case class ShuffledHashJoinExec(
 val numOutputRows = longMetric("numOutputRows")
 streamedPlan.execute().zipPartitions(buildPlan.execute()) { (streamIter, 
buildIter) =>
   val hashed = buildHashedRelation(buildIter)
-  join(streamIter, hashed, numOutputRows)
+  joinType match {
+case FullOuter => fullOuterJoin(streamIter, hashed, numOutputRows)
+case _ => join(streamIter, hashed, numOutputRows)
+  }
+}
+  }
+
+  /**
+   * Full outer shuffled hash join has three steps:
+   * 1. Construct hash relation from build side,
+   *with extra boolean value at the end of row to track look up information
+   *(done in `buildHashedRelation`).
+   * 2. Process rows from stream side by looking up hash relation,
+   *and mark the matched rows from build side be looked up.
+   * 3. Process rows from build side by iterating hash relation,
+   *and filter out rows from build side being looked up already.
+   */
+  private def fullOuterJoin(
+  streamIter: Iterator[InternalRow],
+  hashedRelation: HashedRelation,
+  numOutputRows: SQLMetric): Iterator[InternalRow] = {
+val joinRow = new JoinedRow
+val (joinRowWithStream, joinRowWithBuild) = {
+  buildSide match {
+case BuildLeft => (joinRow.withRight _, joinRow.withLeft _)
+case BuildRight => (joinRow.withLeft _, joinRow.withRight _)
+  }
+}
+val joinKeys = streamSideKeyGenerator()
+val buildRowGenerator = UnsafeProjection.create(buildOutput, buildOutput)
+val buildNullRow = new GenericInternalRow(buildOutput.length)
+val streamNullRow = new GenericInternalRow(streamedOutput.length)
+
+def markRowLookedUp(row: UnsafeRow): Unit =
+  row.setBoolean(row.numFields() - 1, true)
+
+// Process stream side with looking up hash relation
+val streamResultIter =
+  if (hashedRelation.keyIsUnique) {
+streamIter.map { srow =>
+  joinRowWithStream(srow)
+  val keys = joinKeys(srow)
+  if (keys.anyNull) {
+joinRowWithBuild(buildNullRow)
+  } else {
+val matched = hashedRelation.getValue(keys)
+if (matched != null) {
+  val buildRow = buildRowGenerator(matched)
+  if (boundCondition(joinRowWithBuild(buildRow))) {
+markRowLookedUp(matched.asInstanceOf[UnsafeRow])
+joinRow
+  } else {
+joinRowWithBuild(buildNullRow)
+  }
+} else {
+  joinRowWithBuild(buildNullRow)
+}
+  }
+}
+  } else {
+streamIter.flatMap { srow =>
+  joinRowWithStream(srow)
+  val keys = joinKeys(srow)
+  if (keys.anyNull) {
+Iterator.single(joinRowWithBuild(buildNullRow))
+  } else {
+val buildIter = hashedRelation.get(keys)
+new RowIterator {
+  private var found = false
+  override def advanceNext(): Boolean = {
+while (buildIter != null && buildIter.hasNext) {
+  val matched = buildIter.next()
+  val buildRow = buildRowGenerator(matched)
+  if (boundCondition(joinRowWithBuild(buildRow))) {
+markRowLookedUp(matched.asInstanceOf[UnsafeRow])
+found = true
+return true
+  }
+}
+if (!found) {
+  joinRowWithBuild(buildNullRow)
+  found = true
+  return true
+}
+false
+  }
+  override def getRow: InternalRow = joinRow
+}.toScala
+  }
+}
+  }
+
+// Process build side with filtering out rows looked up already

Review comment:
   Not only looked up, but also passed the join condition?
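
The three steps in the quoted doc comment can be sketched in plain Scala. This is a simplified model and not the PR's implementation: rows are hypothetical `(Int, String)` pairs with non-null keys, the relation is an ordinary `Map`, and the looked-up flag lives in a side array rather than in an extra boolean field on each row. A build row counts as matched only when the key hits and the join condition also passes:

```scala
object FullOuterHashJoinSketch {
  // Hypothetical row shape: (join key, payload).
  type Row = (Int, String)

  def fullOuterJoin(
      stream: Seq[Row],
      build: Seq[Row],
      condition: (Row, Row) => Boolean): Seq[(Option[Row], Option[Row])] = {
    // Step 1: build the hash relation, remembering each build row's index.
    val indexed = build.toVector.zipWithIndex
    val relation: Map[Int, Seq[(Row, Int)]] = indexed.groupBy(_._1._1)
    val matched = Array.fill(build.length)(false)

    // Step 2: probe with stream rows; mark build rows whose key matches
    // and which also pass the join condition.
    val streamSide = stream.toVector.flatMap { s =>
      val hits = relation.getOrElse(s._1, Seq.empty).filter { case (b, _) => condition(s, b) }
      if (hits.isEmpty) {
        // No qualifying build row: pad the build side with None.
        Seq((Option(s), Option.empty[Row]))
      } else {
        hits.map { case (b, i) =>
          matched(i) = true
          (Option(s), Option(b))
        }
      }
    }

    // Step 3: emit build rows that were never marked, padded with no stream row.
    val buildSide = indexed.collect {
      case (b, i) if !matched(i) => (Option.empty[Row], Option(b))
    }

    streamSide ++ buildSide
  }
}
```

The stream side is evaluated strictly (via `toVector`) before step 3 runs, so every mark is in place when the build rows are filtered.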








[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


viirya commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r467250468



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
##
@@ -71,8 +88,122 @@ case class ShuffledHashJoinExec(
 val numOutputRows = longMetric("numOutputRows")
 streamedPlan.execute().zipPartitions(buildPlan.execute()) { (streamIter, 
buildIter) =>
   val hashed = buildHashedRelation(buildIter)
-  join(streamIter, hashed, numOutputRows)
+  joinType match {
+case FullOuter => fullOuterJoin(streamIter, hashed, numOutputRows)
+case _ => join(streamIter, hashed, numOutputRows)
+  }
+}
+  }
+
+  /**
+   * Full outer shuffled hash join has three steps:
+   * 1. Construct hash relation from build side,
+   *with extra boolean value at the end of row to track look up information
+   *(done in `buildHashedRelation`).
+   * 2. Process rows from stream side by looking up hash relation,
+   *and mark the matched rows from build side be looked up.
+   * 3. Process rows from build side by iterating hash relation,
+   *and filter out rows from build side being looked up already.
+   */
+  private def fullOuterJoin(
+  streamIter: Iterator[InternalRow],
+  hashedRelation: HashedRelation,
+  numOutputRows: SQLMetric): Iterator[InternalRow] = {
+val joinRow = new JoinedRow
+val (joinRowWithStream, joinRowWithBuild) = {
+  buildSide match {
+case BuildLeft => (joinRow.withRight _, joinRow.withLeft _)
+case BuildRight => (joinRow.withLeft _, joinRow.withRight _)
+  }
+}
+val joinKeys = streamSideKeyGenerator()
+val buildRowGenerator = UnsafeProjection.create(buildOutput, buildOutput)
+val buildNullRow = new GenericInternalRow(buildOutput.length)
+val streamNullRow = new GenericInternalRow(streamedOutput.length)
+
+def markRowLookedUp(row: UnsafeRow): Unit =
+  row.setBoolean(row.numFields() - 1, true)
+
+// Process stream side with looking up hash relation
+val streamResultIter =
+  if (hashedRelation.keyIsUnique) {
+streamIter.map { srow =>
+  joinRowWithStream(srow)
+  val keys = joinKeys(srow)
+  if (keys.anyNull) {
+joinRowWithBuild(buildNullRow)
+  } else {
+val matched = hashedRelation.getValue(keys)
+if (matched != null) {
+  val buildRow = buildRowGenerator(matched)
+  if (boundCondition(joinRowWithBuild(buildRow))) {
+markRowLookedUp(matched.asInstanceOf[UnsafeRow])
+joinRow
+  } else {
+joinRowWithBuild(buildNullRow)
+  }
+} else {
+  joinRowWithBuild(buildNullRow)
+}
+  }
+}
+  } else {
+streamIter.flatMap { srow =>
+  joinRowWithStream(srow)
+  val keys = joinKeys(srow)
+  if (keys.anyNull) {
+Iterator.single(joinRowWithBuild(buildNullRow))
+  } else {
+val buildIter = hashedRelation.get(keys)
+new RowIterator {
+  private var found = false
+  override def advanceNext(): Boolean = {

Review comment:
   Can `found` be moved into `advanceNext`?
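
For reference, a minimal sketch of the "every match, or else exactly one fallback row" iterator pattern that the quoted hunk implements. It is written against Scala's standard `Iterator`, not Spark's `RowIterator`, and `MatchesOrFallback` is a hypothetical name. In this sketch the flag is a field, so its value carries over between successive calls:

```scala
// Emits every element that passes `keep`; if none does, emits `fallback` exactly once.
final class MatchesOrFallback[A](underlying: Iterator[A], keep: A => Boolean, fallback: A)
    extends Iterator[A] {
  // True once anything (a match or the fallback) has been produced.
  private var found = false
  // Buffered element waiting to be handed out by next().
  private var nextValue: Option[A] = None

  private def advance(): Unit = {
    while (nextValue.isEmpty && underlying.hasNext) {
      val candidate = underlying.next()
      if (keep(candidate)) {
        nextValue = Some(candidate)
        found = true
      }
    }
    // Input exhausted without producing anything: produce the fallback once.
    if (nextValue.isEmpty && !found) {
      nextValue = Some(fallback)
      found = true
    }
  }

  override def hasNext: Boolean = {
    if (nextValue.isEmpty) advance()
    nextValue.isDefined
  }

  override def next(): A = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    val value = nextValue.get
    nextValue = None
    value
  }
}
```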








[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


viirya commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r467250293



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
##
@@ -71,8 +88,122 @@ case class ShuffledHashJoinExec(
 val numOutputRows = longMetric("numOutputRows")
 streamedPlan.execute().zipPartitions(buildPlan.execute()) { (streamIter, 
buildIter) =>
   val hashed = buildHashedRelation(buildIter)
-  join(streamIter, hashed, numOutputRows)
+  joinType match {
+case FullOuter => fullOuterJoin(streamIter, hashed, numOutputRows)
+case _ => join(streamIter, hashed, numOutputRows)
+  }
+}
+  }
+
+  /**
+   * Full outer shuffled hash join has three steps:
+   * 1. Construct hash relation from build side,
+   *with extra boolean value at the end of row to track look up information
+   *(done in `buildHashedRelation`).
+   * 2. Process rows from stream side by looking up hash relation,
+   *and mark the matched rows from build side be looked up.
+   * 3. Process rows from build side by iterating hash relation,
+   *and filter out rows from build side being looked up already.
+   */
+  private def fullOuterJoin(
+  streamIter: Iterator[InternalRow],
+  hashedRelation: HashedRelation,
+  numOutputRows: SQLMetric): Iterator[InternalRow] = {
+val joinRow = new JoinedRow
+val (joinRowWithStream, joinRowWithBuild) = {
+  buildSide match {
+case BuildLeft => (joinRow.withRight _, joinRow.withLeft _)
+case BuildRight => (joinRow.withLeft _, joinRow.withRight _)
+  }
+}
+val joinKeys = streamSideKeyGenerator()
+val buildRowGenerator = UnsafeProjection.create(buildOutput, buildOutput)
+val buildNullRow = new GenericInternalRow(buildOutput.length)
+val streamNullRow = new GenericInternalRow(streamedOutput.length)
+
+def markRowLookedUp(row: UnsafeRow): Unit =
+  row.setBoolean(row.numFields() - 1, true)
+
+// Process stream side with looking up hash relation
+val streamResultIter =
+  if (hashedRelation.keyIsUnique) {
+streamIter.map { srow =>
+  joinRowWithStream(srow)
+  val keys = joinKeys(srow)
+  if (keys.anyNull) {
+joinRowWithBuild(buildNullRow)
+  } else {
+val matched = hashedRelation.getValue(keys)
+if (matched != null) {
+  val buildRow = buildRowGenerator(matched)
+  if (boundCondition(joinRowWithBuild(buildRow))) {
+markRowLookedUp(matched.asInstanceOf[UnsafeRow])
+joinRow
+  } else {
+joinRowWithBuild(buildNullRow)
+  }
+} else {
+  joinRowWithBuild(buildNullRow)
+}
+  }
+}
+  } else {
+streamIter.flatMap { srow =>
+  joinRowWithStream(srow)
+  val keys = joinKeys(srow)
+  if (keys.anyNull) {
+Iterator.single(joinRowWithBuild(buildNullRow))
+  } else {
+val buildIter = hashedRelation.get(keys)
+new RowIterator {
+  private var found = false
+  override def advanceNext(): Boolean = {
+while (buildIter != null && buildIter.hasNext) {
+  val matched = buildIter.next()
+  val buildRow = buildRowGenerator(matched)
+  if (boundCondition(joinRowWithBuild(buildRow))) {
+markRowLookedUp(matched.asInstanceOf[UnsafeRow])
+found = true
+return true
+  }
+}
+if (!found) {
+  joinRowWithBuild(buildNullRow)
+  found = true

Review comment:
   No need to set `found` here.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670694952










[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670694952










[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670694343


   **[Test build #127216 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127216/testReport)**
 for PR 28804 at commit 
[`2ae5525`](https://github.com/apache/spark/commit/2ae5525294718ea43cee419bc095a3d382b6c085).






[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


karuppayya commented on a change in pull request #28804:
URL: https://github.com/apache/spark/pull/28804#discussion_r467173370



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggUtils.scala
##
@@ -353,4 +353,8 @@ object AggUtils {
 
 finalAndCompleteAggregate :: Nil
   }
+
+  def areAggExpressionsPartial(modes: Seq[AggregateMode]): Boolean = {
+    modes.nonEmpty && modes.forall(_ == Partial)

Review comment:
   In this case, the reducer side also has no aggregate function, 
so we might end up not aggregating the data.
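
Reading "this case" as an empty list of modes, the `nonEmpty` guard matters because `forall` over an empty sequence is vacuously true. A small sketch of that behaviour, with a hypothetical `Mode` hierarchy standing in for Spark's `AggregateMode`:

```scala
object PartialModeCheck {
  // Hypothetical stand-ins for the aggregate modes.
  sealed trait Mode
  case object Partial extends Mode
  case object Final extends Mode

  // Mirrors the shape of the quoted helper: the list must be non-empty and all Partial.
  def allPartial(modes: Seq[Mode]): Boolean =
    modes.nonEmpty && modes.forall(_ == Partial)

  // Without the guard, an empty list is reported as "all Partial".
  def allPartialNoGuard(modes: Seq[Mode]): Boolean =
    modes.forall(_ == Partial)
}
```

With no aggregate expressions at all, `allPartialNoGuard` would return `true`, whereas `allPartial` returns `false`.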








[GitHub] [spark] Fokko commented on pull request #29121: [SPARK-32319][PYSPARK] Disallow the use of unused imports

2020-08-07 Thread GitBox


Fokko commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-670690427


   Would it be possible to move this forward? :)






[GitHub] [spark] viirya commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

2020-08-07 Thread GitBox


viirya commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r467244022



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -314,7 +343,13 @@ private[joins] object UnsafeHashedRelation {
   key: Seq[Expression],
   sizeEstimate: Int,
   taskMemoryManager: TaskMemoryManager,
-  isNullAware: Boolean = false): HashedRelation = {
+  isNullAware: Boolean = false,
+  isLookupAware: Boolean = false,
+  value: Option[Seq[Expression]] = None): HashedRelation = {

Review comment:
   `value` -> `valueExprs`? You also use `value` below for another variable, 
which can easily confuse readers of the code.











[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670659771


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127214/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-670659548










[GitHub] [spark] SparkQA removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670645866


   **[Test build #127214 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127214/testReport)**
 for PR 28804 at commit 
[`ceaa4e5`](https://github.com/apache/spark/commit/ceaa4e52d558d21964e5ea84a236519680202115).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670659760


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670659760










[GitHub] [spark] SparkQA commented on pull request #28804: [SPARK-31973][SQL] Skip partial aggregates if grouping keys have high cardinality

2020-08-07 Thread GitBox


SparkQA commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-670659564


   **[Test build #127214 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127214/testReport)**
 for PR 28804 at commit 
[`ceaa4e5`](https://github.com/apache/spark/commit/ceaa4e52d558d21964e5ea84a236519680202115).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox


AmplabJenkins commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-670659548










[GitHub] [spark] SparkQA removed a comment on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox


SparkQA removed a comment on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-670596918


   **[Test build #127208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127208/testReport)**
 for PR 29331 at commit 
[`cc1a7a3`](https://github.com/apache/spark/commit/cc1a7a3379a620f32d835230eebda5980ca7f7d3).






[GitHub] [spark] SparkQA commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-07 Thread GitBox


SparkQA commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-670658671


   **[Test build #127208 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127208/testReport)**
 for PR 29331 at commit 
[`cc1a7a3`](https://github.com/apache/spark/commit/cc1a7a3379a620f32d835230eebda5980ca7f7d3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1

2020-08-07 Thread GitBox


dongjoon-hyun commented on pull request #29326:
URL: https://github.com/apache/spark/pull/29326#issuecomment-670656860


   Thank you so much. Yes. I'm looking forward to seeing that~






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


dongjoon-hyun commented on a change in pull request #29386:
URL: https://github.com/apache/spark/pull/29386#discussion_r467209211



##
File path: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
##
@@ -395,7 +395,7 @@ class KafkaTestUtils(
   }
 
   def getAllTopicsAndPartitionSize(): Seq[(String, Int)] = {
-    zkClient.getPartitionsForTopics(zkClient.getAllTopicsInCluster).mapValues(_.size).toSeq
+    zkClient.getPartitionsForTopics(zkClient.getAllTopicsInCluster()).mapValues(_.size).toSeq

Review comment:
   This change is required to compile.
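
One way such a change becomes necessary, assumed here for illustration and not confirmed in this thread, is that the method acquires a parameter list, for example a parameter with a default value. In Scala 2 a method that declares parameters must be applied with parentheses even when every argument is defaulted. `TopicClient` and `registerWatch` below are hypothetical:

```scala
object EmptyParenSketch {
  // Hypothetical client; `registerWatch` and its default are assumptions for illustration.
  class TopicClient(topics: Map[String, Seq[Int]]) {
    def getAllTopicsInCluster(registerWatch: Boolean = false): Set[String] = topics.keySet

    def getPartitionsForTopics(names: Set[String]): Map[String, Seq[Int]] =
      topics.filter { case (name, _) => names.contains(name) }
  }

  def topicsAndPartitionSize(client: TopicClient): Seq[(String, Int)] =
    // Writing `client.getAllTopicsInCluster` without parentheses would not type-check here:
    // a method with a parameter list needs an explicit application, defaults or not.
    client.getPartitionsForTopics(client.getAllTopicsInCluster())
      .map { case (topic, partitions) => topic -> partitions.size }
      .toSeq
}
```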








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29386: [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0

2020-08-07 Thread GitBox


AmplabJenkins removed a comment on pull request #29386:
URL: https://github.com/apache/spark/pull/29386#issuecomment-670655440









