Github user glentakahashi commented on the issue:
https://github.com/apache/spark/pull/20372
No worries. Can you shed some more light on the performance regressions?
Are the benchmark code/results public for me to peruse? If not, could you post
a high level summary? I'd love
Github user glentakahashi commented on the issue:
https://github.com/apache/spark/pull/20372
@gatorsmile can you link the ticket about the perf regression? I imagine
you would be seeing perf regressions in cases where partition counts are less
than total cluster capacity, as this has
Github user glentakahashi commented on the issue:
https://github.com/apache/spark/pull/20372
What are the remaining steps to get this merged? Just checking that I don't
need to do anything else on my end
Github user glentakahashi commented on the issue:
https://github.com/apache/spark/pull/20372
Created https://issues.apache.org/jira/browse/SPARK-23249
Github user glentakahashi commented on the issue:
https://github.com/apache/spark/pull/20372
The large non-splittable files case is already tested by
https://github.com/glentakahashi/spark/blob/c575977a5952bf50b605be8079c9be1e30f3bd36/sql/core/src/test/scala/org/apache/spark/sql/execution
Github user glentakahashi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20372#discussion_r163427261
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -445,16 +445,25 @@ case class FileSourceScanExec
Github user glentakahashi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20372#discussion_r163426554
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -445,16 +445,25 @@ case class FileSourceScanExec
GitHub user glentakahashi opened a pull request:
https://github.com/apache/spark/pull/20372
Improved block merging logic for partitions
## What changes were proposed in this pull request?
Change DataSourceScanExec so that when grouping blocks together into
partitions, also