[GitHub] spark issue #20372: [SPARK-23249] [SQL] Improved block merging logic for par...

2018-02-14 glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 No worries. Can you shed some more light onto the performance regressions? Are the benchmark code/results public for me to peruse? If not, could you post a high level summary? I'd love

[GitHub] spark issue #20372: [SPARK-23249] [SQL] Improved block merging logic for par...

2018-02-14 glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 @gatorsmile can you link the ticket about the perf regression? I imagine you would be seeing perf regressions in cases where partition counts are less than total cluster capacity, as this has

[GitHub] spark issue #20372: [SPARK-23249] [SQL] Improved block merging logic for par...

2018-01-31 glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 What are the remaining steps to get this merged? Just checking that I don't need to do anything else from my end

[GitHub] spark issue #20372: [SPARK-23249] Improved block merging logic for partition...

2018-01-27 glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 Created https://issues.apache.org/jira/browse/SPARK-23249

[GitHub] spark issue #20372: Improved block merging logic for partitions

2018-01-24 glentakahashi
Github user glentakahashi commented on the issue: https://github.com/apache/spark/pull/20372 The large non-splittable files case is already tested by https://github.com/glentakahashi/spark/blob/c575977a5952bf50b605be8079c9be1e30f3bd36/sql/core/src/test/scala/org/apache/spark/sql/execution

[GitHub] spark pull request #20372: Improved block merging logic for partitions

2018-01-23 glentakahashi
Github user glentakahashi commented on a diff in the pull request: https://github.com/apache/spark/pull/20372#discussion_r163427261 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -445,16 +445,25 @@ case class FileSourceScanExec

[GitHub] spark pull request #20372: Improved block merging logic for partitions

2018-01-23 glentakahashi
Github user glentakahashi commented on a diff in the pull request: https://github.com/apache/spark/pull/20372#discussion_r163426554 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -445,16 +445,25 @@ case class FileSourceScanExec

[GitHub] spark pull request #20372: Improved block merging logic for partitions

2018-01-23 glentakahashi
GitHub user glentakahashi opened a pull request: https://github.com/apache/spark/pull/20372 Improved block merging logic for partitions ## What changes were proposed in this pull request? Change DataSourceScanExec so that when grouping blocks together into partitions, also
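For context on what "grouping blocks together into partitions" involves, here is a minimal Python sketch of greedy Next Fit Decreasing packing. As far as we understand it, this is how Spark 2.x's `FileSourceScanExec` assigned file splits to partitions before this change: splits are sorted largest first, and each split is also charged a per-file open cost against the partition's budget. The function name, parameter names, and byte values are hypothetical, and this sketch does not reproduce the logic proposed in the pull request.

```python
def next_fit_decreasing(split_sizes, max_split_bytes, open_cost_in_bytes):
    """Greedily pack split sizes (in bytes) into partitions, largest first.

    Hypothetical helper: a partition is closed as soon as the next split
    would push it past max_split_bytes, and every split added also charges
    open_cost_in_bytes against the partition. Closed partitions are never
    revisited, which is what makes this "next fit".
    """
    partitions = []
    current = []
    current_size = 0
    for size in sorted(split_sizes, reverse=True):
        # Close the current partition if this split would overflow it.
        if current and current_size + size > max_split_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(size)
        current_size += size + open_cost_in_bytes
    # Flush the last, partially filled partition.
    if current:
        partitions.append(current)
    return partitions


print(next_fit_decreasing([128, 60, 60, 10, 5], max_split_bytes=128, open_cost_in_bytes=4))
# [[128], [60, 60], [10, 5]]
print(next_fit_decreasing([100, 90, 20, 20], max_split_bytes=128, open_cost_in_bytes=0))
# [[100], [90, 20], [20]]
```

Because next fit never reopens a closed partition, the second call ends with three partitions even though `[100, 20]` and `[90, 20]` would each fit within 128 bytes, giving two. The small splits at the tail of the sorted list are where this kind of packing slack shows up.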