[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-17 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210890027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -459,6 +460,29 @@ object SQLConf { .intConf

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-17 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @HyukjinKwon Yes, this is to handle it dynamically. For ad-hoc queries, the selected columns are different for different queries, and it's not convenient or even impossible for users to set

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-17 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210887543 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -459,6 +460,29 @@ object SQLConf { .intConf

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-17 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210887308 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -459,6 +460,29 @@ object SQLConf { .intConf

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-17 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210886442 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -25,17 +25,16 @@ import java.util.zip.Deflater import

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-17 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210876154 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -459,6 +460,29 @@ object SQLConf { .intConf

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-17 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210871055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -425,12 +426,44 @@ case class FileSourceScanExec

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-16 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210793717 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -425,12 +426,44 @@ case class FileSourceScanExec

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-15 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @HyukjinKwon, I moved the change to the master branch just now. Please help review it. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-08-15 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210456342 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -401,12 +399,41 @@ case class FileSourceScanExec

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-10 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @HyukjinKwon Thanks for your comments. I will submit it to master soon

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-08 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @maropu Thanks for your comments. ORC can also benefit from this change since ORC is also a columnar file format. Do you think I should add ORC support by changing the below line

[GitHub] spark pull request #22018: [SPARK-25038][SQL] Get block location in parallel

2018-08-08 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/22018#discussion_r208788059 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala --- @@ -297,7 +297,7 @@ object InMemoryFileIndex

[GitHub] spark pull request #22018: [SPARK-25038][SQL] Get block location in parallel

2018-08-08 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/22018#discussion_r208787523 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala --- @@ -297,7 +297,7 @@ object InMemoryFileIndex

[GitHub] spark pull request #22018: [SPARK-25038][SQL] Accelerate Spark Plan generati...

2018-08-08 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/22018#discussion_r208784609 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala --- @@ -297,7 +297,7 @@ object InMemoryFileIndex

[GitHub] spark issue #22018: [SPARK-25038][SQL] Accelerate Spark Plan generation when...

2018-08-08 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/22018 Hi Takeshi Yamamuro, Hyukjin Kwon and @viirya, can you take a look at this patch?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-08 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @maropu and @viirya, do you agree with the basic idea that we should take column pruning into consideration when splitting the input files

[GitHub] spark pull request #22018: [SPARK-25038][SQL] Accelerate Spark Plan generati...

2018-08-06 Thread habren
GitHub user habren opened a pull request: https://github.com/apache/spark/pull/22018 [SPARK-25038][SQL] Accelerate Spark Plan generation when Spark SQL re… https://issues.apache.org/jira/browse/SPARK-25038 When Spark SQL reads a large amount of data, it takes a long time

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-30 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @maropu and @viirya, do you agree with the basic idea that we should take column pruning into consideration when splitting the input files

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-26 Thread habren
GitHub user habren reopened a pull request: https://github.com/apache/spark/pull/21868 [SPARK-24906][SQL] Adaptively enlarge split / partition size for Parq… Please refer to https://issues.apache.org/jira/browse/SPARK-24906 for more detail and test. For columnar file

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-26 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @maropu If I understand correctly, your concern is about how to calculate

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-26 Thread habren
Github user habren closed the pull request at: https://github.com/apache/spark/pull/21868

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-26 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r205356861 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -401,12 +399,41 @@ case class FileSourceScanExec

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-25 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r205288000 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -401,12 +399,41 @@ case class FileSourceScanExec

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-25 Thread habren
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r205287123 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -381,6 +381,26 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-24 Thread habren
GitHub user habren opened a pull request: https://github.com/apache/spark/pull/21868 [SPARK-24906][SQL] Adaptively enlarge split / partition size for Parq… Please refer to https://issues.apache.org/jira/browse/SPARK-24906 for more detail and test. For columnar file