[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-11-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21868 I think we should fix this. Basically the dynamic estimation logic is too flaky, and I think we need this for the current status. Let's not add it for now. While I am revisiting old

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-10-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-17 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @HyukjinKwon Yes, this is to handle it dynamically. For ad-hoc queries, the selected columns differ from query to query, and it's not convenient or even impossible for users to set

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21868 @habren, BTW, just for clarification, you can explicitly set `spark.sql.files.maxPartitionBytes` to a larger value and that resolves your issue. This one is to handle it dynamically, right?
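To illustrate why raising `spark.sql.files.maxPartitionBytes` helps, here is a minimal Python sketch of how a file-based scan appears to choose its split size. It is a simplified model, not Spark's code: the function names, the four 1 GiB input files, and the parallelism of 8 are assumptions made for illustration, and the packing of small files into shared partitions is ignored. The 128 MiB and 4 MiB defaults are those documented for `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`.

```python
MIB = 1024 * 1024


def max_split_bytes(file_sizes, max_partition_bytes=128 * MIB,
                    open_cost_in_bytes=4 * MIB, default_parallelism=8):
    """Simplified model of the split size chosen for a file scan.

    default_parallelism=8 is an assumed cluster setting, not a Spark default.
    """
    # Each file is charged its length plus a fixed cost for opening it.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # The split size never exceeds maxPartitionBytes.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


def count_splits(file_sizes, split_bytes):
    """Number of splits when every file is cut into split_bytes chunks."""
    return sum(-(-size // split_bytes) for size in file_sizes)  # ceiling division


# Hypothetical input: four 1 GiB files.
files = [1024 * MIB] * 4

default_split = max_split_bytes(files)
larger_split = max_split_bytes(files, max_partition_bytes=512 * MIB)

print(default_split // MIB, count_splits(files, default_split))
print(larger_split // MIB, count_splits(files, larger_split))
```

In a real session the setting can be changed with `spark.conf.set("spark.sql.files.maxPartitionBytes", ...)`. The drawback discussed in this thread is that a fixed value cannot suit every ad-hoc query.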

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-15 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @HyukjinKwon I moved the change to the master branch just now. Please help review.

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-10 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @HyukjinKwon Thanks for your comments. I will submit it to master soon

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21868 ??? why does this still target branch-2.3? is this a backport?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-08 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @maropu Thanks for your comments. ORC can also benefit from this change since ORC is also a columnar file format. Do you think I should add ORC support by changing the line below `

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-08 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21868 Is this a parquet-specific issue? e.g., how about ORC?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-08 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @maropu and @viirya Do you agree with the basic idea that we should take column pruning into consideration when splitting the input files?
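The idea of accounting for column pruning when sizing splits can be sketched as below. This is a hypothetical heuristic written for illustration, not the logic in the PR: the function name, the byte estimates for selected and total columns, and the `max_factor` cap are all assumptions. The intuition is that when a query reads only a fraction of a columnar file's bytes, the split can be enlarged by roughly the inverse of that fraction so each task still reads a comparable amount of data.

```python
MIB = 1024 * 1024


def adaptive_split_bytes(base_split_bytes, selected_column_bytes,
                         total_column_bytes, max_factor=16.0):
    """Enlarge the split size in proportion to how much column pruning saves.

    All parameters are hypothetical inputs; max_factor is an assumed safety cap.
    """
    # Without usable size estimates, fall back to the configured split size.
    if total_column_bytes <= 0 or selected_column_bytes <= 0:
        return base_split_bytes
    ratio = selected_column_bytes / total_column_bytes
    # Never shrink below the base size and never grow beyond the cap.
    factor = max(1.0, min(max_factor, 1.0 / ratio))
    return int(base_split_bytes * factor)


# A query that reads a quarter of the bytes gets splits four times larger.
print(adaptive_split_bytes(128 * MIB, 25, 100) // MIB)
# A query that reads 1% of the bytes is limited by the cap.
print(adaptive_split_bytes(128 * MIB, 1, 100) // MIB)
```

The cap reflects the concern raised in this thread that such estimates can be unreliable: a wrong ratio should not produce arbitrarily large partitions.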

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-30 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @maropu and @viirya Do you agree with the basic idea that we should take column pruning into consideration when splitting the input files?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-26 Thread habren
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 @maropu If I understand correctly, your concern is about how to calculate

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21868 BTW, why does this PR target branch-2.3? I think it should be master.

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-25 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21868 Thanks for the work, but, probably, we first need consensus to work on this because this part is pretty performance-sensitive... As @viirya described in the jira, I think we need more general

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?

[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-07-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?