Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21868
I think we should fix this. Basically, the dynamic estimation logic is too
flaky, and I think we need this for the current status. Let's not add it for
now.
While I am revisiting old
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21868
Can one of the admins verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user habren commented on the issue:
https://github.com/apache/spark/pull/21868
@HyukjinKwon Yes, this is to handle it dynamically.
For ad-hoc queries, the selected columns differ from query to query,
and it's inconvenient or even impossible for users to set
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21868
@habren, BTW, just for clarification: you can set a bigger value for
`spark.sql.files.maxPartitionBytes` explicitly, and that resolves your issue.
This one is to handle it dynamically, right?
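For context on why raising `spark.sql.files.maxPartitionBytes` works as a
manual fix: in Spark 2.x, the non-bucketed file scan (`FileSourceScanExec`)
uses it as an upper bound on the split size. Below is a minimal Python sketch
of that rule as we read it from the Spark 2.x source. `max_split_bytes` is a
hypothetical helper; the 128 MB and 4 MB defaults correspond to
`spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`, and
`default_parallelism=8` is an arbitrary assumed value (in Spark it comes from
the cluster). Check it against the Spark version you actually run.

```python
# Sketch of the Spark 2.x split-size rule for non-bucketed file scans, as we
# read it from FileSourceScanExec. max_split_bytes is a hypothetical helper.
MB = 1024 * 1024


def max_split_bytes(file_sizes, max_partition_bytes=128 * MB,
                    open_cost_in_bytes=4 * MB, default_parallelism=8):
    """Return the target split size for the given list of file sizes."""
    # Each file is charged its length plus a fixed "open cost".
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    # Integer division, mirroring the Long division on the Scala side.
    bytes_per_core = total_bytes // default_parallelism
    # Never exceed maxPartitionBytes, never go below the open cost.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


if __name__ == "__main__":
    ten_gb = [1024 * MB] * 10
    print(max_split_bytes(ten_gb) // MB)                                  # default cap
    print(max_split_bytes(ten_gb, max_partition_bytes=512 * MB) // MB)  # raised cap
```

With ten 1 GB files, the split is capped at 128 MB by default; raising the cap
to 512 MB lets each split grow to 512 MB. Because the cap is one static number,
it cannot reflect how many columns a particular query actually reads, which is
the case this PR tries to handle dynamically.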
Github user habren commented on the issue:
https://github.com/apache/spark/pull/21868
Hi @HyukjinKwon, I moved the change to the master branch just now. Please help
review it.
Github user habren commented on the issue:
https://github.com/apache/spark/pull/21868
@HyukjinKwon Thanks for your comments. I will submit it to master soon.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21868
??? Why does this still target branch-2.3? Is this a backport?
Github user habren commented on the issue:
https://github.com/apache/spark/pull/21868
@maropu Thanks for your comments. ORC can also benefit from this change,
since ORC is also a columnar file format. Do you think I should add ORC
support by changing the line below
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/21868
Is this a Parquet-specific issue? For example, what about ORC?
Github user habren commented on the issue:
https://github.com/apache/spark/pull/21868
Hi @maropu and @viirya, do you agree with the basic idea that we should take
column pruning into consideration when splitting the input files?
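To make the idea concrete, here is a hypothetical Python sketch (not the code
in this PR) of taking column pruning into account: if a query reads only a
fraction of the bytes in a columnar file, the split size could be scaled up by
the inverse of that fraction, so each task reads roughly the same amount of
data. `pruned_split_bytes`, the per-column size estimates in `column_sizes`,
and the 1 GB `upper_bound` guard are all assumptions for illustration; how to
estimate per-column sizes reliably is not addressed here.

```python
# Hypothetical sketch only, not this PR's implementation: scale the split size
# by (total bytes / bytes of the selected columns), capped by an upper bound.
MB = 1024 * 1024


def pruned_split_bytes(max_split_bytes, column_sizes, selected_columns,
                       upper_bound=1024 * MB):
    """Enlarge max_split_bytes when only some columns are read.

    column_sizes: assumed mapping of column name -> estimated bytes on disk
    selected_columns: names of the columns the query actually reads
    upper_bound: assumed guard so a bad estimate cannot create huge splits
    """
    total = sum(column_sizes.values())
    selected = sum(column_sizes[c] for c in selected_columns)
    if total == 0 or selected == 0:
        # Nothing to estimate from: fall back to the unscaled split size.
        return max_split_bytes
    scaled = max_split_bytes * total // selected
    return min(scaled, upper_bound)


if __name__ == "__main__":
    sizes = {"a": 10, "b": 30, "c": 60}  # made-up relative column sizes
    print(pruned_split_bytes(128 * MB, sizes, ["a", "b"]) // MB)
```

Reading columns `a` and `b` (40% of the estimated bytes) turns a 128 MB split
into 320 MB, while reading only `a` hits the 1 GB cap. The cap is one
arbitrary way to limit the damage of a poor estimate on this
performance-sensitive path.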
Github user habren commented on the issue:
https://github.com/apache/spark/pull/21868
@maropu If I understand correctly, your concern is about how to calculate
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21868
BTW, why does this PR target branch-2.3? I think it should target master.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/21868
Thanks for the work, but we probably need consensus before working on this,
because this part is pretty performance-sensitive... As @viirya described in
the JIRA, I think we need more general