Github user raofu commented on the issue:
https://github.com/apache/spark/pull/22157
@dongjoon-hyun, thanks a lot for the pointers! I've updated the PR
description. Please let me know if there is any other information you'd like me
to add.
Github user raofu commented on the issue:
https://github.com/apache/spark/pull/22157
@dongjoon-hyun Title updated. Thanks for adding the test coverage! I've
merged your commit. Can you help kick off another Jenkins run? I don't think I
have the permission to do so.
Github user raofu commented on the issue:
https://github.com/apache/spark/pull/22157
I fixed the test by making the first file the corrupted file. @srowen, can
you help kick off a Jenkins run?
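The test fix above (putting the corrupted file first) can be sketched as follows. This is a minimal illustration, assuming hypothetical stand-ins `Reader`, `tryCreateReader`, and `readSchema` rather than the actual Spark/ORC test code: schema inference should skip an unreadable first file and fall through to the first readable one.

```scala
// Hypothetical sketch of the test idea: the corrupted file is listed first,
// and schema inference must skip it and use the first readable file.
// `Reader`, `tryCreateReader`, and `readSchema` are illustrative stand-ins,
// not the real Spark/ORC API.
object CorruptFirstSketch {
  final case class Reader(schema: String)

  // A file whose name contains "corrupt" fails to parse; others succeed.
  def tryCreateReader(path: String): Option[Reader] =
    if (path.contains("corrupt")) None
    else Some(Reader(s"schema-of-$path"))

  // Returns the schema of the first readable file, if any.
  def readSchema(paths: Seq[String]): Option[String] =
    paths.view.flatMap(tryCreateReader).headOption.map(_.schema)
}
```

With this sketch, `CorruptFirstSketch.readSchema(Seq("corrupt.orc", "good.orc"))` yields the schema of `good.orc`, mirroring the scenario the fixed test exercises.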
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
GitHub user raofu opened a pull request:
https://github.com/apache/spark/pull/22157
[SPARK-25126] Avoid creating Reader for all orc files
In OrcFileOperator.readSchema, a Reader is created for every file
although only the first valid one is used. This uses a significant
amount
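The change described above can be sketched with a lazy collection view. This is an illustrative model, not the actual code in OrcFileOperator.scala: `Reader` and the creation counter are hypothetical, and the point is only that a `view` defers Reader construction so files past the first valid one are never opened.

```scala
// Hypothetical sketch of the change: build Readers lazily so only files
// up to the first valid one are opened. `Reader` and `created` are
// illustrative stand-ins, not the real Spark/ORC API.
object LazyReaderSketch {
  final case class Reader(schema: String)

  var created = 0 // counts how many Readers were actually constructed

  def createReader(path: String): Option[Reader] = {
    created += 1
    if (path.endsWith(".orc")) Some(Reader(s"schema-of-$path")) else None
  }

  // Eager: a Reader is created for every file, then only the first is used.
  def readSchemaEager(paths: Seq[String]): Option[String] =
    paths.flatMap(createReader).headOption.map(_.schema)

  // Lazy: `view` defers Reader creation until `headOption` demands a value.
  def readSchemaLazy(paths: Seq[String]): Option[String] =
    paths.view.flatMap(createReader).headOption.map(_.schema)
}
```

Given three valid files, the eager version constructs three Readers while the lazy version constructs one, which is the saving the PR title describes.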
Github user raofu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22113#discussion_r210473687
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala ---
@@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends
Github user raofu closed the pull request at:
https://github.com/apache/spark/pull/22113
GitHub user raofu opened a pull request:
https://github.com/apache/spark/pull/22113
[SPARK-25126] Lazily create Reader for orc files
## What changes were proposed in this pull request?
Currently a Reader is created for every orc file under the directory and then
the first