[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user raofu commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210473687 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- My bad. I misread the code. Sorry about the noise. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user raofu closed the pull request at: https://github.com/apache/spark/pull/22113
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210462023 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- Do you mean `collectFirst` actually traverses the `iterator` entirely?
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210461983 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- Do you mean `iterator` and `collectFirst` actually traverse the collection entirely?
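For context on the question above, here is a minimal standalone sketch (an assumed example, not the Spark code itself) showing that `Iterator.map` is already lazy and that `collectFirst` on an iterator stops at the first element for which the partial function is defined, so no further elements are produced:

```scala
// Sketch only: `openReader` is a hypothetical stand-in for the expensive
// Reader construction; it is not part of OrcFileOperator.
object IteratorLazinessDemo extends App {
  var opened = 0
  def openReader(path: String): String = { opened += 1; s"reader-$path" }

  val paths = Seq("a.orc", "b.orc", "c.orc")

  // Iterator.map is lazy: openReader runs only as elements are demanded,
  // and collectFirst returns as soon as its partial function matches.
  paths.iterator.map(openReader).collectFirst { case r => r }

  println(s"opened $opened readers") // only the first reader is created
}
```

This is consistent with the discussion being resolved as a misread: the pre-existing `iterator`-based code was already lazy.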
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
GitHub user raofu opened a pull request: https://github.com/apache/spark/pull/22113 [SPARK-25126] Lazily create Reader for orc files ## What changes were proposed in this pull request? Currently a Reader is created for every ORC file under the directory before the first one with a non-empty schema is returned. Using `view` creates each Reader lazily instead, so no Reader is built beyond the first file with a non-empty schema. ## How was this patch tested? You can merge this pull request into a Git repository by running: $ git pull https://github.com/raofu/spark SPARK-25126 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22113.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22113 commit 9f5aad0591b9912f5186cd2da8328b348eea5425 Author: Rao Fu Date: 2018-08-15T20:20:45Z [SPARK-25126] Lazily create Reader for orc files
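The effect the pull request describes can be sketched with a minimal standalone example (assumed names, not the actual OrcFileOperator code): mapping a strict `Seq` eagerly runs the expensive step for every element, while mapping a `view` defers it until `collectFirst` demands each element:

```scala
// Sketch only: `openReader` is a hypothetical stand-in for Reader creation.
object LazyViewDemo extends App {
  var opened = 0
  def openReader(path: String): String = { opened += 1; s"reader-$path" }

  val paths = Seq("a.orc", "b.orc", "c.orc")

  // Strict map: openReader runs for every path before collectFirst sees any result.
  opened = 0
  paths.map(openReader).collectFirst { case r => r }
  println(s"strict map opened $opened readers") // all three readers are created

  // Lazy view: elements are produced on demand, so collectFirst stops after the first.
  opened = 0
  paths.view.map(openReader).collectFirst { case r => r }
  println(s"view opened $opened readers") // only the first reader is created
}
```

Note that `paths.iterator.map(openReader)` is equally lazy, which is why the change was ultimately judged unnecessary and the pull request was closed.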