GitHub user windpiger reopened a pull request: https://github.com/apache/spark/pull/17081
[SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFormat DataSource don't need to listFiles twice

## What changes were proposed in this pull request?

Currently, when we call `resolveRelation` for a `FileFormat DataSource` without a user-provided schema, `listFiles` is executed twice in `InMemoryFileIndex`. This PR adds a `FileStatusCache` for `DataSource`, which avoids calling `listFiles` twice.

However, there is a bug in `InMemoryFileIndex`, see [SPARK-19748](https://github.com/apache/spark/pull/17079) and [SPARK-19761](https://github.com/apache/spark/pull/17093), so this PR should come after SPARK-19748 and SPARK-19761.

## How was this patch tested?

A unit test was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark resolveDataSourceScanFilesTwice

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17081

----
commit 0082b7633e8f84fe5cafa0362cd45cce4cfee459
Author: windpiger <song...@outlook.com>
Date:   2017-02-27T08:04:30Z

    [SPAKR-18726][SQL]resolveRelation for FileFormate DataSource don't need to listFiles twice

commit 6b5454ad0104459565febb520fa22ef30bdb8368
Author: windpiger <song...@outlook.com>
Date:   2017-02-27T08:39:45Z

    add test case

commit f1da0a4cf457f4efb6128beca3c08ccf95ef37a0
Author: windpiger <song...@outlook.com>
Date:   2017-02-27T23:59:34Z

    fix a style

commit f79f12c552ee1721295c347744fc5f92f048c74b
Author: windpiger <song...@outlook.com>
Date:   2017-03-01T22:49:13Z

    Merge branch 'master' into resolveDataSourceScanFilesTwice

commit a8c1deab0fc8e59863bf4a3d3b551f77fbebbc6d
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T01:50:30Z

    fix test failed

commit 60fa03757d223f833e2fa161326a48a9015d4c6c
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T04:49:08Z

    add a lazy

commit 9a73947efea334ba0cfc5b5508003807a93ff806
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T06:49:44Z

    fix code style

commit 850094cd3b77f6ecf33caf88532920e73de976f4
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T06:54:38Z

    Merge branch 'master' of github.com:apache/spark into resolveDataSourceScanFilesTwice

commit c39eb26da38f9d92e3871814be446c8d911be890
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:03:18Z

    make filestatuscache local var

commit f3332cb870ae2be9383969de07a07c8761230e8b
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:04:55Z

    modify a test case

commit 9cadd4168041fd859cc1e4b8396e5ed514129bff
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:05:24Z

    modify a test case

commit 28c8158a7c9d7acdbf2a07ef66ace46c1215979f
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:06:40Z

    modify a test case

commit 92618b3ad67c899e681a9923ad9abc5a7f2c7897
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:07:10Z

    remove an empty line

----
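
As a rough illustration of the idea in the pull request summary (sharing one listing cache so that a path is listed only once during relation resolution), here is a minimal toy model in Python. It is not Spark's code: `ListingCache`, `CountingLister`, and `resolve_relation` are hypothetical names invented for this sketch, and the two listing passes are assumed to be one for schema inference and one for building the final file index.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ListingCache:
    """Hypothetical per-resolution cache of directory listings (not Spark's class)."""

    def __init__(self) -> None:
        self._listings: Dict[str, List[str]] = {}

    def get_or_list(self, path: str, list_files: Callable[[str], List[str]]) -> List[str]:
        # Invoke the (potentially expensive) lister only the first time a path is seen.
        if path not in self._listings:
            self._listings[path] = list_files(path)
        return self._listings[path]


class CountingLister:
    """Hypothetical file lister that records how many times it is invoked."""

    def __init__(self, files: Dict[str, List[str]]) -> None:
        self._files = files
        self.calls = 0

    def __call__(self, path: str) -> List[str]:
        self.calls += 1
        return list(self._files.get(path, []))


def resolve_relation(
    path: str,
    lister: Callable[[str], List[str]],
    cache: Optional[ListingCache] = None,
) -> Tuple[List[str], List[str]]:
    """Model two listing passes over the same path, optionally sharing a cache."""

    def listing(p: str) -> List[str]:
        if cache is not None:
            return cache.get_or_list(p, lister)
        return lister(p)

    files_for_schema = listing(path)  # pass 1: assumed schema inference
    files_for_index = listing(path)   # pass 2: assumed final file index
    return files_for_schema, files_for_index


if __name__ == "__main__":
    fs = {"/data": ["part-0.parquet", "part-1.parquet"]}

    uncached = CountingLister(fs)
    resolve_relation("/data", uncached)
    print("listings without cache:", uncached.calls)

    cached = CountingLister(fs)
    resolve_relation("/data", cached, ListingCache())
    print("listings with cache:", cached.calls)
```

In this model the cache lives only for the duration of one `resolve_relation` call, which loosely mirrors the "make filestatuscache local var" commit; whether the actual patch scopes it exactly this way is not stated in the message.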