GitHub user windpiger reopened a pull request:

    https://github.com/apache/spark/pull/17081

    [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFormat DataSource don't need to listFiles twice

    ## What changes were proposed in this pull request?
    
    Currently, when `resolveRelation` is called for a `FileFormat DataSource` 
without a user-specified schema, `listFiles` runs twice in 
`InMemoryFileIndex`.
    
    This PR adds a `FileStatusCache` to `DataSource` so that the second 
`listFiles` call is served from the cache instead of listing the files again.
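
The caching idea can be illustrated with a toy model. This is a minimal sketch, not Spark code: `ListingCache`, `FileIndex`, `CountingLister`, and `resolve_relation` are hypothetical names, and it assumes the two listings come from two separate index objects built during one resolution (one for schema inference, one for the final relation).

```python
from typing import Callable, Dict, List, Optional, Tuple


class ListingCache:
    """Hypothetical stand-in for a file-status cache (not Spark's API)."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}

    def get(self, path: str) -> Optional[List[str]]:
        return self._entries.get(path)

    def put(self, path: str, files: List[str]) -> None:
        self._entries[path] = files


class CountingLister:
    """Fake file system listing that counts how often it is called."""

    def __init__(self, tree: Dict[str, List[str]]) -> None:
        self.tree = tree
        self.calls = 0

    def __call__(self, path: str) -> List[str]:
        self.calls += 1
        return list(self.tree.get(path, []))


class FileIndex:
    """Toy index: lists a path, consulting the shared cache first."""

    def __init__(self, path: str, lister: Callable[[str], List[str]],
                 cache: ListingCache) -> None:
        self.path = path
        self.lister = lister
        self.cache = cache

    def list_files(self) -> List[str]:
        cached = self.cache.get(self.path)
        if cached is not None:
            return cached  # served from the cache, no second listing
        files = self.lister(self.path)
        self.cache.put(self.path, files)
        return files


def resolve_relation(path: str,
                     lister: Callable[[str], List[str]]) -> Tuple[List[str], List[str]]:
    # The cache is local to one resolution, so it cannot go stale later.
    cache = ListingCache()
    # First index: stands in for schema inference (here, file extensions).
    schema_index = FileIndex(path, lister, cache)
    schema = sorted({f.rsplit(".", 1)[-1] for f in schema_index.list_files()})
    # Second index: stands in for the final relation; it hits the cache.
    relation_index = FileIndex(path, lister, cache)
    return schema, relation_index.list_files()
```

With a `CountingLister` over a single directory, calling `resolve_relation` once should leave `calls` at 1 rather than 2, because both indexes share the same cache.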
    
    However, there is a bug in `InMemoryFileIndex`; see:
     [SPARK-19748](https://github.com/apache/spark/pull/17079)
     [SPARK-19761](https://github.com/apache/spark/pull/17093)
    This PR should therefore be merged after SPARK-19748 and SPARK-19761.
    
    
    ## How was this patch tested?
    Added a unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark resolveDataSourceScanFilesTwice

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17081
    
----
commit 0082b7633e8f84fe5cafa0362cd45cce4cfee459
Author: windpiger <song...@outlook.com>
Date:   2017-02-27T08:04:30Z

    [SPAKR-18726][SQL]resolveRelation for FileFormate DataSource don't need to listFiles twice

commit 6b5454ad0104459565febb520fa22ef30bdb8368
Author: windpiger <song...@outlook.com>
Date:   2017-02-27T08:39:45Z

    add test case

commit f1da0a4cf457f4efb6128beca3c08ccf95ef37a0
Author: windpiger <song...@outlook.com>
Date:   2017-02-27T23:59:34Z

    fix a style

commit f79f12c552ee1721295c347744fc5f92f048c74b
Author: windpiger <song...@outlook.com>
Date:   2017-03-01T22:49:13Z

    Merge branch 'master' into resolveDataSourceScanFilesTwice

commit a8c1deab0fc8e59863bf4a3d3b551f77fbebbc6d
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T01:50:30Z

    fix test failed

commit 60fa03757d223f833e2fa161326a48a9015d4c6c
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T04:49:08Z

    add a lazy

commit 9a73947efea334ba0cfc5b5508003807a93ff806
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T06:49:44Z

    fix code style

commit 850094cd3b77f6ecf33caf88532920e73de976f4
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T06:54:38Z

    Merge branch 'master' of github.com:apache/spark into resolveDataSourceScanFilesTwice

commit c39eb26da38f9d92e3871814be446c8d911be890
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:03:18Z

    make filestatuscache local var

commit f3332cb870ae2be9383969de07a07c8761230e8b
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:04:55Z

    modify a test case

commit 9cadd4168041fd859cc1e4b8396e5ed514129bff
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:05:24Z

    modify a test case

commit 28c8158a7c9d7acdbf2a07ef66ace46c1215979f
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:06:40Z

    modify a test case

commit 92618b3ad67c899e681a9923ad9abc5a7f2c7897
Author: windpiger <song...@outlook.com>
Date:   2017-03-02T11:07:10Z

    remove an empty line

----


