Ben Kietzman created ARROW-8058:
-----------------------------------

             Summary: [C++][Python][Dataset] Provide an option to skip validation in FileSystemDatasetFactoryOptions
                 Key: ARROW-8058
                 URL: https://issues.apache.org/jira/browse/ARROW-8058
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++ - Dataset, Python
    Affects Versions: 0.16.0
            Reporter: Ben Kietzman
            Assignee: Ben Kietzman
             Fix For: 1.0.0


Validating files during dataset discovery can be costly and is not always 
necessary.

At the same time, we could move file validation into the scan tasks. Currently 
all files are inspected as the dataset is constructed, which can be expensive 
if the filesystem is slow. Validation would then run on every scan rather than 
once at construction, but the check will be cheap, since at scan time we'll be 
reading the file into memory anyway.
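
To illustrate the trade-off, here is a minimal Python sketch of the idea. It is 
not Arrow's API: FactoryOptions, validate_on_discovery, discover, scan and the 
MAGIC header are all hypothetical names invented for this example, and the 
"validation" is just a toy check on the first bytes of each file.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical 4-byte header that marks a "valid" file in this toy format.
MAGIC = b"TOY1"


@dataclass
class FactoryOptions:
    # Hypothetical flag: when False, discovery opens no files at all.
    validate_on_discovery: bool = True


def is_valid(data: bytes) -> bool:
    """Toy validity check: the data must start with the magic header."""
    return data.startswith(MAGIC)


def discover(paths: List[str], options: FactoryOptions) -> List[str]:
    """Return the files that make up the dataset."""
    if not options.validate_on_discovery:
        # Skip validation: no I/O here, which matters on a slow filesystem.
        return list(paths)
    kept = []
    for path in paths:
        # Eager validation: one extra open and read per file.
        with open(path, "rb") as f:
            if is_valid(f.read(len(MAGIC))):
                kept.append(path)
    return kept


def scan(paths: List[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, payload) per file, validating each file as it is read."""
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        # The file is already in memory, so this check costs almost nothing.
        if not is_valid(data):
            raise ValueError(f"{path} is not a valid file")
        yield path, data[len(MAGIC):]
```

With validate_on_discovery=False, discover() returns every path without 
opening anything, and an invalid file only surfaces as an error when scan() 
reaches it. Whether invalid files should raise or be skipped at scan time is a 
design choice this sketch does not settle.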



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
