Ben Kietzman created ARROW-8058:
-----------------------------------

Summary: [C++][Python][Dataset] Provide an option to skip validation in FileSystemDatasetFactoryOptions
Key: ARROW-8058
URL: https://issues.apache.org/jira/browse/ARROW-8058
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Dataset, Python
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
Fix For: 1.0.0
Validating every file while the dataset is being constructed can be costly and is not always necessary. At the same time, we could move file validation into the scan tasks. Currently all files are inspected as the dataset is constructed, which can be expensive if the filesystem is slow. We would then perform validation more than once, but the repeated check is cheap because at scan time we read each file into memory anyway.
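A minimal, self-contained Python sketch of the proposed trade-off, assuming a toy in-memory "filesystem" and a toy format check. The names `FactoryOptions`, `validate_on_construction`, `make_dataset`, `scan_tasks` and the `PAR1` prefix check are all hypothetical illustrations, not Arrow APIs. With eager validation every file is read once during construction; with validation skipped, construction does no per-file I/O and each scan task checks the file it has already loaded.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List

# NOTE: every name below is hypothetical and only illustrates the idea;
# none of it is part of the Arrow C++ or Python API.


class InvalidFileError(Exception):
    pass


@dataclass
class FactoryOptions:
    # True: inspect every file while the dataset is constructed.
    # False: skip that I/O and defer validation to the scan tasks.
    validate_on_construction: bool = True


def is_valid(contents: bytes) -> bool:
    # Toy format check: treat a file as valid if it starts with a magic prefix.
    return contents.startswith(b"PAR1")


@dataclass
class Dataset:
    paths: List[str]
    read_file: Callable[[str], bytes]

    def scan_tasks(self) -> Iterator[Callable[[], bytes]]:
        for path in self.paths:
            def task(path: str = path) -> bytes:
                # The file is read into memory here anyway, so checking it
                # again at scan time costs no extra I/O.
                contents = self.read_file(path)
                if not is_valid(contents):
                    raise InvalidFileError(path)
                return contents
            yield task


def make_dataset(paths: List[str], read_file: Callable[[str], bytes],
                 options: FactoryOptions) -> Dataset:
    if options.validate_on_construction:
        # Eager path: one read per file before any scan starts, which is
        # slow when the filesystem is slow.
        for path in paths:
            if not is_valid(read_file(path)):
                raise InvalidFileError(path)
    return Dataset(list(paths), read_file)
```

With `validate_on_construction=False`, an invalid file is no longer rejected when the dataset is built; the error surfaces only when the scan task for that file runs.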