Joris Van den Bossche created ARROW-8221:
--------------------------------------------
Summary: [Python][Dataset] Expose schema inference / validation options in the factory
Key: ARROW-8221
URL: https://issues.apache.org/jira/browse/ARROW-8221
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche
Fix For: 0.17.0

ARROW-8058 added options related to schema inference / validation for the Dataset factory. We should expose this in Python in the {{dataset(..)}} factory function:

- Add ability to pass a user-specified schema with a {{schema}} keyword, instead of inferring the schema from (one of) the files (to be passed to the factory finish method)
- Add {{validate_schema}} option to toggle whether the schema is validated against the actual files or not.
- Expose in some way the number of fragments to be inspected when inferring the schema. Not sure yet what the best API for this would be.
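
To make the intended behaviour of the three options concrete, here is a minimal pure-Python sketch. It is a toy model, not pyarrow code: schemas are plain dicts mapping column names to type names, and {{resolve_dataset_schema}} is a hypothetical helper. The {{schema}} and {{validate_schema}} keywords come from the proposal above; {{inspect_fragments}} is only an assumed name, since the API for that option is still undecided, and the validation rule (shared columns must have matching types) is also an assumption.

```python
from typing import Dict, List, Optional

# Toy stand-in for an Arrow schema: column name -> type name.
Schema = Dict[str, str]


def infer_schema(fragments: List[Schema], inspect_fragments: int = 1) -> Schema:
    """Unify the schemas of the first `inspect_fragments` fragments."""
    unified: Schema = {}
    for fragment in fragments[:inspect_fragments]:
        for name, type_name in fragment.items():
            # setdefault keeps the first type seen; a different later type is a conflict.
            if unified.setdefault(name, type_name) != type_name:
                raise ValueError(f"conflicting types for column {name!r}")
    return unified


def resolve_dataset_schema(
    fragments: List[Schema],
    schema: Optional[Schema] = None,
    validate_schema: bool = True,
    inspect_fragments: int = 1,  # hypothetical keyword name
) -> Schema:
    """Return the schema a dataset would end up with under the given options."""
    if schema is None:
        # No user-specified schema: infer it from the inspected fragments.
        schema = infer_schema(fragments, inspect_fragments)
    if validate_schema:
        # Assumed rule: every column shared with the schema must have the same type.
        for index, fragment in enumerate(fragments):
            for name, type_name in fragment.items():
                if name in schema and schema[name] != type_name:
                    raise ValueError(
                        f"fragment {index}: column {name!r} is {type_name}, "
                        f"expected {schema[name]}"
                    )
    return schema


fragments = [{"a": "int64"}, {"a": "int64", "b": "string"}]
print(resolve_dataset_schema(fragments))                       # inferred from the first fragment only
print(resolve_dataset_schema(fragments, inspect_fragments=2))  # inferred from both fragments
print(resolve_dataset_schema(fragments, schema={"a": "float64"}, validate_schema=False))
```

In this model, inspecting more fragments trades discovery time for a more complete inferred schema, while {{validate_schema=False}} lets a user-specified schema through without touching the files, deferring any mismatch to read time.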