[ https://issues.apache.org/jira/browse/ARROW-6341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francois Saint-Jacques updated ARROW-6341:
------------------------------------------
    Description:
The following classes should be accessible from Python:
 * class DataSource
 * class DataSourceDiscovery
 * class Dataset
 * class ScanContext, ScanOptions, ScanTask
 * class ScannerBuilder
 * class Scanner

The end result is reading a directory of parquet files as a single stream. One should be able to re-implement [https://github.com/apache/arrow/pull/5720] in python.

  was:
The following classes should be accessible from Python:
 * class DataSource
 * class DataFragment
 * function DiscoverySource
 * class ScanContext, ScanOptions, ScanTask
 * class Dataset
 * class ScannerBuilder
 * class Scanner

The end result is reading a directory of parquet files as a single stream.

> [Python] Implement low-level bindings for Dataset
> -------------------------------------------------
>
>                 Key: ARROW-6341
>                 URL: https://issues.apache.org/jira/browse/ARROW-6341
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Francois Saint-Jacques
>            Assignee: Krisztian Szucs
>            Priority: Major
>              Labels: dataset, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following classes should be accessible from Python:
>  * class DataSource
>  * class DataSourceDiscovery
>  * class Dataset
>  * class ScanContext, ScanOptions, ScanTask
>  * class ScannerBuilder
>  * class Scanner
> The end result is reading a directory of parquet files as a single stream.
> One should be able to re-implement
> [https://github.com/apache/arrow/pull/5720] in python.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)