[ https://issues.apache.org/jira/browse/ARROW-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Farmer reassigned ARROW-12311:
-----------------------------------

    Assignee:     (was: Weston Pace)

> [Python][R] Expose (hide?) ScanOptions
> --------------------------------------
>
>                 Key: ARROW-12311
>                 URL: https://issues.apache.org/jira/browse/ARROW-12311
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Weston Pace
>            Priority: Major
>             Fix For: 10.0.0
>
>
> Currently R completely hides the `ScanOptions` class.
> In Python the class is exposed, but the documentation prefers `dataset.scan`
> (which hides both the scanner and the scan options).
> However, there is some useful information in the `ScanOptions`.
> Specifically, the projected schema (which is a product of the dataset schema
> and the projection expression and is not easily recreated) and the
> materialized fields (the list of fields referenced by either the filter or
> the projection), which might be useful for reporting purposes.
> Currently R uses the projected schema to convert a list of column names into
> a partition schema. Python does not rely on either field.
>
> Options:
>  - Keep the status quo
>  - Expose the ScanOptions object (which itself is exposed via the Scanner)
>  - Expose the interesting fields via the Scanner
>
> Currently the C++ design is halfway between the latter two (the scanner
> exposes both the projected schema and the options). My preference would be
> the third option. It raises a further question about how to expose the
> scanner itself in Python. Should the user be using ScannerBuilder? Should
> they use NewScan? Should they use the scanner directly at all, or should it
> be hidden?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
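
For illustration only, the third option (expose the interesting fields via the Scanner while keeping the options object hidden) might look roughly like the sketch below. Everything in it is hypothetical: `Scanner`, `_ScanOptions`, `filter_fields` and the dict-based schema are stand-ins written for this example and are not the Arrow C++, Python or R API. The projection is also simplified to a plain list of column names, whereas in Arrow it is an expression.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical stand-ins for illustration; these are NOT the real Arrow classes.


@dataclass(frozen=True)
class _ScanOptions:
    """Internal options object; callers never receive it directly."""
    projected_schema: Dict[str, str]  # column name -> type name
    materialized_fields: List[str]    # fields read by the filter or projection


class Scanner:
    """Exposes only the interesting fields of the hidden options object."""

    def __init__(self, dataset_schema: Dict[str, str],
                 columns: Iterable[str],
                 filter_fields: Iterable[str] = ()):
        columns = list(columns)
        referenced = set(columns) | set(filter_fields)
        missing = sorted(referenced - set(dataset_schema))
        if missing:
            raise KeyError(f"fields not in dataset schema: {missing}")
        # Assumption: the projection is a simple column selection.
        projected = {name: dataset_schema[name] for name in columns}
        # Materialized fields are reported in dataset schema order.
        materialized = [n for n in dataset_schema if n in referenced]
        self._options = _ScanOptions(projected, materialized)

    @property
    def projected_schema(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the internal options.
        return dict(self._options.projected_schema)

    @property
    def materialized_fields(self) -> List[str]:
        return list(self._options.materialized_fields)


scanner = Scanner(
    {"a": "int64", "b": "string", "c": "double"},
    columns=["c", "a"],
    filter_fields=["b"],
)
print(scanner.projected_schema)
print(scanner.materialized_fields)
```

The properties return copies, so the options object stays an implementation detail and can change shape without affecting callers, which is the main argument for this option over exposing `ScanOptions` itself.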