[jira] [Updated] (ARROW-13436) [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
[ https://issues.apache.org/jira/browse/ARROW-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-13436:
---
    Labels: good-first-issue pull-request-available  (was: good-first-issue)

> [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
> --
>
>                 Key: ARROW-13436
>                 URL: https://issues.apache.org/jira/browse/ARROW-13436
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Weston Pace
>            Assignee: Sasha Krassovsky
>            Priority: Major
>              Labels: good-first-issue, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The documentation for pyarrow.parquet.read_table states:
>
>  * *columns* (_list_) – If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.
>
> It is not clear what the expected result should be if columns is an empty list. In pyarrow 3.0 this read in all columns (as long as use_legacy_dataset=False). In pyarrow 4.0 this doesn't read in any columns. I think this behavior (not reading in any columns) is the correct behavior (since None can be used for all columns), but we should clarify that in the docs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
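The rule the report asks the docs to spell out (None means all columns, an empty list means no columns) can be modeled in a few lines of plain Python. The helper select_columns in this sketch is hypothetical and not part of pyarrow; it works on flattened dotted column paths and assumes that prefix matching stops at "." boundaries, so "a" selects "a.b" but not "ab".

```python
from typing import List, Optional, Sequence


def select_columns(schema_paths: Sequence[str],
                   columns: Optional[List[str]]) -> List[str]:
    """Hypothetical model of the documented `columns` rule (not pyarrow API).

    schema_paths: flattened leaf paths such as "a.b" or "a.d.e".
    columns: None means "read everything"; a list (possibly empty)
    means "read only leaves matching these names or prefixes".
    """
    if columns is None:
        # None is the explicit way to ask for every column.
        return list(schema_paths)
    selected = []
    for path in schema_paths:
        for name in columns:
            # Assumption: a name matches the leaf itself or any field nested
            # under it, splitting on "." boundaries only.
            if path == name or path.startswith(name + "."):
                selected.append(path)
                break
    # With an empty list the inner loop never runs, so nothing is selected.
    return selected


paths = ["a.b", "a.c", "a.d.e", "x"]
print(select_columns(paths, None))   # ['a.b', 'a.c', 'a.d.e', 'x']
print(select_columns(paths, ["a"]))  # ['a.b', 'a.c', 'a.d.e']
print(select_columns(paths, []))     # []
```

Against pyarrow itself, the two calls being contrasted are pq.read_table(path, columns=None) and pq.read_table(path, columns=[]) (with import pyarrow.parquet as pq); per the report, pyarrow 4.0 returns no columns for the second, while pyarrow 3.0 returned all of them.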
[jira] [Updated] (ARROW-13436) [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
[ https://issues.apache.org/jira/browse/ARROW-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-13436:
--
    Labels: good-first-issue  (was: )

> [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
> --
>
>                 Key: ARROW-13436
>                 URL: https://issues.apache.org/jira/browse/ARROW-13436
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: good-first-issue
>
> The documentation for pyarrow.parquet.read_table states:
>
>  * *columns* (_list_) – If not None, only these columns will be read from the file. A column name may be a prefix of a nested field, e.g. ‘a’ will select ‘a.b’, ‘a.c’, and ‘a.d.e’.
>
> It is not clear what the expected result should be if columns is an empty list. In pyarrow 3.0 this read in all columns (as long as use_legacy_dataset=False). In pyarrow 4.0 this doesn't read in any columns. I think this behavior (not reading in any columns) is the correct behavior (since None can be used for all columns), but we should clarify that in the docs.