[ https://issues.apache.org/jira/browse/ARROW-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche resolved ARROW-13436.
-------------------------------------------
    Fix Version/s: 6.0.0
       Resolution: Fixed

Issue resolved by pull request 11451
[https://github.com/apache/arrow/pull/11451]

> [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
> ----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13436
>                 URL: https://issues.apache.org/jira/browse/ARROW-13436
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Weston Pace
>            Assignee: Sasha Krassovsky
>            Priority: Major
>              Labels: good-first-issue, pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The documentation for pyarrow.parquet.read_table states:
>
> * *columns* (_list_) – If not None, only these columns will be read from the
> file. A column name may be a prefix of a nested field, e.g. ‘a’ will select
> ‘a.b’, ‘a.c’, and ‘a.d.e’.
>
> It is not clear what should be the expected result if columns is an empty
> list. In pyarrow 3.0 this read in all columns (as long as
> use_legacy_dataset=False). In pyarrow 4.0 this doesn't read in any columns.
> I think this behavior (not reading in any columns) is the correct behavior
> (since None can be used for all columns) but we should clarify that in the
> docs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)