[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230091#comment-17230091 ]
Joris Van den Bossche commented on ARROW-10344:
-----------------------------------------------

bq. .. or are there plans to fix that soon?

See my new comment on that issue. TLDR: I don't think it is solvable in general.

bq. Could reading the metadata schema for Feather v1 also be supported

I think that is certainly technically possible. The C++ Reader interface already exposes a {{schema()}} function, but this is not exposed in Python. I suppose this would also be nice to have for V2 in the {{pyarrow.feather}} module.

bq. We need the final data to be readable from Python and R, so Feather looked like a good choice.

That's indeed one of the selling points of Feather, and I also didn't find any up-to-date R interface for zarr. I think it might still be worth looking for other options (given the inherent limitation for V2 mentioned above). I don't have any experience with it myself, but it might also be worth taking a look at TileDB.

If you want to stay with arrow/feather files, one other alternative is to use a "trick" of putting all columns (of the same type) in a FixedSizeList column (the data under the hood is then stored in a contiguous array, which can be easily "viewed" as a 2D array). However, you can then no longer read only a subset of the columns, which might be an important use case.

> [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file
> ------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10344
>                 URL: https://issues.apache.org/jira/browse/ARROW-10344
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 1.0.1
>            Reporter: Gert Hulselmans
>            Priority: Major
>
> Is there a way to get all column names (or schema) from a Feather file before loading the full Feather file?
> My Feather files are big (like 100GB) and the names of the columns are different per analysis and can't be hard coded.
> {code:python}
> import pyarrow.feather as feather
>
> # Code here to check which columns are in the feather file.
> ...
> my_columns = ...
>
> # Result is pandas.DataFrame
> read_df = feather.read_feather('/path/to/file', columns=my_columns)
>
> # Result is pyarrow.Table
> read_arrow = feather.read_table('/path/to/file', columns=my_columns)
> {code}