[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217629#comment-17217629 ]
Gert Hulselmans commented on ARROW-10344:
-----------------------------------------

Thanks. I can still do the filtering afterwards, so that is not a big problem.

Is there also something similar to pyarrow.dataset that allows combining multiple feather files that share one common column (index, which is in the same order in all feather files), while the other columns are different? It seems that pyarrow.dataset only supports appending rows.

{noformat}
feather1:

index col1 col2
1
...
n

feather2:

index col3 col4 col5
1
...
n

feather3:

index col6 col7 col8
1
...
n

read feather1,2,3 as one combined table:

index col1 col2 col3 col4 col5 col6 col7 col8
1
...
n
{noformat}

> [Python] Get all columns names (or schema) from Feather file, before loading whole Feather file
> ------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10344
>                 URL: https://issues.apache.org/jira/browse/ARROW-10344
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 1.0.1
>            Reporter: Gert Hulselmans
>            Priority: Major
>
> Is there a way to get all column names (or schema) from a Feather file before loading the full Feather file?
> My Feather files are big (like 100GB) and the names of the columns are different per analysis and can't be hard coded.
> {code:python}
> import pyarrow.feather as feather
> # Code here to check which columns are in the feather file.
> ...
> my_columns = ...
> # Result is pandas.DataFrame
> read_df = feather.read_feather('/path/to/file', columns=my_columns)
> # Result is pyarrow.Table
> read_arrow = feather.read_table('/path/to/file', columns=my_columns)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)