[ https://issues.apache.org/jira/browse/ARROW-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217594#comment-17217594 ]
Joris Van den Bossche commented on ARROW-10344:
-----------------------------------------------

bq. For filtering the data, is there an easy way to create a filter for thousands of columns? e.g.: only return rows for which at least one column has a value < 5000?

I don't think we provide a direct way, but with some Python utilities you could construct such a filter. E.g. with:

{code}
import functools
import operator

import pyarrow.dataset as ds

# OR together one "col{i} < 5000" expression per column
expr = functools.reduce(operator.or_, [ds.field(f"col{i}") < 5000 for i in range(5000)])
{code}

But I have _no_ idea how that will perform if you use this as a filter (I don't think we have really considered such a use case for filtering up to now, so I'm not sure the expressions/filtering code is optimized for a filter with that many columns).

> [Python] Get all columns names (or schema) from Feather file, before loading
> whole Feather file
> ------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10344
>                 URL: https://issues.apache.org/jira/browse/ARROW-10344
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 1.0.1
>            Reporter: Gert Hulselmans
>            Priority: Major
>
> Is there a way to get all column names (or schema) from a Feather file before
> loading the full Feather file?
> My Feather files are big (like 100GB) and the names of the columns are
> different per analysis and can't be hard coded.
> {code:python}
> import pyarrow.feather as feather
> # Code here to check which columns are in the feather file.
> ...
> my_columns = ...
> # Result is pandas.DataFrame
> read_df = feather.read_feather('/path/to/file', columns=my_columns)
> # Result is pyarrow.Table
> read_arrow = feather.read_table('/path/to/file', columns=my_columns)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)