[ https://issues.apache.org/jira/browse/ARROW-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated ARROW-567:
---------------------------------
    Fix Version/s: (was: 4.0.0)
                   5.0.0

> [C++] File and stream APIs for interacting with "large" schemas
> ---------------------------------------------------------------
>
>                 Key: ARROW-567
>                 URL: https://issues.apache.org/jira/browse/ARROW-567
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 5.0.0
>
>
> For data where the metadata itself is large (> 10000 fields), doing a full
> in-memory reconstruction of a record batch may be impractical if the user's
> goal is random access to a potentially small subset of the batch.
> I propose adding an API that enables "cheap" inspection of the record batch
> metadata and reconstruction of individual fields.
> Because the buffer and field metadata are stored as flattened lists, the cost
> of random field access currently scales with the number of fields. In the
> future we may devise strategies to mitigate this (e.g. storing a
> pre-computed buffer/field lookup table in the schema).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
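To illustrate the complexity point in the description, here is a minimal sketch that assumes a simplified schema model rather than the real Arrow C++ classes. It assumes field nodes and buffers are flattened in depth-first pre-order, so naively locating top-level field `i` means walking every field before it. The hypothetical `FieldLookupTable` shows the kind of pre-computed buffer/field lookup the issue mentions: one prefix-sum pass over the schema, then constant-time lookup per field. `Field`, `FieldLocation`, `CountSubtree`, `LocateNaive`, and `FieldLookupTable` are illustrative names only, not Arrow APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a schema field (NOT the Arrow C++ API).
// Assumption: each field and each of its descendants contributes one field
// node, plus a type-dependent number of buffers, to flat lists laid out in
// depth-first pre-order.
struct Field {
  std::string name;
  std::size_t num_buffers;  // e.g. 2 for int32 (validity + values), 3 for utf8
  std::vector<Field> children;
};

// Where a top-level field's entries start in the flattened node/buffer lists.
struct FieldLocation {
  std::size_t first_node;
  std::size_t first_buffer;
};

// Adds the nodes and buffers contributed by a field and all its descendants.
void CountSubtree(const Field& f, std::size_t* nodes, std::size_t* buffers) {
  *nodes += 1;
  *buffers += f.num_buffers;
  for (const Field& child : f.children) CountSubtree(child, nodes, buffers);
}

// Naive lookup: walks every preceding top-level field (and its children),
// so the cost grows with the number of fields before index i.
FieldLocation LocateNaive(const std::vector<Field>& fields, std::size_t i) {
  assert(i < fields.size());
  FieldLocation loc{0, 0};
  for (std::size_t k = 0; k < i; ++k) {
    CountSubtree(fields[k], &loc.first_node, &loc.first_buffer);
  }
  return loc;
}

// Hypothetical precomputed lookup table: a single prefix-sum pass at
// construction, after which each top-level field is located in O(1).
class FieldLookupTable {
 public:
  explicit FieldLookupTable(const std::vector<Field>& fields) {
    FieldLocation running{0, 0};
    table_.reserve(fields.size());
    for (const Field& f : fields) {
      table_.push_back(running);  // offsets before this field
      CountSubtree(f, &running.first_node, &running.first_buffer);
    }
  }

  FieldLocation Locate(std::size_t i) const {
    assert(i < table_.size());
    return table_[i];
  }

 private:
  std::vector<FieldLocation> table_;
};
```

The table costs one entry of memory per top-level field and only indexes top-level fields; a nested child would still be found by walking within its parent's subtree, which is bounded by that parent's size rather than by the whole schema.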