Apologies if this is the wrong place for this, but I'm looking to repeatedly select a subset of columns from a wide feather file (which has ~200k columns). What I find is that if I use RecordBatchFileReader::Open with the requisite arguments asking it to select the particular columns, it reads the schema over and over (once per Open call). That is to be expected, as there doesn't seem to be a way to pass in a pre-existing schema.
However, in my use case the smaller queries need to be fast, so I can't have it re-parse the schema on every call. The input file thus has to be an io::RandomAccessFile. Looking at arrow/ipc/reader.h, the only function that seems able to serve this purpose is:

    Result<std::shared_ptr<RecordBatch>> ReadRecordBatch(
        const Buffer& metadata, const std::shared_ptr<Schema>& schema,
        const DictionaryMemo* dictionary_memo, const IpcReadOptions& options,
        io::RandomAccessFile* file);

How do I efficiently read the file once to get the schema and metadata in this case? My file does not have any dictionaries.

Am I thinking about this incorrectly? I would appreciate any pointers.

Thanks,
Ishbir Singh