ghuls commented on issue #286: URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-843068396
> IPC File Format
>
> We define a “file format” supporting random access that is built with the stream format. The file starts and ends with a magic string ARROW1 (plus padding). What follows in the file is identical to the stream format. At the end of the file, we write a footer containing a redundant copy of the schema (which is a part of the streaming format) plus memory offsets and sizes for each of the data blocks in the file. This enables random access to any record batch in the file. See [File.fbs](https://github.com/apache/arrow/blob/master/format/File.fbs) for the precise details of the file footer.
>
> Schematically we have:
> ```
> <magic number "ARROW1">
> <empty padding bytes [to 8 byte boundary]>
> <STREAMING FORMAT with EOS>
> <FOOTER>
> <FOOTER SIZE: int32>
> <magic number "ARROW1">
> ```
>
> In the file format, there is no requirement that dictionary keys should be defined in a DictionaryBatch before they are used in a RecordBatch, as long as the keys are defined somewhere in the file. Furthermore, it is invalid to have more than one non-delta dictionary batch per dictionary ID (i.e. dictionary replacement is not supported). Delta dictionaries are applied in the order they appear in the file footer.

https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
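
To make the layout quoted above concrete, here is a minimal Python sketch that locates the footer by reading the file backwards from the trailing magic string. It is an illustration under stated assumptions, not how arrow-rs or any other Arrow implementation does it: the helper name `locate_footer` is hypothetical, the footer size is assumed to be a little-endian int32, and the demo buffer uses placeholder bytes rather than a real Flatbuffers footer or real stream messages.

```python
import struct

MAGIC = b"ARROW1"
HEADER_LEN = 8  # 6 magic bytes plus padding up to the 8-byte boundary


def locate_footer(buf: bytes) -> tuple[int, int]:
    """Return (offset, size) of the footer in an Arrow IPC file image.

    Hypothetical helper: it only checks the outer framing and does not
    decode the footer or the stream messages.
    """
    # Smallest possible frame: header, int32 footer size, trailing magic.
    if len(buf) < HEADER_LEN + 4 + len(MAGIC):
        raise ValueError("buffer too short to be an Arrow IPC file")
    if not buf.startswith(MAGIC) or not buf.endswith(MAGIC):
        raise ValueError("missing ARROW1 magic string")

    # The footer size sits immediately before the trailing magic string.
    size_end = len(buf) - len(MAGIC)
    # Assumption: the int32 is stored little-endian.
    (footer_size,) = struct.unpack("<i", buf[size_end - 4:size_end])

    footer_start = size_end - 4 - footer_size
    if footer_size < 0 or footer_start < HEADER_LEN:
        raise ValueError("footer size is inconsistent with the file length")
    return footer_start, footer_size


# Synthetic demo image: placeholder bytes stand in for the stream and footer.
fake_stream = b"\x00" * 16
fake_footer = b"FOOTERXX"
image = (
    MAGIC + b"\x00\x00"                     # magic + padding to 8 bytes
    + fake_stream                           # <STREAMING FORMAT with EOS>
    + fake_footer                           # <FOOTER>
    + struct.pack("<i", len(fake_footer))   # <FOOTER SIZE: int32>
    + MAGIC                                 # trailing magic
)

offset, size = locate_footer(image)
print(offset, size)
```

Once the footer bytes are located, a real reader would decode them with the File.fbs schema to obtain the offsets and sizes of each data block, which is what makes random access to individual record batches possible.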
