[GitHub] [arrow-rs] ghuls commented on issue #286: Unable to load Feather v2 files created by pyarrow and pandas.

GitBox Tue, 18 May 2021 03:53:20 -0700


ghuls commented on issue #286:
URL: https://github.com/apache/arrow-rs/issues/286#issuecomment-843068396



   > 
   > IPC File Format
   > 
   > We define a “file format” supporting random access that is built with the
   > stream format. The file starts and ends with a magic string ARROW1 (plus
   > padding). What follows in the file is identical to the stream format. At the
   > end of the file, we write a footer containing a redundant copy of the schema
   > (which is a part of the streaming format) plus memory offsets and sizes for
   > each of the data blocks in the file. This enables random access to any record
   > batch in the file. See
   > [File.fbs](https://github.com/apache/arrow/blob/master/format/File.fbs)
   > for the precise details of the file footer.
   > 
   > Schematically we have:
   > 
   ```
   <magic number "ARROW1">
   <empty padding bytes [to 8 byte boundary]>
   <STREAMING FORMAT with EOS>
   <FOOTER>
   <FOOTER SIZE: int32>
   <magic number "ARROW1">
   ```
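
The schematic can be checked by hand on raw bytes. The following is a minimal sketch, not how pyarrow or arrow-rs read files: it assumes the footer size is stored as a little-endian int32, and it uses placeholder bytes in place of a real stream body and Flatbuffers footer. `build_fake_file` and `split_ipc_file` are hypothetical helper names introduced only for this illustration.

```python
import struct

MAGIC = b"ARROW1"
HEADER_LEN = len(MAGIC) + (-len(MAGIC) % 8)  # 6-byte magic padded to 8 bytes
TRAILER_LEN = 4 + len(MAGIC)                 # int32 footer size + closing magic


def build_fake_file(body: bytes, footer: bytes) -> bytes:
    """Assemble a buffer with the layout from the schematic (placeholder payloads)."""
    padding = b"\x00" * (HEADER_LEN - len(MAGIC))
    # Assumption: the footer size is a little-endian int32.
    return MAGIC + padding + body + footer + struct.pack("<i", len(footer)) + MAGIC


def split_ipc_file(data: bytes):
    """Check both magic strings and return (body, footer) as raw bytes."""
    if len(data) < HEADER_LEN + TRAILER_LEN:
        raise ValueError("too short to be an Arrow IPC file")
    if not data.startswith(MAGIC):
        raise ValueError("missing leading ARROW1 magic")
    if not data.endswith(MAGIC):
        raise ValueError("missing trailing ARROW1 magic")
    # The footer size sits just before the closing magic.
    (footer_size,) = struct.unpack("<i", data[-TRAILER_LEN:-len(MAGIC)])
    footer_start = len(data) - TRAILER_LEN - footer_size
    if footer_size < 0 or footer_start < HEADER_LEN:
        raise ValueError("footer size out of range")
    body = data[HEADER_LEN:footer_start]
    footer = data[footer_start:len(data) - TRAILER_LEN]
    return body, footer


fake = build_fake_file(b"STREAM-BYTES", b"FOOTER-BYTES")
body, footer = split_ipc_file(fake)
```

On a real file the same idea applies from the end backwards: read the last 10 bytes to get the footer size and the closing magic, then seek back by that size to reach the footer.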
   > 
   > In the file format, there is no requirement that dictionary keys should be
   > defined in a DictionaryBatch before they are used in a RecordBatch, as long as
   > the keys are defined somewhere in the file. Furthermore, it is invalid to have
   > more than one non-delta dictionary batch per dictionary ID (i.e. dictionary
   > replacement is not supported). Delta dictionaries are applied in the order they
   > appear in the file footer.
   > 
   
   https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
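
The dictionary rules in the quoted paragraph can be sketched as a small resolver. This is an illustration only: dictionary batches are modelled as plain `(dict_id, is_delta, values)` tuples listed in footer order, `resolve_dictionaries` is a hypothetical name, and rejecting a delta that has no base dictionary yet is an assumption of this sketch rather than something the quoted text states.

```python
def resolve_dictionaries(batches):
    """Build the final dictionary per ID from batches given in footer order.

    `batches` is an iterable of (dict_id, is_delta, values) tuples; this
    tuple shape is made up for the example and is not an Arrow API.
    """
    dictionaries = {}
    for dict_id, is_delta, values in batches:
        if is_delta:
            if dict_id not in dictionaries:
                # Assumption of this sketch: a delta needs a base dictionary.
                raise ValueError(f"delta for dictionary {dict_id} has no base")
            # Deltas are applied in the order they appear in the footer.
            dictionaries[dict_id].extend(values)
        else:
            if dict_id in dictionaries:
                # A second non-delta batch would be a replacement, which
                # the file format does not allow.
                raise ValueError(f"dictionary {dict_id} defined more than once")
            dictionaries[dict_id] = list(values)
    return dictionaries


resolved = resolve_dictionaries([
    (0, False, ["a", "b"]),
    (1, False, ["x"]),
    (0, True, ["c"]),
])
```

Because all dictionaries are resolved from the footer before any record batch is decoded, a reader built this way does not care whether a DictionaryBatch physically precedes the RecordBatch that uses it.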
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
