Hello all, I am working on a pure Julia implementation of the Arrow format.
At the moment I'm ingesting the metadata, and the output I'm generating with
`pyarrow` does not seem to match the format as I understand it, so I'm trying
to figure out where I've misunderstood it.

I've written some Arrow data to disk using the code in [this 
gist](https://gist.github.com/ExpandingMan/4ef3cadab6f3e6d65e672a32b821654f).

From the format description, I expect each message to start with an `Int32`
giving the size of the metadata flatbuffer, followed by the metadata flatbuffer
itself.  The `Int32`s do seem to be there; however, the `Message` flatbuffers
do not start where I expect.  In the output above, I find the first flatbuffer,
containing the `Message` with the `Schema`, at byte 20.  I can successfully
construct all of the flatbuffer objects in Julia starting from byte 20, but I
was expecting to find this flatbuffer at byte 4, immediately following the
`Int32`.  What is contained in bytes 4 to 19?
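For anyone who wants to poke at the bytes directly, here is a minimal Python sketch (no `pyarrow` needed) of the framing I'm expecting, assuming little-endian integers and 0-based byte offsets. `read_message_frame` and `flatbuffer_root_table` are hypothetical helper names for illustration only. The second helper reads the first 4 bytes of the metadata block which, in the FlatBuffers binary format, hold an unsigned offset to the root table, so it can be used to check where the `Message` table actually begins relative to byte 4. The demo bytes are made up and are not taken from my file.

```python
import struct


def read_message_frame(buf: bytes, pos: int):
    """Read one length-prefixed metadata block starting at byte `pos`.

    Assumes the layout described above: a little-endian Int32 giving the
    metadata size, immediately followed by that many bytes of flatbuffer.
    Returns (metadata_start, metadata_length, metadata_bytes).
    """
    (length,) = struct.unpack_from("<i", buf, pos)
    start = pos + 4
    return start, length, buf[start:start + length]


def flatbuffer_root_table(meta: bytes) -> int:
    """Return the offset of the root table within a flatbuffer.

    In the FlatBuffers binary format the first 4 bytes of a buffer are an
    unsigned little-endian offset (uoffset_t) to the root table, so the
    table itself does not begin at byte 0 of the buffer.
    """
    (root,) = struct.unpack_from("<I", meta, 0)
    return root


if __name__ == "__main__":
    # Hypothetical bytes for illustration only: an Int32 length of 16,
    # then a 16-byte "flatbuffer" whose root offset is 12.
    demo = struct.pack("<i", 16) + struct.pack("<I", 12) + bytes(12)
    start, length, meta = read_message_frame(demo, 0)
    root = flatbuffer_root_table(meta)
    # Absolute position of the root table = metadata start + root offset.
    print(start, length, start + root)  # 4 16 16
```

Pointing `read_message_frame` at offset 0 and then 144 of the real file, and comparing `start + root` against 20 and 168, should show whether the gaps line up with the flatbuffer root offsets.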

Similarly, I find the next `Int32` at byte 144 as expected, but I can't find
the flatbuffer that follows it until byte 168.  Again, I can successfully
construct the metadata flatbuffer (in this case a `Message` containing a
`RecordBatch`) in Julia, but I was expecting to do this from byte 148, not byte
168.  What is contained in bytes 148 to 167?  Note that this gap is 20 bytes,
whereas for the first `Message` it was only 16.

What am I missing here?  I suspect the mysterious extra bytes contain a small
flatbuffer of some sort, but the format description makes no mention of one.

Thanks!