paleolimbot commented on issue #40597:
URL: https://github.com/apache/arrow/issues/40597#issuecomment-2002691785

   I think the idea with the file format is more like this: fetch the last 10 
bytes of the file (which I think hold a 4-byte footer length followed by the 
"ARROW1" magic number), then fetch the footer (which contains the offsets of 
the record batch and dictionary messages), then fetch specific batches, perhaps 
in parallel.
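
   As a rough illustration of the first step, here is a minimal sketch that 
locates the footer from the tail of the file. It assumes the trailer layout 
described in the Arrow IPC file format spec (a little-endian int32 footer 
length followed by the 6-byte magic `ARROW1`); `locate_footer` is a 
hypothetical helper, not part of any Arrow library, and decoding the footer 
flatbuffer itself is left out.

```python
import struct
from typing import Tuple

MAGIC = b"ARROW1"
# Assumed trailer layout: int32 footer length (little-endian) + magic.
TRAILER_SIZE = 4 + len(MAGIC)


def locate_footer(file_size: int, trailer: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the footer, given the file's last 10 bytes.

    Hypothetical helper for illustration only.
    """
    if len(trailer) != TRAILER_SIZE or trailer[4:] != MAGIC:
        raise ValueError("trailer does not end with the ARROW1 magic")
    (footer_len,) = struct.unpack("<i", trailer[:4])
    if footer_len <= 0 or footer_len > file_size - TRAILER_SIZE:
        raise ValueError("implausible footer length")
    # The footer sits immediately before the trailer.
    return file_size - TRAILER_SIZE - footer_len, footer_len
```

   A client would first request the last `TRAILER_SIZE` bytes, call 
`locate_footer`, then request exactly that range to get the footer.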
   
   I don't know if it's worth documenting, but I wonder whether APIs would want 
to serve something like the footer metadata (which includes the offsets) 
through whatever API the client already calls to get the URI to the data. The 
client could then use range requests to read specific batches (or split the 
fetch across a thread pool) in the same way.
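
   A minimal sketch of that client side, assuming the API has already handed 
over a list of `(offset, length)` pairs for the batches. The names 
`http_range_reader` and `fetch_blocks` are hypothetical, and the block list 
format is an assumption rather than anything Arrow defines. Only the Python 
standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def http_range_reader(url):
    """Build a reader that fetches one byte range of `url` per call."""
    def read_range(offset, length):
        # HTTP byte ranges are inclusive on both ends.
        last = offset + length - 1
        req = Request(url, headers={"Range": f"bytes={offset}-{last}"})
        with urlopen(req) as resp:
            if resp.status != 206:
                raise RuntimeError("server did not honor the Range header")
            return resp.read()
    return read_range


def fetch_blocks(read_range, blocks, max_workers=4):
    """Fetch (offset, length) blocks in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda block: read_range(*block), blocks))
```

   Passing the reader in as a callable keeps the thread pool logic 
independent of the transport, so the same `fetch_blocks` works against a 
local file or an in-memory buffer.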

