paleolimbot commented on issue #40597: URL: https://github.com/apache/arrow/issues/40597#issuecomment-2002691785
I think the idea with the file format might be more like: grab the last 10 bytes of the file (which I think contain the 4-byte footer length followed by the `ARROW1` magic number), then grab the footer (which contains the offsets for the various message locations), then grab specific batches, perhaps in parallel.

I don't know if it's worth documenting, but I wonder if APIs would want to serve something like the footer metadata (which includes the offsets) via whatever API the client already calls to get the URI to the data in the first place. The client could then use range requests to read specific batches (or split the fetch across a thread pool) in the same way.
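A rough sketch of that read pattern in Python, in case it helps the discussion. It assumes the trailer layout from the Arrow IPC file format specification (a 4-byte little-endian footer length followed by the 6-byte magic `ARROW1`, 10 bytes in total). The names `read_range`, `read_footer`, and `fetch_blocks` are hypothetical, not part of any Arrow API; `read_range(offset, length)` stands in for whatever issues the range request (for example an HTTP GET with a `Range: bytes=...` header). Decoding the footer flatbuffer into block offsets is not shown, so the `(offset, length)` pairs are taken as given.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MAGIC = b"ARROW1"
TRAILER_SIZE = 4 + len(MAGIC)  # int32 footer length + trailing magic


def read_footer(read_range, file_size):
    """Fetch the raw (still encoded) footer using two ranged reads.

    read_range(offset, length) is a hypothetical callable that returns
    `length` bytes starting at `offset`.
    """
    # First read: the fixed-size trailer at the very end of the file.
    trailer = read_range(file_size - TRAILER_SIZE, TRAILER_SIZE)
    if len(trailer) != TRAILER_SIZE or trailer[4:] != MAGIC:
        raise ValueError("missing ARROW1 magic: not an Arrow IPC file")
    (footer_len,) = struct.unpack("<i", trailer[:4])
    footer_start = file_size - TRAILER_SIZE - footer_len
    if footer_len <= 0 or footer_start < 0:
        raise ValueError("implausible footer length")
    # Second read: the footer itself, which sits just before the trailer.
    return read_range(footer_start, footer_len)


def fetch_blocks(read_range, blocks, max_workers=4):
    """Fetch (offset, length) byte ranges in parallel, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda block: read_range(*block), blocks))


def bytes_reader(data):
    """In-memory stand-in for a remote range reader (illustration only)."""
    def read_range(offset, length):
        return data[offset:offset + length]
    return read_range
```

If the server handed out the footer (or just the block offsets) alongside the URI, the client could skip `read_footer` entirely and go straight to `fetch_blocks`.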
