ianmcook commented on issue #40597: URL: https://github.com/apache/arrow/issues/40597#issuecomment-2002661998
@kou Do you mean that a client could send a range request like `Range: batches=x-y` instead of `Range: bytes=x-y`?

In that case, yes: the server could retrieve the requested batches more efficiently if the data on the server side were in the IPC file format, because the footer contains the file offset and size of each record batch.

But I am -1 on recommending range requests with units other than `bytes`. Although custom range units are allowed by HTTP/1.1 (as described in [RFC 2616 Section 3.12](https://datatracker.ietf.org/doc/html/rfc2616#section-3.12)) and also by HTTP/2 (as described in [RFC 7540 Section 8](https://datatracker.ietf.org/doc/html/rfc7540#section-8)), HTTP clients and servers in general do not support them well. At best, it would require overriding classes of HTTP server libraries that are rarely overridden. At worst, it would be altogether incompatible with some HTTP clients and servers.

I think it is better to recommend that HTTP APIs handle requests for specific ranges of batches using whatever higher-level, application-specific method they choose (such as URL query parameters) and restrict range requests to `bytes` units only.
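
To illustrate the query-parameter alternative, here is a minimal sketch of how a server might map a batch range taken from the URL onto byte locations recorded in an IPC file footer. Everything named here is hypothetical: the `batches=x-y` query parameter, the `Block` tuple, and both helper functions are assumptions made for illustration, not part of Arrow or of any HTTP library. The sketch also assumes the server has already read the footer into a list of per-batch offsets and lengths.

```python
from typing import List, NamedTuple, Tuple
from urllib.parse import parse_qs, urlparse


class Block(NamedTuple):
    """Hypothetical stand-in for one record batch entry from the file footer."""
    offset: int  # byte offset of the record batch within the file
    length: int  # total size of the record batch in bytes


def parse_batch_range(url: str, num_batches: int) -> Tuple[int, int]:
    """Read an inclusive, zero-based batch range from a hypothetical
    `batches=x-y` query parameter. Without the parameter, select all batches."""
    values = parse_qs(urlparse(url).query).get("batches")
    if not values:
        return 0, num_batches - 1
    first_text, sep, last_text = values[0].partition("-")
    if not sep:
        raise ValueError("expected batches=x-y")
    first, last = int(first_text), int(last_text)  # ValueError if not integers
    if not (0 <= first <= last < num_batches):
        raise ValueError("batch range out of bounds")
    return first, last


def select_blocks(blocks: List[Block], first: int, last: int) -> List[Block]:
    """Return the footer entries for batches first..last (inclusive)."""
    return blocks[first:last + 1]


# Example footer index with three record batches (made-up numbers).
blocks = [Block(8, 100), Block(108, 200), Block(308, 50)]
first, last = parse_batch_range("https://example.com/data?batches=1-2", len(blocks))
for block in select_blocks(blocks, first, last):
    print(f"read {block.length} bytes at offset {block.offset}")
```

With this approach the batch selection lives in the application layer, so standard HTTP clients and servers need no changes, and `Range` headers remain available for ordinary `bytes` requests.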
