Rich-T-kid opened a new issue, #10029:
URL: https://github.com/apache/arrow-rs/issues/10029

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I want to gain more visibility into the runtime performance of arrow-flight. 
The crate currently has no benchmarks in-tree, which makes a few things hard:
   
   - Validating that future changes don't regress encode/decode or roundtrip 
performance.
   - Characterizing where time is actually spent in a Flight roundtrip: gRPC frame assembly, IPC decode, alignment-related copies, etc.
   - Verifying that the zero-copy properties Flight advertises actually hold 
end-to-end.
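
   To make "alignment-related copies" concrete: a decoder can only reinterpret raw IPC body bytes as a typed slice in place when those bytes are suitably aligned for the type. Otherwise it has to copy them into a fresh, aligned allocation. The std-only Rust sketch below illustrates that decision for `i64` data. `i64_values` is a hypothetical helper for illustration, not an arrow-rs API, and it uses native byte order for simplicity.

   ```rust
   use std::borrow::Cow;
   use std::mem::size_of;

   /// Hypothetical helper: view raw body bytes as `i64` values (native byte order).
   /// Borrows the bytes in place when they are 8-byte aligned (zero-copy);
   /// otherwise copies them into a newly allocated, aligned `Vec<i64>`.
   fn i64_values<'a>(bytes: &'a [u8]) -> Cow<'a, [i64]> {
       assert_eq!(bytes.len() % size_of::<i64>(), 0, "length must be a multiple of 8");
       // SAFETY: every bit pattern is a valid i64, so viewing the aligned
       // middle part of the byte slice as i64 is sound.
       let (prefix, middle, suffix) = unsafe { bytes.align_to::<i64>() };
       if prefix.is_empty() && suffix.is_empty() {
           // Aligned: the returned slice points at the same memory, no copy.
           Cow::Borrowed(middle)
       } else {
           // Misaligned: decode each 8-byte chunk into an owned, aligned buffer.
           Cow::Owned(
               bytes
                   .chunks_exact(size_of::<i64>())
                   .map(|chunk| {
                       let mut buf = [0u8; 8];
                       buf.copy_from_slice(chunk);
                       i64::from_ne_bytes(buf)
                   })
                   .collect(),
           )
       }
   }
   ```

   Benchmarks that compare aligned and deliberately misaligned inputs could show how often, and how expensively, the copy path is taken in practice.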
   
   **Describe the solution you'd like**
   1. Add a benchmark suite to arrow-flight that covers:
   
      - End-to-end roundtrip benchmarks: full client → server → client roundtrips over a real gRPC channel, measuring throughput and per-batch latency for a representative DoGet / DoPut flow.
      - Encode-only and decode-only benchmarks: isolate the IPC encode and decode steps so they are measured independently and regressions can be attributed cleanly.
      - Tunable batch shape: the benchmarks should be parameterized over the number of columns (and ideally batch size and column types) so we can see how cost scales with schema width. Wide and narrow batches stress different per-column overheads.
   2. A follow-up PR that removes unnecessary copies and applies any other performance optimizations.
      - The benchmarks should **prove** these changes are faster.
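
   As a rough sketch of how the batch-shape sweep might be structured, here is a std-only Rust model. `encode`, `decode`, `make_columns`, `time_per_iter`, and `run_shape_sweep` are hypothetical stand-ins, not arrow-ipc or arrow-flight APIs. `encode` loosely mimics an IPC body by packing every column into one contiguous buffer, zero-padding each to an 8-byte boundary and recording an `(offset, length)` pair per column. A real suite would presumably drive arrow-flight's own encoder/decoder under a statistical harness such as criterion rather than timing with `Instant`.

   ```rust
   use std::hint::black_box;
   use std::time::{Duration, Instant};

   /// Hypothetical model of an IPC body: column buffers packed back to back.
   struct EncodedBody {
       body: Vec<u8>,
       /// One (offset, length) pair per column, in bytes, relative to `body`.
       buffers: Vec<(usize, usize)>,
   }

   const ALIGNMENT: usize = 8;

   /// Round `len` up to the next multiple of ALIGNMENT.
   fn padded_len(len: usize) -> usize {
       (len + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT
   }

   /// Copy each column into `body`, zero-padding so every buffer starts 8-byte aligned.
   fn encode(columns: &[Vec<i32>]) -> EncodedBody {
       let mut body: Vec<u8> = Vec::new();
       let mut buffers = Vec::with_capacity(columns.len());
       for column in columns {
           let offset = body.len();
           for value in column {
               body.extend_from_slice(&value.to_le_bytes());
           }
           let length = body.len() - offset;
           body.resize(offset + padded_len(length), 0);
           buffers.push((offset, length));
       }
       EncodedBody { body, buffers }
   }

   /// Rebuild the columns by copying values back out of `body` (not zero-copy).
   fn decode(encoded: &EncodedBody) -> Vec<Vec<i32>> {
       encoded
           .buffers
           .iter()
           .map(|&(offset, length)| {
               encoded.body[offset..offset + length]
                   .chunks_exact(4)
                   .map(|chunk| {
                       let mut bytes = [0u8; 4];
                       bytes.copy_from_slice(chunk);
                       i32::from_le_bytes(bytes)
                   })
                   .collect::<Vec<i32>>()
           })
           .collect()
   }

   /// Build `num_columns` columns of `num_rows` i32 values each.
   fn make_columns(num_columns: usize, num_rows: usize) -> Vec<Vec<i32>> {
       (0..num_columns)
           .map(|c| (0..num_rows as i32).map(|r| r * num_columns as i32 + c as i32).collect::<Vec<i32>>())
           .collect()
   }

   /// Mean wall-clock time per call of `f` over `iterations` runs.
   fn time_per_iter<T>(iterations: u32, mut f: impl FnMut() -> T) -> Duration {
       let start = Instant::now();
       for _ in 0..iterations {
           black_box(f());
       }
       start.elapsed() / iterations
   }

   /// Sweep schema width and report the encode/decode cost per batch.
   fn run_shape_sweep() {
       for &num_columns in &[1usize, 10, 100] {
           let columns = make_columns(num_columns, 1_000);
           let encoded = encode(&columns);
           let encode_time = time_per_iter(100, || encode(&columns));
           let decode_time = time_per_iter(100, || decode(&encoded));
           println!("columns={num_columns:>3} encode={encode_time:?} decode={decode_time:?}");
       }
   }
   ```

   Because this `decode` copies every value out of `body`, it sits at the copying end of the spectrum. A zero-copy decoder would hand out views into `body` instead, and a width sweep like this is where that difference should show up as the column count grows.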
   
   **Describe alternatives you've considered**
   <!--
   A clear and concise description of any alternative solutions or features 
you've considered.
   -->
   N/A
   
   **Additional context**
   These three resources provide the background needed to understand arrow-flight:
   
   - [Introducing Apache Arrow Flight](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
   - [Arrow Flight RPC](https://arrow.apache.org/docs/format/Flight.html)
   - [Arrow IPC](https://arrow.apache.org/docs/format/Columnar.html#format-ipc)
   
   Further reading:
   
   - [Arrow IPC message FlatBuffers file](https://github.com/apache/arrow/blob/main/format/Message.fbs#L34)
   - [Arrow IPC schema FlatBuffers file](https://github.com/apache/arrow/blob/main/format/Schema.fbs)
   - [How Kafka zero-copy works](https://blog.2minutestreaming.com/p/apache-kafka-zero-copy-operating-system-optimization)
     - This is not directly related to how arrow-flight / Arrow IPC works, but it gives a conceptual understanding of zero-copy.
   
   **TODO**: fill in the remaining context when I get a chance. Non-blocking.
   


-- 
This is an automated message from the Apache Git Service.