Rich-T-kid opened a new issue, #10029:
URL: https://github.com/apache/arrow-rs/issues/10029
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
I want to gain more visibility into the runtime performance of arrow-flight.
The crate currently has no benchmarks in-tree, which makes a few things hard:
- Validating that future changes don't regress encode/decode or roundtrip
performance.
- Characterizing where time is actually spent in a Flight roundtrip: gRPC
frame assembly, IPC decode, alignment-related copies, etc.
- Verifying that the zero-copy properties Flight advertises actually hold
end-to-end.
**Describe the solution you'd like**
1. Add a benchmark suite to arrow-flight that covers:
- End-to-end roundtrip benchmarks: full (client → server → client) over
a real gRPC channel, measuring throughput and per-batch latency for a
representative DoGet / DoPut flow.
- Encode-only and decode-only benchmarks: isolate the IPC encode and
decode steps so they are measured independently and regressions can be
attributed cleanly.
- Tunable batch shape: the benchmarks should parameterize over the
number of columns (and ideally batch size and column types) so we can see how
cost scales with schema width. Wide and narrow batches stress different
per-column overheads.
2. A follow-up PR that removes any avoidable copies and applies other
performance optimizations.
- The benchmarks should **prove** these changes are faster.
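To make the proposed shape concrete, here is a minimal, std-only sketch of a harness that is parameterized over column count and times encode, decode, and roundtrip separately. Everything in it is an assumption made for illustration: `Batch`, `make_batch`, `encode`, `decode`, `time`, and `run_suite` are hypothetical names, the byte layout is a toy stand-in rather than Arrow IPC, and the "roundtrip" runs in-process rather than over a gRPC channel. A real suite would presumably use a benchmark framework and the arrow-flight encoder and decoder instead.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a record batch: equal-length i64 columns.
struct Batch {
    columns: Vec<Vec<i64>>,
}

/// Build a batch with the requested shape (the tunable parameters).
fn make_batch(num_columns: usize, num_rows: usize) -> Batch {
    let columns = (0..num_columns)
        .map(|c| (0..num_rows).map(|r| (c * num_rows + r) as i64).collect())
        .collect();
    Batch { columns }
}

/// Toy encoder (NOT Arrow IPC): a 16-byte header holding the column and
/// row counts, followed by every value in little-endian order.
fn encode(batch: &Batch) -> Vec<u8> {
    let rows = batch.columns.first().map_or(0, |c| c.len());
    let mut out = Vec::with_capacity(16 + batch.columns.len() * rows * 8);
    out.extend_from_slice(&(batch.columns.len() as u64).to_le_bytes());
    out.extend_from_slice(&(rows as u64).to_le_bytes());
    for col in &batch.columns {
        for v in col {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
    out
}

/// Copy 8 bytes starting at `at` into a fixed-size array.
fn read_8(bytes: &[u8], at: usize) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[at..at + 8]);
    buf
}

/// Toy decoder matching `encode`; panics on truncated input.
fn decode(bytes: &[u8]) -> Batch {
    let num_columns = u64::from_le_bytes(read_8(bytes, 0)) as usize;
    let num_rows = u64::from_le_bytes(read_8(bytes, 8)) as usize;
    let columns = (0..num_columns)
        .map(|c| {
            (0..num_rows)
                .map(|r| i64::from_le_bytes(read_8(bytes, 16 + (c * num_rows + r) * 8)))
                .collect()
        })
        .collect();
    Batch { columns }
}

/// Mean wall-clock time per call of `f` over `iters` calls (`iters` > 0).
fn time<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iters
}

/// One row of results per schema width.
struct Timing {
    num_columns: usize,
    encode: Duration,
    decode: Duration,
    roundtrip: Duration,
}

/// Time encode-only, decode-only, and in-process roundtrip per column count.
fn run_suite(column_counts: &[usize], num_rows: usize, iters: u32) -> Vec<Timing> {
    column_counts
        .iter()
        .map(|&num_columns| {
            let batch = make_batch(num_columns, num_rows);
            let encoded = encode(&batch);
            Timing {
                num_columns,
                encode: time(iters, || encode(&batch)),
                decode: time(iters, || decode(&encoded)),
                roundtrip: time(iters, || decode(&encode(&batch))),
            }
        })
        .collect()
}
```

Calling something like `run_suite(&[1, 8, 64], 1024, 100)` from a `main` function and printing each `Timing` would show how the three measurements scale with schema width. Keeping encode and decode as separate measurements is what lets a roundtrip regression be attributed to one side.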
**Describe alternatives you've considered**
N/A
**Additional context**
These three resources should provide the background needed to understand
arrow-flight:
[Introducing Apache Arrow
Flight](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/)
[Arrow Flight RPC](https://arrow.apache.org/docs/format/Flight.html)
[Arrow IPC](https://arrow.apache.org/docs/format/Columnar.html#format-ipc)
- [arrow IPC message flat buffer
file](https://github.com/apache/arrow/blob/main/format/Message.fbs#L34)
- [arrow IPC schema flat buffer
file](https://github.com/apache/arrow/blob/main/format/Schema.fbs)
- [how kafka zero copy
works](https://blog.2minutestreaming.com/p/apache-kafka-zero-copy-operating-system-optimization)
  - This is not directly related to how arrow-flight or Arrow IPC works, but
it gives you a conceptual understanding of zero-copy.
**TODO**: Fill in the remaining context when I get a chance. Non-blocking.