vichry2 opened a new issue, #6670:
URL: https://github.com/apache/arrow-rs/issues/6670

   **Which part is this question about**
   Arrow flight, FlightDataEncoderBuilder, do_get
   
   **Describe your question**
   Is it expected that Arrow's Python (C++) Flight implementation encodes data 
more efficiently than arrow-rs?   
   
   **Additional context**
   Hello.
   After discussion with @alamb, I am filing an issue here.
   
   Unsure if this is a bug, or if it's expected, or if there's just an issue 
with my code, but after running some tests, it seems that Rust's encoding takes 
more time and resources than Python.
   
   I am running two servers, one in Python and the other in Rust, with the same 
simple design: 
   - Create a `Table`/`RecordBatch` before starting the flight service, which 
the service will hold in memory when running.
   - When receiving a request (in `do_get`), simply provide a view of the data 
to `fl.RecordBatchStream` in Python / `FlightDataEncoderBuilder` in Rust.
   
   Because nothing is really happening on the Python side (just providing a 
view to a `Table`), and a single request is not holding the GIL for a 
significant amount of time, I imagine I'm ultimately measuring the C++ Arrow 
Flight implementation. 
   
   I have run two tests:
   1. Python script which sends *n* requests sequentially to each server, 
consuming the entire stream (`flightclient.do_get().read_all()`) and displays 
the average response time for each server.
   2. Using the Locust framework, load testing the maximum RPS capabilities of 
the servers (used `taskset -c` to separate Locust users and server).
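
For illustration, test 1 could be structured roughly as below. This is a minimal sketch, not the script from the linked repository: `average_response_time`, `compare_servers`, and the `fetch` callables are hypothetical names, and each `fetch` is assumed to issue one request and consume the entire stream.

```python
import time
from statistics import mean
from typing import Callable, Dict


def average_response_time(fetch: Callable[[], object], n: int) -> float:
    """Call fetch() n times sequentially; return the mean wall-clock seconds."""
    if n <= 0:
        raise ValueError("n must be positive")
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        # Assumption: fetch() reads the whole response, so the measured time
        # covers server-side encoding, transfer, and client-side decoding.
        fetch()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def compare_servers(
    fetchers: Dict[str, Callable[[], object]], n: int
) -> Dict[str, float]:
    """Run the same sequential benchmark against each named server."""
    return {name: average_response_time(fetch, n) for name, fetch in fetchers.items()}
```

With pyarrow, a `fetch` callable would presumably wrap something like `client.do_get(ticket).read_all()` on a `pyarrow.flight.FlightClient` connected to the server under test. Timing the full `read_all()` rather than only the `do_get` call matters here, since the stream is produced lazily.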
   
   I observe the following from the tests:
   1. As the size of the data sent to the client increases, the difference in 
average response time between the Python and Rust servers also increases (in 
favor of the Python server).
   2. Similarly, as the amount of data increases, Python is able to achieve a 
higher RPS than Rust.
   
   The Rust server's CPUs are fully utilized (in certain cases using more CPU 
than the Python server). After profiling with `perf`, I am seeing a lot of CPU 
usage related to memory movement.
   
   You can access my code here: https://github.com/vichry2/flight-benchmark 
   
   Thank you for your help!
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
