Hi,
    I'm using Flight to receive streams from a client and write them to
storage with the Python `pa.dataset.write_dataset` API. The whole
dataset is 1 billion rows, over 40 GB, with one partition column whose
values range from 0 to 63. The container runs with 8 CPU cores and 4 GB
of RAM.
    After setting io_thread_count to 128, transferring and writing
finish almost in step, in about 160 s (about 6M rows/s; each record
batch is about 32K rows).
    Next, I'd like to find out where the bottleneck of the system is:
CPU, RAM, or storage?
    1. I increased the RAM to 32 GB. The server then consumed more RAM:
writing progressed normally at the beginning, then suddenly slowed down,
and the data accumulated in RAM until the process was OOM-killed.
    2. Then I set the RAM to 64 GB so that the server would not be
OOM-killed. The same thing happened. In addition, after all the data had
been transferred (into memory), the server used all of its CPU shares
(800%) but was still writing very slowly (not completely stopped, but
only about 100 MB/minute).
    2.1 I wondered whether the I/O threads or the computation tasks were
stuck. After setting both io_thread_count and cpu_count to 32, I wrapped
the input stream of write_dataset with a callback on each record batch,
and from that I can tell that all the record batches are consumed by the
write_dataset API.
    2.2 I dumped the stack traces of all threads and grabbed a flame
graph. See
https://gist.github.com/hu6360567/e21ce04e7f620fafb5500cd93d44d3fb.

    It seems that all threads are stuck in ThrottledAsyncTaskSchedulerImpl.

-- 
---------------------
Best Regards,
Wenbo Hu,