pitrou opened a new issue, #14792:
URL: https://github.com/apache/arrow/issues/14792

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The CSV streaming reader has this snippet:
   
https://github.com/apache/arrow/blob/6f4a539323da0d6aef528ccce2404e36f0a08585/cpp/src/arrow/csv/reader.cc#L860-L862
   
   and there's a similar one in the one-shot reader implementation.
   
   This will potentially execute `CSVBufferIterator::operator()` concurrently 
from multiple threads, but it is not thread-safe as it mutates internal state. 
Its execution should be serialized.
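
   To illustrate what "serialized" means here, below is a minimal standalone 
sketch (C++17) that guards a stateful iterator with a mutex. This is not Arrow 
code: `ChunkIterator` is a hypothetical stand-in for a non-thread-safe iterator 
such as `CSVBufferIterator`, and `LockedIterator` is a made-up name. A mutex is 
just one simple way to serialize calls, and not necessarily the fix that fits 
the reader best.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stateful iterator that is not thread-safe:
// every call mutates `pos_`.
class ChunkIterator {
 public:
  explicit ChunkIterator(std::vector<std::string> chunks)
      : chunks_(std::move(chunks)) {}

  // Returns the next chunk, or std::nullopt once the input is exhausted.
  std::optional<std::string> operator()() {
    if (pos_ == chunks_.size()) return std::nullopt;
    return chunks_[pos_++];
  }

 private:
  std::vector<std::string> chunks_;
  std::size_t pos_ = 0;
};

// Hypothetical wrapper: at most one thread at a time runs the inner iterator.
class LockedIterator {
 public:
  explicit LockedIterator(ChunkIterator inner) : inner_(std::move(inner)) {}

  std::optional<std::string> operator()() {
    std::lock_guard<std::mutex> lock(mutex_);
    return inner_();
  }

 private:
  std::mutex mutex_;
  ChunkIterator inner_;
};
```

   Callers on any thread can then invoke the `LockedIterator` object directly. 
The lock is only acceptable because the guarded work is assumed to be cheap.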
   
   Fortunately, `CSVBufferIterator` should be very cheap CPU-wise, so rather 
than transferring it to the CPU executor, it can probably be run serially 
from within the `IOStream` iterator.
   
   However, there seems to be a similar problem with `SerialBlockReader`, 
which is potentially heavier, especially if `ParseOptions::newlines_in_values` 
has been enabled.
   
   So perhaps we instead need a more general `SerializingGenerator` building 
block that delays each execution of a generator until its previous execution 
has terminated.
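
   As a rough illustration of the idea, here is a standalone sketch built on 
`std::async` and `std::shared_future` instead of Arrow's own `Future` and 
generator types. Everything in it is an assumption made for the example: the 
class layout, the `std::function<T()>` source type, and the use of one thread 
per pending call. It only shows the ordering guarantee: each call returns a 
future right away, but the wrapped source runs only after the previous call 
has finished.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical sketch, not Arrow's API. Wraps a non-reentrant `source` so
// that invocations never overlap, while callers still get a future at once.
// The generator object must outlive every future it has handed out.
template <typename T>
class SerializingGenerator {
 public:
  explicit SerializingGenerator(std::function<T()> source)
      : source_(std::move(source)) {
    // Seed the chain with an already-completed "previous" execution.
    std::promise<void> ready;
    ready.set_value();
    previous_done_ = ready.get_future().share();
  }

  std::future<T> operator()() {
    std::shared_future<void> wait_for;
    auto done = std::make_shared<std::promise<void>>();
    {
      // The mutex only protects the chain bookkeeping, not the source call.
      std::lock_guard<std::mutex> lock(mutex_);
      wait_for = previous_done_;
      previous_done_ = done->get_future().share();
    }
    return std::async(std::launch::async, [this, wait_for, done]() -> T {
      wait_for.wait();  // delay until the previous execution has terminated
      try {
        T value = source_();
        done->set_value();  // unblock the next execution
        return value;
      } catch (...) {
        done->set_value();  // unblock the next execution even on failure
        throw;              // the error surfaces through the returned future
      }
    });
  }

 private:
  std::function<T()> source_;
  std::mutex mutex_;
  std::shared_future<void> previous_done_;
};
```

   A real building block would presumably chain continuations on the 
previous future rather than park a thread per call as this sketch does. Note 
also that futures returned by `std::async` block in their destructor until 
the task has completed.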
   
   
   ### Component(s)
   
   C++


