pitrou opened a new issue, #14792: URL: https://github.com/apache/arrow/issues/14792
### Describe the bug, including details regarding any error messages, version, and platform.

The CSV streaming reader has this snippet:
https://github.com/apache/arrow/blob/6f4a539323da0d6aef528ccce2404e36f0a08585/cpp/src/arrow/csv/reader.cc#L860-L862

and there's a similar one in the one-shot reader implementation.

This will potentially execute `CSVBufferIterator::operator()` concurrently from multiple threads, but it is not thread-safe as it mutates internal state. Its execution should be serialized.

Fortunately, `CSVBufferIterator` should be very cheap CPU-wise, so instead of transferring it to the CPU executor it can probably be serialized instead from the `IOStream` iterator.

However, there seems to be a similar problem with `SerialBlockReader` which is potentially heavier, especially if `ParseOptions::newlines_in_values` has been enabled. So perhaps we instead need a more general `SerializingGenerator` building block that delays execution of a generator until its previous execution terminated.

### Component(s)

C++
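
To illustrate the `SerializingGenerator` idea mentioned in the issue, here is a minimal standalone sketch. It is not Arrow code: the class below, `CountingSource`, and `CountDistinctValues` are hypothetical names made up for this example, and it uses only the C++ standard library. It also simplifies the problem by blocking on a mutex, whereas a version for Arrow's future-based generators would presumably defer the next call until the previous future completes rather than block a thread.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): wraps a stateful callable so that
// at most one invocation runs at a time. Copies share the same state and lock.
template <typename T>
class SerializingGenerator {
 public:
  explicit SerializingGenerator(std::function<T()> source)
      : state_(std::make_shared<State>(std::move(source))) {}

  T operator()() {
    // A new call waits here until the previous call has returned.
    std::lock_guard<std::mutex> lock(state_->mutex);
    return state_->source();
  }

 private:
  struct State {
    explicit State(std::function<T()> src) : source(std::move(src)) {}
    std::mutex mutex;
    std::function<T()> source;
  };
  std::shared_ptr<State> state_;
};

// Stand-in for a non-thread-safe source that mutates internal state on each
// call, playing the role that a buffer iterator plays in the issue.
struct CountingSource {
  int next = 0;
  int operator()() { return next++; }
};

// Calls the wrapped source from several threads and returns how many distinct
// values were produced. With serialization, no value is lost or duplicated.
std::size_t CountDistinctValues(int num_threads, int calls_per_thread) {
  std::function<int()> source = CountingSource{};
  SerializingGenerator<int> gen(std::move(source));

  std::vector<std::vector<int>> per_thread(num_threads);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    // Each thread gets its own copy of `gen`; all copies share one State.
    threads.emplace_back([&per_thread, gen, t, calls_per_thread]() mutable {
      for (int i = 0; i < calls_per_thread; ++i) {
        per_thread[t].push_back(gen());
      }
    });
  }
  for (auto& th : threads) th.join();

  std::set<int> distinct;
  for (const auto& values : per_thread) {
    distinct.insert(values.begin(), values.end());
  }
  return distinct.size();
}
```

Calling `CountDistinctValues(4, 1000)` exercises the wrapper from four threads. Because every call to the source happens under the lock, each counter value is handed out exactly once. Calling `CountingSource` directly from several threads would be a data race.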
