[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers
[ https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452498#comment-17452498 ]

Antoine Pitrou commented on ARROW-12629:

{{use_readahead = true}} would sound good to me.

> [C++] Configurable read-ahead in CSV and JSON readers
>
> Key: ARROW-12629
> URL: https://issues.apache.org/jira/browse/ARROW-12629
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Andre Kohn
> Assignee: Supun Kamburugamuva
> Priority: Major
> Labels: good-first-issue
>
> We are compiling Arrow C++ to WebAssembly and ran into the following issue
> with the CSV reader:
> Browsers became very picky about the use of SharedArrayBuffers after the
> events around Spectre and Meltdown.
> As a result, you have to compile Arrow to WebAssembly without threads if you
> don't want to run your website with very strict cross-origin isolation.
> Unfortunately, the CSV reader seems to always spawn a thread for the
> read-ahead in both the SerialStreamingReader and the SerialTableReader,
> independent of whether use_threads is set.
> Right now, this effectively means that you cannot use the CSV (and JSON)
> readers in threadless WebAssembly builds.
>
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L839]
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L913]

-- This message was sent by Atlassian Jira (v8.20.1#820001)
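The read-ahead behaviour described in the quoted issue, and the {{use_readahead}} flag suggested in this comment, can be made concrete with a minimal C++ sketch. Everything in it is hypothetical: {{ToyReadOptions}}, {{BlockSource}}, {{BlockReader}} and {{ReadAll}} are illustrative names, not Arrow's {{csv::ReadOptions}} or its serial readers, and the sketch assumes one block of prefetch implemented with {{std::async}}. With {{use_readahead = true}}, the next block is fetched on a background thread, which is the behaviour the issue reports. With {{use_readahead = false}}, every block is read on the caller's thread, so no thread is ever spawned.

```cpp
#include <cstddef>
#include <future>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical options struct; the field name follows the suggestion in this
// thread and is NOT Arrow's actual csv::ReadOptions.
struct ToyReadOptions {
  bool use_readahead = true;
};

// Hypothetical stand-in for an input stream that yields raw blocks.
class BlockSource {
 public:
  explicit BlockSource(std::vector<std::string> blocks)
      : blocks_(std::move(blocks)) {}

  // Returns the next block, or std::nullopt at end of stream.
  std::optional<std::string> Next() {
    if (pos_ >= blocks_.size()) return std::nullopt;
    return blocks_[pos_++];
  }

 private:
  std::vector<std::string> blocks_;
  std::size_t pos_ = 0;
};

// Yields blocks either with one block of read-ahead on a background thread,
// or synchronously on the calling thread when read-ahead is disabled.
class BlockReader {
 public:
  BlockReader(BlockSource source, ToyReadOptions options)
      : source_(std::move(source)), options_(options) {
    if (options_.use_readahead) Prefetch();
  }
  // Pinned in place: the background task captures `this`.
  BlockReader(const BlockReader&) = delete;
  BlockReader& operator=(const BlockReader&) = delete;

  std::optional<std::string> Next() {
    // Read-ahead disabled: read directly on the caller's thread.
    if (!options_.use_readahead) return source_.Next();
    // Stream already exhausted: no prefetch is pending.
    if (!pending_.valid()) return std::nullopt;
    std::optional<std::string> block = pending_.get();
    if (block) Prefetch();  // start fetching the following block
    return block;
  }

 private:
  void Prefetch() {
    // std::launch::async runs the task on a new thread, which is exactly
    // what a threadless WebAssembly build cannot provide.
    pending_ = std::async(std::launch::async, [this] { return source_.Next(); });
  }

  BlockSource source_;
  ToyReadOptions options_;
  std::future<std::optional<std::string>> pending_;
};

// Drains a reader into a vector of blocks.
std::vector<std::string> ReadAll(BlockReader& reader) {
  std::vector<std::string> out;
  while (std::optional<std::string> block = reader.Next()) {
    out.push_back(std::move(*block));
  }
  return out;
}
```

Both modes return the same blocks in the same order. Only the threading differs, so under these assumptions a threadless build would construct the reader with {{ToyReadOptions{false}}} and lose nothing but the prefetching.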
[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers
[ https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452497#comment-17452497 ]

Supun Kamburugamuva commented on ARROW-12629:

What would be a good option name for this? One option would be {{read_ahead}}. But if we introduce this, do we need to change all the readers? Another option would be to skip read-ahead when {{use_threads = false}}, but that option is specifically for CPU threads.
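The trade-off raised in this comment, a dedicated {{read_ahead}} flag versus reusing {{use_threads}}, can be illustrated with a small C++ sketch. {{ToyReadOptions}}, {{ReadaheadPolicy}}, {{SpawnsReadaheadThread}} and {{ThreadFree}} are hypothetical names, not Arrow APIs. The sketch also assumes that CPU parallelism and read-ahead are the only two sources of threads in the reader.

```cpp
// Hypothetical options; field names follow this thread, not Arrow's API.
struct ToyReadOptions {
  bool use_threads = true;  // CPU parallelism for parsing and conversion
  bool read_ahead = true;   // background prefetching of raw input blocks
};

// The two designs discussed in the comment.
enum class ReadaheadPolicy {
  kCoupledToUseThreads,  // read-ahead happens only when use_threads is true
  kSeparateFlag,         // read-ahead is controlled by read_ahead alone
};

// Whether the reader would start a background read-ahead thread.
bool SpawnsReadaheadThread(const ToyReadOptions& o, ReadaheadPolicy p) {
  return p == ReadaheadPolicy::kCoupledToUseThreads ? o.use_threads
                                                    : o.read_ahead;
}

// A reader fits a threadless build only if it needs no thread for either
// CPU work or read-ahead.
bool ThreadFree(const ToyReadOptions& o, ReadaheadPolicy p) {
  return !o.use_threads && !SpawnsReadaheadThread(o, p);
}
```

Under the coupled policy, {{use_threads = false}} alone makes the reader thread-free, but a caller can no longer parse serially while still prefetching I/O. Under the separate-flag policy that combination remains available, at the cost of threadless builds having to turn off both flags.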
[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers
[ https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448143#comment-17448143 ]

Antoine Pitrou commented on ARROW-12629:

In both cases (CSV and JSON) this can probably be added to {{ReadOptions}}.