[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers

2021-12-02 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452498#comment-17452498
 ] 

Antoine Pitrou commented on ARROW-12629:


{{use_readahead = true}} would sound good to me.

> [C++] Configurable read-ahead in CSV and JSON readers
> -
>
> Key: ARROW-12629
> URL: https://issues.apache.org/jira/browse/ARROW-12629
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Andre Kohn
>Assignee: Supun Kamburugamuva
>Priority: Major
>  Labels: good-first-issue
>
> We are compiling Arrow C++ to WebAssembly and ran into the following issue 
> with the CSV reader:
> Browsers became very picky about the use of SharedArrayBuffers after the 
> events around Spectre and Meltdown.
> As a result, you have to compile Arrow to WebAssembly without threads if you 
> don't want to run your website with very strict cross-origin isolation.
> Unfortunately, the CSV reader seems to always spawn a thread for the 
> read-ahead in both the SerialStreamingReader and the SerialTableReader, 
> regardless of whether use_threads is set.
> Right now, this effectively means that you cannot use the CSV (and JSON) 
> readers in threadless WebAssembly builds.
>  
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L839]
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L913]
>  
>  
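To illustrate why a read-ahead reader needs a thread even when parsing itself is serial, here is a minimal producer/consumer sketch. It assumes a hypothetical {{BlockSource}} callback and is not the code at the linked reader.cc lines. A background thread keeps reading blocks into a bounded queue while the consumer parses, so I/O overlaps with parsing. In a WebAssembly build compiled without threads, the {{std::thread}} here is exactly the thing that cannot exist.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical block source: returns the next block, or std::nullopt at end of input.
using BlockSource = std::function<std::optional<std::string>()>;

// Background read-ahead: at most `capacity` blocks wait in the queue; the producer
// may hold one more block while it waits for a free slot.
class ReadaheadSource {
 public:
  ReadaheadSource(BlockSource source, std::size_t capacity)
      : source_(std::move(source)), capacity_(capacity) {
    assert(capacity_ > 0);
    worker_ = std::thread([this] { Run(); });  // the extra thread the issue is about
  }
  ~ReadaheadSource() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }
  ReadaheadSource(const ReadaheadSource&) = delete;
  ReadaheadSource& operator=(const ReadaheadSource&) = delete;

  // Blocks until a prefetched block is available; std::nullopt once input is exhausted.
  std::optional<std::string> Next() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !queue_.empty() || done_; });
    if (queue_.empty()) return std::nullopt;  // producer finished and queue drained
    std::string block = std::move(queue_.front());
    queue_.pop_front();
    cv_.notify_all();  // a slot became free: wake the producer
    return block;
  }

 private:
  void Run() {
    for (;;) {
      std::optional<std::string> block = source_();  // I/O runs without holding the lock
      std::unique_lock<std::mutex> lock(mu_);
      if (!block) {
        done_ = true;
        cv_.notify_all();
        return;
      }
      cv_.wait(lock, [this] { return queue_.size() < capacity_ || stop_; });
      if (stop_) return;
      queue_.push_back(std::move(*block));
      cv_.notify_all();
    }
  }

  BlockSource source_;
  std::size_t capacity_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::string> queue_;
  bool done_ = false;
  bool stop_ = false;
  std::thread worker_;
};
```

A single condition variable with {{notify_all}} serves both the "queue not empty" and "queue not full" conditions. That keeps the sketch short at the cost of some spurious wake-ups.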



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers

2021-12-02 Thread Supun Kamburugamuva (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452497#comment-17452497
 ] 

Supun Kamburugamuva commented on ARROW-12629:
-

What would be a good option name for this?

One option would be {{read_ahead}}. But if we introduce this, do we need to change all the readers?

Another option would be not to read ahead if {{use_threads = false}}. But that option is specifically for CPU threads.

 



[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers

2021-11-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448143#comment-17448143
 ] 

Antoine Pitrou commented on ARROW-12629:


In both cases (CSV and JSON) this can probably be added to {{ReadOptions}}.

> [C++] Configurable read-ahead in CSV and JSON readers
> -
>
> Key: ARROW-12629
> URL: https://issues.apache.org/jira/browse/ARROW-12629
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Andre Kohn
>Priority: Major
>  Labels: good-first-issue
>
> We are compiling Arrow C++ to WebAssembly and ran into the following issue 
> with the CSV reader:
> Browsers became very picky about the use of SharedArrayBuffers after the 
> events around Spectre and Meltdown.
> As a result, you have to compile Arrow to WebAssembly without threads if you 
> don't want to run your website with very strict cross-origin isolation.
> Unfortunately, the CSV reader seems to always spawn a thread for the 
> read-ahead in both, the SerialStreamingReader and the SerialTableReader 
> independent of whether use_threads is set.
> Right now, this effectively means that you cannot use the CSV (and JSON) 
> readers in threadless WebAssembly builds.
>  
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L839]
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L913]
>  
>  


