[jira] [Commented] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType

2021-12-06 Thread Supun Kamburugamuva (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454129#comment-17454129
 ] 

Supun Kamburugamuva commented on ARROW-13672:
-

The BaseBinaryBuilder has a constructor that accepts a type parameter, but the 
parameter is not used. I think the issue is asking us to store this type and 
return it from the type() method.

Someone deriving from BaseBinaryBuilder can already store the type in their own 
constructor and override type() to achieve the same result, so I wonder whether 
there is value in implementing this. FinishInternal is already implemented 
correctly, since it calls type() to get the type in a derived class.
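The workaround described above can be sketched with simplified stand-in classes. DataType, BaseBinaryBuilderModel and TypePreservingBuilder are hypothetical names, not Arrow's real classes. The base model only mimics the reported behavior: its constructor discards the type argument, type() always reports plain binary, and FinishInternal reads the type through type().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for arrow::DataType (hypothetical, for illustration).
struct DataType {
  explicit DataType(std::string n) : name(std::move(n)) {}
  virtual ~DataType() = default;
  std::string name;
};

// Models the reported behavior of BaseBinaryBuilder: the type passed to the
// constructor is discarded and type() always reports plain binary.
class BaseBinaryBuilderModel {
 public:
  explicit BaseBinaryBuilderModel(std::shared_ptr<DataType> /*ignored*/) {}
  virtual ~BaseBinaryBuilderModel() = default;

  virtual std::shared_ptr<DataType> type() const {
    static const auto binary = std::make_shared<DataType>("binary");
    return binary;
  }

  // Models FinishInternal: the finished array's type is taken from type(),
  // so an override in a derived class is honored.
  std::string FinishInternal() const { return type()->name; }
};

// Workaround: the derived class keeps the type itself and overrides type().
class TypePreservingBuilder : public BaseBinaryBuilderModel {
 public:
  explicit TypePreservingBuilder(std::shared_ptr<DataType> type)
      : BaseBinaryBuilderModel(type), type_(std::move(type)) {
    assert(type_ != nullptr);  // a null type would leave type() unusable
  }

  std::shared_ptr<DataType> type() const override { return type_; }

 private:
  std::shared_ptr<DataType> type_;
};
```

Constructed with the same extension type, the base model finishes as "binary" while TypePreservingBuilder finishes with the extension type, which is the point of the comment: a subclass can recover the behavior without changing the base class.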

> [C++] BinaryBuilder doesn't preserve passed in DataType
> ---
>
> Key: ARROW-13672
> URL: https://issues.apache.org/jira/browse/ARROW-13672
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 5.0.0
>Reporter: Micah Kornfield
>Assignee: Supun Kamburugamuva
>Priority: Minor
>  Labels: beginner, good-first-issue
>
> There is a
> [constructor|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L56]
> for the binary builder that takes a DataType, but the type is discarded.
> When constructing an Array, the type is always the value returned from
> type() ([binary|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L390]).
> If a consumer of the API wants to build an extension array, this prevents
> them from passing the extension type through.
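For comparison, the change the issue asks for amounts to the base builder keeping the type it was given. The sketch below assumes a simplified builder; BinaryBuilderSketch, DataType and binary_type() are hypothetical names, not Arrow's API. It falls back to the plain binary type when no type, or a null one, is passed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for arrow::DataType (hypothetical).
struct DataType {
  explicit DataType(std::string n) : name(std::move(n)) {}
  std::string name;
};

// Hypothetical default type used when the caller does not supply one.
inline std::shared_ptr<DataType> binary_type() {
  static const auto t = std::make_shared<DataType>("binary");
  return t;
}

// Hypothetical builder that remembers the type given at construction
// instead of discarding it.
class BinaryBuilderSketch {
 public:
  BinaryBuilderSketch() : type_(binary_type()) {}
  explicit BinaryBuilderSketch(std::shared_ptr<DataType> type)
      : type_(type ? std::move(type) : binary_type()) {
    assert(type_ != nullptr);  // invariant: type_ is never null
  }

  // Returns the stored type, so an extension type survives.
  std::shared_ptr<DataType> type() const { return type_; }

  // The finished array would carry whatever type() reports.
  std::string FinishInternal() const { return type()->name; }

 private:
  std::shared_ptr<DataType> type_;
};
```

Constructing BinaryBuilderSketch with an extension type makes both type() and FinishInternal() report that type, with no subclass needed.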



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType

2021-12-03 Thread Supun Kamburugamuva (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Supun Kamburugamuva reassigned ARROW-13672:
---

Assignee: Supun Kamburugamuva






[jira] [Commented] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType

2021-12-03 Thread Supun Kamburugamuva (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453217#comment-17453217
 ] 

Supun Kamburugamuva commented on ARROW-13672:
-

Should the solution be to remove the type parameter from the constructor?






[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers

2021-12-02 Thread Supun Kamburugamuva (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452497#comment-17452497
 ] 

Supun Kamburugamuva commented on ARROW-12629:
-

What would be a good name for this option?

One candidate would be

read_ahead

But if we introduce it, do we need to change all the readers?

Another option would be to skip read-ahead when

use_threads = false

But that option is specifically about CPU threads.
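To make the two alternatives concrete, here is a minimal sketch of a reader loop gated by a dedicated flag. ReadOptionsSketch, its read_ahead field, ReadBlock and ReadAll are hypothetical names for illustration only; read_ahead is the name floated above, not an existing Arrow option, and std::async stands in for however Arrow actually schedules its read-ahead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical options struct; neither the struct nor read_ahead is Arrow API.
struct ReadOptionsSketch {
  // Governs CPU parallelism (parsing/conversion); deliberately not consulted
  // for read-ahead below.
  bool use_threads = true;
  // The dedicated flag floated in the comment.
  bool read_ahead = true;
};

// Stand-in for fetching block i from the input stream.
inline std::string ReadBlock(const std::vector<std::string>& src, std::size_t i) {
  assert(i < src.size());
  return src[i];
}

// With read_ahead, block i + 1 is fetched on a background thread while block i
// is handed to the consumer; without it, every read runs on the calling thread.
inline std::vector<std::string> ReadAll(const std::vector<std::string>& src,
                                        const ReadOptionsSketch& opts) {
  std::vector<std::string> out;
  if (src.empty()) return out;

  if (!opts.read_ahead) {
    // Serial path: no threads are spawned.
    for (std::size_t i = 0; i < src.size(); ++i) out.push_back(ReadBlock(src, i));
    return out;
  }

  // Read-ahead path: always keep one fetch in flight on another thread.
  auto next = std::async(std::launch::async, ReadBlock, std::cref(src), std::size_t{0});
  for (std::size_t i = 0; i < src.size(); ++i) {
    std::string block = next.get();
    if (i + 1 < src.size()) {
      next = std::async(std::launch::async, ReadBlock, std::cref(src), i + 1);
    }
    out.push_back(std::move(block));
  }
  return out;
}
```

Keeping read_ahead separate from use_threads means a threadless WebAssembly build could set read_ahead = false without also changing how CPU work is parallelized.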

 

> [C++] Configurable read-ahead in CSV and JSON readers
> -
>
> Key: ARROW-12629
> URL: https://issues.apache.org/jira/browse/ARROW-12629
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Andre Kohn
>Assignee: Supun Kamburugamuva
>Priority: Major
>  Labels: good-first-issue
>
> We are compiling Arrow C++ to WebAssembly and ran into the following issue 
> with the CSV reader:
> Browsers became very picky about the use of SharedArrayBuffers after the 
> events around Spectre and Meltdown.
> As a result, you have to compile Arrow to WebAssembly without threads if you 
> don't want to run your website with very strict cross-origin isolation.
> Unfortunately, the CSV reader seems to always spawn a thread for the 
> read-ahead in both the SerialStreamingReader and the SerialTableReader, 
> independent of whether use_threads is set.
> Right now, this effectively means that you cannot use the CSV (and JSON) 
> readers in threadless WebAssembly builds.
>  
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L839]
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L913]
>  
>  





[jira] [Assigned] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers

2021-11-30 Thread Supun Kamburugamuva (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Supun Kamburugamuva reassigned ARROW-12629:
---

Assignee: Supun Kamburugamuva



