[jira] [Commented] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType
[ https://issues.apache.org/jira/browse/ARROW-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17454129#comment-17454129 ]

Supun Kamburugamuva commented on ARROW-13672:
-

The BaseBinaryBuilder has a constructor that accepts a type parameter, but the parameter is never used. As I understand the issue, we should store this type and return it from the type() method. However, someone using BaseBinaryBuilder can already do both in a subclass (store the type and override type()) and achieve the same result, so I'm wondering whether there is value in implementing this. FinishInternal is already implemented correctly: it calls type() to get the type, so it picks up the override in a derived class.

> [C++] BinaryBuilder doesn't preserve passed in DataType
> ---
>
> Key: ARROW-13672
> URL: https://issues.apache.org/jira/browse/ARROW-13672
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 5.0.0
> Reporter: Micah Kornfield
> Assignee: Supun Kamburugamuva
> Priority: Minor
> Labels: beginner, good-first-issue
>
> There is a [constructor|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L56] for the binary builder that takes a DataType, but the type is discarded. When constructing an Array, the type is always the value returned from type(), i.e. [binary|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L390].
> If a consumer of the API wants to build an extension array, this prevents them from passing the extension type through.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
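To make the two options in this thread concrete, here is a minimal, self-contained C++ model. DataType, ArrayData, SketchBinaryBuilder, ExtensionBinaryBuilder and the "my_extension" type name are hypothetical, simplified stand-ins, not Arrow's real classes or signatures. The sketch only shows the control flow: either the base builder stores the type passed to its constructor and returns it from type(), or a derived class overrides type(). In both cases Finish() obtains the result's type through the virtual type(), mirroring the point made about FinishInternal.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; NOT Arrow's real DataType/ArrayData.
struct DataType {
  explicit DataType(std::string n) : name(std::move(n)) {}
  std::string name;
};

std::shared_ptr<DataType> binary_type() {
  return std::make_shared<DataType>("binary");
}

struct ArrayData {
  std::shared_ptr<DataType> type;  // length and buffers omitted
};

class SketchBinaryBuilder {
 public:
  // Default construction reports plain binary, like the current builder.
  SketchBinaryBuilder() : type_(binary_type()) {}

  // Option 1 (the issue's proposal): remember the caller's type,
  // e.g. an extension type, instead of discarding it.
  explicit SketchBinaryBuilder(std::shared_ptr<DataType> type)
      : type_(std::move(type)) {
    assert(type_ != nullptr && "type must not be null");
  }
  virtual ~SketchBinaryBuilder() = default;

  virtual std::shared_ptr<DataType> type() const { return type_; }

  // The result's type comes from the virtual type(), so both a stored type
  // and a subclass override reach the finished data.
  ArrayData Finish() const { return ArrayData{type()}; }

 private:
  std::shared_ptr<DataType> type_;
};

// Option 2 (the alternative raised in the comment): leave the base class
// unchanged and have a derived builder override type().
class ExtensionBinaryBuilder : public SketchBinaryBuilder {
 public:
  explicit ExtensionBinaryBuilder(std::shared_ptr<DataType> ext)
      : ext_(std::move(ext)) {}
  std::shared_ptr<DataType> type() const override { return ext_; }

 private:
  std::shared_ptr<DataType> ext_;
};
```

Storing the type in the base class gives every caller the extension-type behavior without writing a subclass. The override route needs no change to the base class, but it leaves the constructor parameter ignored, which is exactly the behavior the issue reports.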
[jira] [Assigned] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType
[ https://issues.apache.org/jira/browse/ARROW-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Supun Kamburugamuva reassigned ARROW-13672:
---

Assignee: Supun Kamburugamuva
[jira] [Commented] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType
[ https://issues.apache.org/jira/browse/ARROW-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453217#comment-17453217 ]

Supun Kamburugamuva commented on ARROW-13672:
-

Should the solution be to remove the type parameter from the constructor?
[jira] [Commented] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers
[ https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452497#comment-17452497 ]

Supun Kamburugamuva commented on ARROW-12629:
-

What would be a good name for this option? One candidate is read_ahead, but if we introduce it, do we need to change all the readers? Another option would be to skip the read-ahead when use_threads = false, but that option is specifically meant for CPU threads.

> [C++] Configurable read-ahead in CSV and JSON readers
> -
>
> Key: ARROW-12629
> URL: https://issues.apache.org/jira/browse/ARROW-12629
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Andre Kohn
> Assignee: Supun Kamburugamuva
> Priority: Major
> Labels: good-first-issue
>
> We are compiling Arrow C++ to WebAssembly and ran into the following issue with the CSV reader:
> Since Spectre and Meltdown, browsers have become very strict about the use of SharedArrayBuffers.
> As a result, you have to compile Arrow to WebAssembly without threads if you don't want to run your website with very strict cross-origin isolation.
> Unfortunately, the CSV reader seems to always spawn a thread for the read-ahead in both the SerialStreamingReader and the SerialTableReader, independent of whether use_threads is set.
> Right now, this effectively means that you cannot use the CSV (and JSON) readers in threadless WebAssembly builds.
>
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L839]
> [https://github.com/apache/arrow/blob/4363fefe46dc357a9013f0f4bcdc235e1e2e8124/cpp/src/arrow/csv/reader.cc#L913]
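The trade-off in the comment, a dedicated read-ahead flag versus reusing use_threads, can be sketched as a toy model. Everything here is an assumption for illustration: SketchReadOptions, its read_ahead field, BlockSource and ReadAllBlocks are hypothetical names (read_ahead is only the name floated in the comment, not an existing Arrow option), and std::async stands in for Arrow's own read-ahead machinery, which this does not reproduce. With read_ahead disabled, every block is read on the calling thread, so no thread is created at runtime.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical options struct. read_ahead is the name floated in the comment;
// it is NOT an existing field of Arrow's CSV or JSON ReadOptions.
struct SketchReadOptions {
  bool use_threads = true;  // CPU parallelism for parsing (not modeled here)
  bool read_ahead = true;   // prefetch the next block on a background thread
};

// Toy block source: returns the next block, or std::nullopt at end of input.
using BlockSource = std::function<std::optional<std::string>()>;

// Reads every block from `next`. If `reader_ids` is non-null, records the id
// of the thread that performed each read (including the final empty read).
std::vector<std::string> ReadAllBlocks(const BlockSource& next,
                                       const SketchReadOptions& opts,
                                       std::vector<std::thread::id>* reader_ids) {
  assert(next && "block source must be callable");
  auto read_one = [&next, reader_ids]() {
    if (reader_ids != nullptr) reader_ids->push_back(std::this_thread::get_id());
    return next();
  };

  std::vector<std::string> blocks;
  if (!opts.read_ahead) {
    // Serial path: all reads happen on the calling thread.
    while (auto block = read_one()) blocks.push_back(std::move(*block));
    return blocks;
  }

  // Read-ahead path: keep exactly one read in flight on a background thread,
  // so fetching block N+1 overlaps with consuming block N.
  auto pending = std::async(std::launch::async, read_one);
  while (true) {
    std::optional<std::string> block = pending.get();
    if (!block) break;
    pending = std::async(std::launch::async, read_one);
    blocks.push_back(std::move(*block));  // "consume" while the next read runs
  }
  return blocks;
}
```

Tying read-ahead to use_threads would amount to setting read_ahead from use_threads; keeping a separate flag reflects the point in the comment that use_threads is about CPU parallelism, whereas read-ahead is an I/O concern. Whether every reader then needs the new flag is left open, as in the thread.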
[jira] [Assigned] (ARROW-12629) [C++] Configurable read-ahead in CSV and JSON readers
[ https://issues.apache.org/jira/browse/ARROW-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Supun Kamburugamuva reassigned ARROW-12629:
---

Assignee: Supun Kamburugamuva