[jira] [Created] (ARROW-1861) [Python] Fix up ASV setup, add developer instructions for writing new benchmarks and running benchmark suite locally
Wes McKinney created ARROW-1861:
---

    Summary: [Python] Fix up ASV setup, add developer instructions for writing new benchmarks and running benchmark suite locally
    Key: ARROW-1861
    URL: https://issues.apache.org/jira/browse/ARROW-1861
    Project: Apache Arrow
    Issue Type: Improvement
    Components: Python
    Reporter: Wes McKinney

We need to start writing more microbenchmarks as we go to prevent unintentional performance regressions (this has been a constant thorn in my side for years: http://wesmckinney.com/blog/introducing-vbench-new-code-performance-analysis-and-monitoring-tool/).

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
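For readers unfamiliar with ASV, here is a minimal sketch of what a microbenchmark file can look like. ASV discovers benchmarks by naming convention (methods prefixed with time_, an optional setup method, and the params / param_names attributes), so the file does not need to import asv. The file name, the class name PackInts, and the helper pack_int64 are hypothetical; the workload is a standard-library stand-in for a real conversion routine, not code from the Arrow benchmark suite.

```python
# Hypothetical file: benchmarks/pack_ints.py
import struct


def pack_int64(values):
    """Stand-in workload: pack integers as little-endian int64 bytes."""
    values = list(values)
    return struct.pack("<%dq" % len(values), *values)


class PackInts:
    """Times pack_int64 for a few input sizes."""

    # ASV runs every benchmark method once per parameter value.
    params = [1000, 100000]
    param_names = ["size"]

    def setup(self, size):
        # Runs before each measurement; its cost is not part of the timing.
        self.data = list(range(size))

    def time_pack(self, size):
        # Only the body of a time_* method is measured.
        pack_int64(self.data)
```

With an asv.conf.json in place, running asv run from the directory containing that config should collect and execute a file like this.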
Re: General questions about Arrow & Plasma
hi Matthias,

You need to use the IPC writer tools to write a Table or RecordBatch to some memory region:

http://arrow.apache.org/docs/cpp/namespacearrow_1_1ipc.html

There is a function for computing the size of a record batch, though not yet one for a Table; you can look at its implementation to get an idea of what to do:

http://arrow.apache.org/docs/cpp/namespacearrow_1_1ipc.html#af3e2c42f9315d51ee531d46506790291

It would be useful to have the ability to "stage" a message stream before it is written, to avoid multiple passes over the data structure (once to size, then again to write). I opened https://issues.apache.org/jira/browse/ARROW-1860

- Wes

On Sun, Nov 26, 2017 at 2:43 AM, Matthias Vallentin wrote:
>> Here are some more examples on how to interact between Plasma and Arrow:
>> http://arrow.apache.org/docs/python/plasma.html, see also the C++
>> documentation: http://arrow.apache.org/docs/cpp/md_tutorials_plasma.html
>
> I'm browsing through the C++ API documentation and have trouble finding
> the right API to copy an arrow::Table into a Plasma object buffer.
> Concretely:
>
>    (1) How do I get the size of the underlying buffer of an
>        arrow::Table needed to construct a Plasma object?
>
>    (2) Thereafter, how do I obtain a pointer to the table data so that
>        I can copy the table into the Plasma object buffer?
>
> I also tried to look at the RecordBatch API, but couldn't find anything
> there either. Both APIs seem only to provide columnar access.
>
>    Matthias
[jira] [Created] (ARROW-1860) [C++] Add data structure to "stage" a sequence of IPC messages from in-memory data
Wes McKinney created ARROW-1860:
---

    Summary: [C++] Add data structure to "stage" a sequence of IPC messages from in-memory data
    Key: ARROW-1860
    URL: https://issues.apache.org/jira/browse/ARROW-1860
    Project: Apache Arrow
    Issue Type: New Feature
    Components: C++
    Reporter: Wes McKinney

Currently, when you need to pre-allocate space for a record batch or a stream (schema + dictionaries + record batches), you must make multiple passes over the data structures of interest (and use e.g. {{MockOutputStream}} to compute the size of the output buffer). It would be useful to "prepare" the IPC payload in a single pass, so that the same prepared payload serves both sizing and writing.
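The difference between the two approaches can be illustrated with a small standard-library sketch. It is not Arrow code: CountingSink, encode_message, write_two_pass and StagedPayload are hypothetical names, and the length-prefixed int64 framing is an assumption chosen only to keep the example short. CountingSink plays a role similar in spirit to a mock output stream that counts bytes without storing them.

```python
import struct


class CountingSink:
    """Discards written bytes and only tracks how many were written."""

    def __init__(self):
        self.position = 0

    def write(self, data):
        self.position += len(data)


def encode_message(values):
    """Assumed framing: 4-byte little-endian length, then an int64 body."""
    body = struct.pack("<%dq" % len(values), *values)
    return struct.pack("<I", len(body)) + body


def write_two_pass(batches):
    # Pass 1: encode everything only to learn the total size.
    sink = CountingSink()
    for batch in batches:
        sink.write(encode_message(batch))
    out = bytearray(sink.position)  # pre-allocated destination
    # Pass 2: encode everything again to fill the destination.
    offset = 0
    for batch in batches:
        msg = encode_message(batch)
        out[offset:offset + len(msg)] = msg
        offset += len(msg)
    return bytes(out)


class StagedPayload:
    """Encodes each message once; the result serves sizing and writing."""

    def __init__(self, batches):
        self.messages = [encode_message(b) for b in batches]
        self.size = sum(len(m) for m in self.messages)

    def write_into(self, out):
        offset = 0
        for msg in self.messages:
            out[offset:offset + len(msg)] = msg
            offset += len(msg)
        return offset


batches = [[1, 2, 3], [4, 5]]
staged = StagedPayload(batches)
out = bytearray(staged.size)  # size is known without a second encoding pass
written = staged.write_into(out)
```

The staged version trades memory for the second pass: it holds the encoded messages until they are written. An actual implementation would presumably hold references to existing buffers rather than copies, but that is a design question for the issue itself.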
[jira] [Created] (ARROW-1859) [GLib] Add GArrowDictionaryDataType
Kouhei Sutou created ARROW-1859:
---

    Summary: [GLib] Add GArrowDictionaryDataType
    Key: ARROW-1859
    URL: https://issues.apache.org/jira/browse/ARROW-1859
    Project: Apache Arrow
    Issue Type: New Feature
    Components: GLib
    Reporter: Kouhei Sutou
    Assignee: Kouhei Sutou
    Fix For: 0.8.0