[jira] [Created] (ARROW-1861) [Python] Fix up ASV setup, add developer instructions for writing new benchmarks and running benchmark suite locally

2017-11-26 Wes McKinney (JIRA)
Wes McKinney created ARROW-1861:
---

 Summary: [Python] Fix up ASV setup, add developer instructions for writing new benchmarks and running benchmark suite locally
 Key: ARROW-1861
 URL: https://issues.apache.org/jira/browse/ARROW-1861
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney


We need to start writing more microbenchmarks as we go to prevent unintentional 
performance regressions (this has been a constant thorn in my side for years: 
http://wesmckinney.com/blog/introducing-vbench-new-code-performance-analysis-and-monitoring-tool/).
 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: General questions about Arrow & Plasma

2017-11-26 Wes McKinney
hi Matthias,

You need to use the IPC writer tools to write a Table or RecordBatch
to some memory region:

http://arrow.apache.org/docs/cpp/namespacearrow_1_1ipc.html

There is a function for computing the size of a record batch, though
not yet one for a Table; you can look at its implementation to get an
idea of what to do:

http://arrow.apache.org/docs/cpp/namespacearrow_1_1ipc.html#af3e2c42f9315d51ee531d46506790291

It would be useful to have the ability to "stage" a message stream to
be written, to avoid making multiple passes over the data structure
(once to size, then again to write). I opened
https://issues.apache.org/jira/browse/ARROW-1860 for this.

- Wes

On Sun, Nov 26, 2017 at 2:43 AM, Matthias Vallentin
 wrote:
>> Here are some more examples on how to interact between Plasma and Arrow:
>> http://arrow.apache.org/docs/python/plasma.html, see also the C++
>> documentation: http://arrow.apache.org/docs/cpp/md_tutorials_plasma.html
>
>
> I'm browsing through the C++ API documentation and have trouble finding
> the right API to copy an arrow::Table into a Plasma object buffer.
> Concretely:
>
>(1) How do I get the size of the underlying buffer of an
>arrow::Table needed to construct a Plasma object?
>
>(2) Thereafter, how do I obtain a pointer to the table data so that
>I can copy the table into the Plasma object buffer?
>
> I also tried to look at the RecordBatch API, but couldn't find anything
> there either. Both APIs seem only to provide columnar access.
>
>Matthias


[jira] [Created] (ARROW-1860) [C++] Add data structure to "stage" a sequence of IPC messages from in-memory data

2017-11-26 Wes McKinney (JIRA)
Wes McKinney created ARROW-1860:
---

 Summary: [C++] Add data structure to "stage" a sequence of IPC messages from in-memory data
 Key: ARROW-1860
 URL: https://issues.apache.org/jira/browse/ARROW-1860
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney


Currently, when you need to pre-allocate space for a record batch or a stream 
(schema + dictionaries + record batches), you must make multiple passes over 
the data structures of interest (using e.g. {{MockOutputStream}} to compute 
the size of the output buffer). It would be useful to make a single pass that 
"prepares" the IPC payload, which could then be used both for sizing and for 
writing without any further passes over the data.
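To make the "stage once, then size and write" idea concrete, here is a stdlib-only Python sketch of what such a structure could look like. Everything in it is hypothetical: StagedPayload is not an Arrow API, the 8-byte padding is an assumption, and real IPC messages carry framing and metadata that are not modeled here. The point is that buffers are referenced once, so both the size query and the write come from the same staged object.

```python
from dataclasses import dataclass, field
from typing import List

ALIGNMENT = 8  # assumed padding granularity for this sketch


def padded_len(n: int, alignment: int = ALIGNMENT) -> int:
    """Round n up to the next multiple of `alignment`."""
    return (n + alignment - 1) // alignment * alignment


@dataclass
class StagedPayload:
    """Hypothetical staging structure holding zero-copy views of message buffers."""

    buffers: List[memoryview] = field(default_factory=list)

    def add(self, data) -> None:
        # Keep a flat byte view of the caller's buffer; nothing is copied yet.
        self.buffers.append(memoryview(data).cast("B"))

    def total_size(self) -> int:
        # Sizing is a sum over staged lengths, not a second serialization pass.
        return sum(padded_len(b.nbytes) for b in self.buffers)

    def write_into(self, dest: bytearray) -> int:
        """Copy staged buffers into preallocated `dest`; return bytes written."""
        total = self.total_size()
        if len(dest) < total:
            raise ValueError(f"destination holds {len(dest)} bytes, need {total}")
        offset = 0
        with memoryview(dest) as view:
            for buf in self.buffers:
                n = buf.nbytes
                padded = padded_len(n)
                view[offset:offset + n] = buf
                # Zero the padding so output is deterministic if dest is reused.
                view[offset + n:offset + padded] = bytes(padded - n)
                offset += padded
        return offset
```

Because the staged object only holds views of existing buffers, the caller can query total_size(), allocate exactly that much, and then call write_into() without traversing the original data structures twice.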





[jira] [Created] (ARROW-1859) [GLib] Add GArrowDictionaryDataType

2017-11-26 Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-1859:
---

 Summary: [GLib] Add GArrowDictionaryDataType
 Key: ARROW-1859
 URL: https://issues.apache.org/jira/browse/ARROW-1859
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.8.0





