[jira] [Created] (ARROW-2828) [JS] Refactor Vector Data classes

2018-07-10 Thread Paul Taylor (JIRA)
Paul Taylor created ARROW-2828: Summary: [JS] Refactor Vector Data classes Key: ARROW-2828 URL: https://issues.apache.org/jira/browse/ARROW-2828 Project: Apache Arrow Issue Type: Task

[jira] [Created] (ARROW-2827) [C++] LZ4 and Zstd build may be failed in parallel build

2018-07-10 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2827: Summary: [C++] LZ4 and Zstd build may be failed in parallel build Key: ARROW-2827 URL: https://issues.apache.org/jira/browse/ARROW-2827 Project: Apache Arrow

Re: Map Type Metadata Representation

2018-07-10 Thread Wes McKinney
hi Bryan, Thanks for bringing this up again. I will reply in some more detail, but to help, could you create a major section in https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone and include these details? We are falling significantly short of hardening a v1.0

Map Type Metadata Representation

2018-07-10 Thread Bryan Cutler
Hello All, I would like to start moving forward with Map type support and begin working on implementations. I believe we just need to define the specifics of the metadata representation before getting started. Previously, there was a thread [1] that discussed adding Map as a logical type and I'll

Re: Merging of parquet file schemas

2018-07-10 Thread Wes McKinney
hi Dan, Not yet -- the relevant JIRA is https://issues.apache.org/jira/browse/ARROW-843. We would appreciate some help with this. Thanks. On Tue, Jul 10, 2018 at 10:54 AM, Dan Amner wrote: > Hi, > > I am attempting to read a number of smaller parquet files and merge them into > a larger parquet

Re: Pyarrow Plasma client.release() fault

2018-07-10 Thread Corey Nolet
Update: I'm investigating the possibility that I've reached the overcommit limit in the kernel as a result of all the parallel processes. This still doesn't fix the client.release() problem but it might explain why the processing appears to halt, after some time, until I restart the Jupyter

[jira] [Created] (ARROW-2826) [C++] Clarification needed between ArrayBuilder::Init(), Resize() and Reserve()

2018-07-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2826: Summary: [C++] Clarification needed between ArrayBuilder::Init(), Resize() and Reserve() Key: ARROW-2826 URL: https://issues.apache.org/jira/browse/ARROW-2826

Re: Pyarrow Plasma client.release() fault

2018-07-10 Thread Corey Nolet
Wes, Unfortunately, my code is on a separate network. I'll try to explain what I'm doing and if you need further detail, I can certainly pseudocode specifics. I am using multiprocessing.Pool() to fire up a bunch of worker processes for different filenames. In each worker, I'm performing a pd.read_csv(),

Re: Pyarrow Plasma client.release() fault

2018-07-10 Thread Wes McKinney
hi Corey, Can you provide the code (or a simplified version thereof) that shows how you're using Plasma? - Wes On Tue, Jul 10, 2018 at 11:45 AM, Corey Nolet wrote: > I'm on a system with 12TB of memory and attempting to use Pyarrow's Plasma > client to convert a series of CSV files (via

Pyarrow Plasma client.release() fault

2018-07-10 Thread Corey Nolet
I'm on a system with 12 TB of memory and attempting to use Pyarrow's Plasma client to convert a series of CSV files (via Pandas) into a Parquet store. I've got a little over 20k CSV files to process, which are about 1-2 GB each. I'm loading 500 to 1000 files at a time. In each iteration, I'm

Re: Arrow stickers

2018-07-10 Thread Kelly Stirman
I updated the images in the doc to include Apache. Have a look. On Tue, Jul 10, 2018 at 7:59 AM, Julian Hyde wrote: > Thanks for driving this. > > Can you put the word “apache” in there (in smaller font if you like). That > way, if you have the logo on slide 1 of your presentation, you’ve

Re: Arrow stickers

2018-07-10 Thread Wes McKinney
A designer I work with made some quick mockups based on ASF colors: https://www.dropbox.com/sh/oqdbyndl5ik9rrc/AAA-4d_wJyU_267SmPHShuRfa?dl=0 The words "Apache Arrow" would need to be put on the hexagons. We could also outsource this work to 99designs as another possibility; it wouldn't cost a

Re: [DRAFT] Arrow Board Report

2018-07-10 Thread Jacques Nadeau
Looks good to me. Will post. Thanks for pulling this together, Wes! On Mon, Jul 9, 2018 at 11:41 AM, Uwe L. Korn wrote: > +1, this looks good. > > Thanks! > > On Mon, Jul 9, 2018, at 8:18 PM, Wes McKinney wrote: > > Thanks, here is an updated draft. Any other changes? > > > > ## Description: > > > >

Re: Arrow stickers

2018-07-10 Thread Julian Hyde
Thanks for driving this. Can you put the word “apache” in there (in a smaller font if you like)? That way, if you have the logo on slide 1 of your presentation, you’ve already done your duty to mention the Apache brand. Julian > On Jul 9, 2018, at 19:07, Kelly Stirman wrote: > > Hi everyone!

Merging of parquet file schemas

2018-07-10 Thread Dan Amner
Hi, I am attempting to read a number of smaller parquet files and merge them into a larger parquet file. The files are created by Spark jobs that run periodically throughout the day. The issue I have is that the small parquet files can have slightly different schemas and when I create the

[jira] [Created] (ARROW-2825) [C++] Need AllocateBuffer / AllocateResizableBuffer variant with default memory pool

2018-07-10 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2825: Summary: [C++] Need AllocateBuffer / AllocateResizableBuffer variant with default memory pool Key: ARROW-2825 URL: https://issues.apache.org/jira/browse/ARROW-2825

[jira] [Created] (ARROW-2824) [GLib] Add garrow_decimal128_array_get_value()

2018-07-10 Thread yosuke shiro (JIRA)
yosuke shiro created ARROW-2824: Summary: [GLib] Add garrow_decimal128_array_get_value() Key: ARROW-2824 URL: https://issues.apache.org/jira/browse/ARROW-2824 Project: Apache Arrow Issue