[jira] [Created] (ARROW-2595) [Plasma] operator[] creates entries in map

2018-05-16 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2595: - Summary: [Plasma] operator[] creates entries in map Key: ARROW-2595 URL: https://issues.apache.org/jira/browse/ARROW-2595 Project: Apache Arrow Issue
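The ticket summary names a well-known C++ pitfall. `operator[]` on `std::map` or `std::unordered_map` default-constructs and inserts a value when the key is absent, so a lookup meant to be read-only grows the map. To keep new examples in Python, here is a minimal sketch of the same behavior using `collections.defaultdict`, whose indexing also inserts. This is an analogy, not the Plasma code the ticket concerns. The `object_table` name and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical table of object IDs -> sizes. The default factory (int)
# plays the role of C++ default construction in operator[].
object_table = defaultdict(int)
object_table["a"] = 10

def size_buggy(table, key):
    # Indexing a missing key inserts table[key] = 0 as a side effect,
    # like operator[] on std::map / std::unordered_map.
    return table[key]

def size_safe(table, key, default=0):
    # dict.get() never calls the default factory, so nothing is inserted;
    # this mirrors using find() instead of operator[] in C++.
    return table.get(key, default)

size_safe(object_table, "missing")
print(len(object_table))  # 1: the lookup left the table unchanged
size_buggy(object_table, "missing")
print(len(object_table))  # 2: the lookup created an entry
```

In C++ the corresponding fix is to use `find()` (or `count()`) for any lookup that must not mutate the map.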

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
You're welcome! On Wed, May 16, 2018 at 6:13 PM Corey Nolet wrote: > I must say, I’m super excited about using Arrow and Plasma. > > The code you just posted worked for me at home and I’m sure I’ll figure > out what I was doing wrong tomorrow at work. > > Anyways, thanks so

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
I must say, I’m super excited about using Arrow and Plasma. The code you just posted worked for me at home and I’m sure I’ll figure out what I was doing wrong tomorrow at work. Anyways, thanks so much for your help and fast replies! Sent from my iPhone > On May 16, 2018, at 7:42 PM, Robert

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
You should be able to do something like the following.

# Start the store.
plasma_store -s /tmp/store -m 10

Then in Python, do the following:

import pandas as pd
import pyarrow.plasma as plasma
import numpy as np

client = plasma.connect('/tmp/store', '', 0)
series =

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
Robert, Thank you for the quick response. I've been playing around for a few hours to get a feel for how this works. If I understand correctly, it's better to have the Plasma client objects instantiated within each separate process? Weird things seemed to happen when I attempted to share a
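Corey's question matches the usual rule for client objects that wrap a socket or other OS handle: create one per process rather than sharing one across `fork`/`spawn`. The sketch below shows that pattern using the `initializer` argument of `multiprocessing.Pool`. `make_client` is a hypothetical stand-in for `plasma.connect('/tmp/store', '', 0)` and does not call any Plasma API.

```python
import multiprocessing as mp
import os

_client = None  # one client per worker process, set by init_worker()

def make_client():
    # Hypothetical stand-in for plasma.connect('/tmp/store', '', 0);
    # it only records which process created it.
    return {"owner_pid": os.getpid()}

def init_worker():
    # Runs once in each worker process when the pool starts it.
    global _client
    _client = make_client()

def task(i):
    # Each task uses the client owned by its own worker process,
    # never one created in (and inherited from) the parent.
    return _client["owner_pid"] == os.getpid()

if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        print(all(pool.map(task, range(8))))  # True
```

Connecting once per worker in the initializer also avoids paying the connection cost on every task.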

[jira] [Created] (ARROW-2593) [Python] TypeError: data type "mixed-integer" not understood

2018-05-16 Thread Dima Ryazanov (JIRA)
Dima Ryazanov created ARROW-2593: Summary: [Python] TypeError: data type "mixed-integer" not understood Key: ARROW-2593 URL: https://issues.apache.org/jira/browse/ARROW-2593 Project: Apache Arrow

[jira] [Created] (ARROW-2592) [Python] AssertionError in to_pandas()

2018-05-16 Thread Dima Ryazanov (JIRA)
Dima Ryazanov created ARROW-2592: Summary: [Python] AssertionError in to_pandas() Key: ARROW-2592 URL: https://issues.apache.org/jira/browse/ARROW-2592 Project: Apache Arrow Issue Type: Bug

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
Take a look at the Plasma object store https://arrow.apache.org/docs/python/plasma.html. Here's an example using it (along with multiprocessing to sort a pandas dataframe) https://github.com/apache/arrow/blob/master/python/examples/plasma/sorting/sort_df.py. It's possible the example is a bit out
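The linked example parallelizes a sort with multiprocessing. Here is a generic sketch of that idea, which is not necessarily how `sort_df.py` does it: split the data into chunks, sort each chunk in a separate worker, then k-way merge the sorted runs. This version sorts plain Python lists and ships chunks to workers by pickling, which is exactly the copying Plasma is meant to avoid. `split` and `parallel_sort` are hypothetical helpers.

```python
import heapq
from concurrent.futures import ProcessPoolExecutor

def split(values, n_chunks):
    # Cut the input into at most n_chunks contiguous, roughly equal slices.
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]

def parallel_sort(values, n_chunks=4, mapper=map):
    # Sort each chunk independently (in parallel when mapper is a process
    # pool's map), then k-way merge the sorted runs into one list.
    runs = list(mapper(sorted, split(values, n_chunks)))
    return list(heapq.merge(*runs))

if __name__ == "__main__":
    import random
    data = [random.random() for _ in range(100_000)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        result = parallel_sort(data, n_chunks=4, mapper=pool.map)
    print(result == sorted(data))  # True
```

Passing `mapper` as a parameter keeps the algorithm testable serially with the builtin `map` while the same code runs in parallel with a pool.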

Re: Arrow sync at 12pm EDT today

2018-05-16 Thread Phillip Cloud
Meeting notes from the call:

Attendees/Topics to discuss
- Wes
- Packaging
- Uwe
- Packaging
- Simba
- Li Two Sigma
- Ethan Two Sigma
- Josh Two Sigma
- Exceptions vs status codes
- Class design question

PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
I've been reading through the PyArrow documentation and trying to understand how to use the tool effectively for IPC (using zero-copy). I'm on a system with 586 cores & 1TB of RAM. I'm using pandas DataFrames to process several tens of gigabytes of data in memory, and the pickling that is done by
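Plasma addresses the pickling cost described here by letting processes read objects from shared memory instead of receiving copies. As a stand-in that needs only the standard library (Python 3.8+), `multiprocessing.shared_memory` illustrates the same zero-copy idea. It is swapped in for illustration and is not Plasma or Arrow. Both handles live in one process to keep the sketch short; in practice the consumer would attach by `shm.name` from another process.

```python
import pickle
from multiprocessing import shared_memory

payload = bytes(range(256)) * 4096  # ~1 MiB of example data

# Pickle round trip: the receiver always gets a fresh copy of the bytes.
copied = pickle.loads(pickle.dumps(payload))

# Shared memory: the producer writes once into a named segment...
shm = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    shm.buf[:len(payload)] = payload
    # ...and a consumer (normally another process) attaches by name and
    # reads through a memoryview over the same pages, with no copy.
    peer = shared_memory.SharedMemory(name=shm.name)
    try:
        view = peer.buf[:len(payload)]
        shm.buf[0] = 255            # a write by the producer...
        seen_by_peer = view[0]      # ...is visible to the consumer
        view.release()              # release views before close()
    finally:
        peer.close()
finally:
    shm.close()
    shm.unlink()

print(copied == payload, seen_by_peer)  # True 255
```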

[jira] [Created] (ARROW-2591) [Python] Segmentation fault issue in pq.write_table

2018-05-16 Thread jacques (JIRA)
jacques created ARROW-2591: -- Summary: [Python] Segmentation fault issue in pq.write_table Key: ARROW-2591 URL: https://issues.apache.org/jira/browse/ARROW-2591 Project: Apache Arrow Issue Type: Bug

Re: Arrow sync at 12pm EDT today

2018-05-16 Thread Wes McKinney
Here's a new Hangout: https://hangouts.google.com/call/RN8qAVjTdPwXmGZmMx7zAAEE. Let's talk there On Thu, May 17, 2018 at 1:07 AM, Krisztián Szűcs wrote: > Same > > On May 16 2018, at 6:06 pm, Uwe L. Korn wrote: >> >> On my side I'm waiting to

Re: Arrow sync at 12pm EDT today

2018-05-16 Thread Krisztián Szűcs
Same. On May 16 2018, at 6:06 pm, Uwe L. Korn wrote: > > On my side I'm waiting for someone to let me in... > On Wed, May 16, 2018, at 6:05 PM, Wes McKinney wrote: > > Google Meet says the meeting is full > > > > On Wed, May 16, 2018, 11:25 AM Alex Hagerman

Re: Arrow sync at 12pm EDT today

2018-05-16 Thread Aneesh Karve
I'm stuck at "Asking to Join." Will move to better WiFi and try again. On Wed, May 16, 2018 at 8:25 AM, Alex Hagerman wrote: > Aneesh and I had some good conversations during the sprint at PyCon. Not > sure if he will be on the call today to share, but I won’t be able

Re: Arrow sync at 12pm EDT today

2018-05-16 Thread Uwe L. Korn
On my side I'm waiting for someone to let me in... On Wed, May 16, 2018, at 6:05 PM, Wes McKinney wrote: > Google Meet says the meeting is full > > On Wed, May 16, 2018, 11:25 AM Alex Hagerman wrote: > > > Aneesh and I had some good conversations during the sprint at

Re: Arrow sync at 12pm EDT today

2018-05-16 Thread Li Jin
I can't seem to get into the room. Anyone else in? On Wed, May 16, 2018 at 12:05 PM, Wes McKinney wrote: > Google Meet says the meeting is full > > On Wed, May 16, 2018, 11:25 AM Alex Hagerman > wrote: > > > Aneesh and I had some good conversations

Re: Arrow sync at 12pm EDT today

2018-05-16 Thread Wes McKinney
Google Meet says the meeting is full On Wed, May 16, 2018, 11:25 AM Alex Hagerman wrote: > Aneesh and I had some good conversations during the sprint at PyCon. Not > sure if he will be on the call today to share, but I won’t be able to make > it until the next call. > >

RE: Arrow sync at 12pm EDT today

2018-05-16 Thread Alex Hagerman
Aneesh and I had some good conversations during the sprint at PyCon. Not sure if he will be on the call today to share, but I won’t be able to make it until the next call. Alex From: Wes McKinney Sent: Wednesday, May 16, 2018 11:00 AM To: dev@arrow.apache.org Subject: Arrow sync at 12pm EDT

Arrow sync at 12pm EDT today

2018-05-16 Thread Wes McKinney
See you at https://meet.google.com/vtm-teks-phx

[jira] [Created] (ARROW-2590) Pyspark python_udf serialization error on grouped map (Amazon EMR)

2018-05-16 Thread Daniel Fithian (JIRA)
Daniel Fithian created ARROW-2590: - Summary: Pyspark python_udf serialization error on grouped map (Amazon EMR) Key: ARROW-2590 URL: https://issues.apache.org/jira/browse/ARROW-2590 Project: Apache

[jira] [Created] (ARROW-2589) [Python] test_parquet.py regression with Pandas 0.23.0

2018-05-16 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2589: - Summary: [Python] test_parquet.py regression with Pandas 0.23.0 Key: ARROW-2589 URL: https://issues.apache.org/jira/browse/ARROW-2589 Project: Apache Arrow

[jira] [Created] (ARROW-2588) [Plasma] Random unique ids always use the same seed

2018-05-16 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2588: - Summary: [Plasma] Random unique ids always use the same seed Key: ARROW-2588 URL: https://issues.apache.org/jira/browse/ARROW-2588 Project: Apache Arrow
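The ticket title describes a classic bug. If every process seeds its ID generator with the same constant, every process produces the same sequence of "random" IDs, and objects from different clients collide. Below is a minimal sketch of the bug and one fix, seeding from `os.urandom`. The 20-byte `ID_SIZE` is an assumption about Plasma's object ID width, and `make_id_generator` is hypothetical.

```python
import os
import random

ID_SIZE = 20  # bytes; assumed to match Plasma's object ID width

def make_id_generator(seed=None):
    # seed=None draws the seed from OS entropy; passing a constant seed
    # reproduces the bug described in the ticket title.
    rng = random.Random(seed if seed is not None else os.urandom(16))
    return lambda: rng.getrandbits(8 * ID_SIZE).to_bytes(ID_SIZE, "big")

# Two "processes" that both seed with the same constant collide.
a, b = make_id_generator(seed=0), make_id_generator(seed=0)
print(a() == b())  # True: identical IDs

# Seeding from os.urandom makes a collision astronomically unlikely.
c, d = make_id_generator(), make_id_generator()
print(c() == d())  # False (with overwhelming probability)
```

Where reproducibility is not needed, drawing each ID directly from `os.urandom(ID_SIZE)` or `secrets.token_bytes(ID_SIZE)` avoids the shared-seed problem without managing a generator at all.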

[jira] [Created] (ARROW-2587) [Python] can read StructArrays from parquet but unable to write them

2018-05-16 Thread jacques (JIRA)
jacques created ARROW-2587: -- Summary: [Python] can read StructArrays from parquet but unable to write them Key: ARROW-2587 URL: https://issues.apache.org/jira/browse/ARROW-2587 Project: Apache Arrow