Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
You're welcome!

On Wed, May 16, 2018 at 6:13 PM Corey Nolet wrote:
> I must say, I’m super excited about using Arrow and Plasma.
>
> The code you just posted worked for me at home and I’m sure I’ll figure
> out what I was doing wrong tomorrow at work.
>
> Anyways, thanks so

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
I must say, I’m super excited about using Arrow and Plasma.

The code you just posted worked for me at home, and I’m sure I’ll figure out what I was doing wrong tomorrow at work.

Anyways, thanks so much for your help and fast replies!

> On May 16, 2018, at 7:42 PM, Robert

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
You should be able to do something like the following.

# Start the store.
plasma_store -s /tmp/store -m 10

Then in Python, do the following:

import pandas as pd
import pyarrow.plasma as plasma
import numpy as np

client = plasma.connect('/tmp/store', '', 0)
series =

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
Robert,

Thank you for the quick response. I've been playing around for a few hours to get a feel for how this works.

If I understand correctly, it's better to instantiate a Plasma client object within each separate process? Weird things seemed to happen when I attempted to share a

Re: PyArrow & Python Multiprocessing

2018-05-16 Thread Robert Nishihara
Take a look at the Plasma object store: https://arrow.apache.org/docs/python/plasma.html

Here's an example using it (along with multiprocessing to sort a pandas dataframe): https://github.com/apache/arrow/blob/master/python/examples/plasma/sorting/sort_df.py

It's possible the example is a bit out

PyArrow & Python Multiprocessing

2018-05-16 Thread Corey Nolet
I've been reading through the PyArrow documentation and trying to understand how to use the tool effectively for IPC (using zero-copy). I'm on a system with 586 cores and 1 TB of RAM. I'm using pandas DataFrames to process several tens of gigabytes of data in memory and the pickling that is done by