Hi Corey,

Can you provide the code (or a simplified version thereof) that shows how you're using Plasma?

- Wes

On Tue, Jul 10, 2018 at 11:45 AM, Corey Nolet <cjno...@gmail.com> wrote:
> I'm on a system with 12TB of memory and attempting to use Pyarrow's Plasma
> client to convert a series of CSV files (via Pandas) into a Parquet store.
>
> I've got a little over 20k CSV files to process which are about 1-2gb each.
> I'm loading 500 to 1000 files at a time.
>
> In each iteration, I'm loading a series of files, partitioning them by a
> time field into separate dataframes, then writing parquet files in
> directories for each day.
>
> The problem I'm having is that the Plasma client & server appear to lock up
> after about 2-3 iterations. It locks up to the point where I can't even
> CTRL+C the server. I am able to stop the notebook and re-trying the code
> just continues to lock up when interacting with Jupyter. There are no
> errors in my logs to tell me something's wrong.
>
> Just to make sure I'm not just being impatient and possibly need to wait
> for some background services to finish, I allowed the code to run overnight
> and it was still in the same state when I came in to work this morning. I'm
> running the Plasma server with 4TB max.
>
> In an attempt to pro-actively free up some of the object ids that I no
> longer need, I also attempted to use the client.release() function but I
> cannot seem to figure out how to make this work properly. It crashes my
> Jupyter kernel each time I try.
>
> I'm using Pyarrow 0.9.0
>
> Thanks in advance.
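
For reference, the per-iteration workflow described in the quoted message (load a batch of CSV files, partition by a time field, write one Parquet file per day directory) might look roughly like the sketch below. This is only an illustration of the described steps using plain Pandas, not Corey's actual code, and it does not involve Plasma at all. The column name `timestamp`, the `batch_id` argument, the `part-<batch_id>.parquet` file naming, and both function names are hypothetical.

```python
import os

import pandas as pd


def partition_by_day(df, time_col="timestamp"):
    """Split a DataFrame into one DataFrame per calendar day.

    `time_col` is a hypothetical column name; the original message only
    says the data is partitioned "by a time field".
    """
    # Derive a YYYY-MM-DD key for every row, then group on that key.
    days = pd.to_datetime(df[time_col]).dt.strftime("%Y-%m-%d")
    return {day: part for day, part in df.groupby(days)}


def write_daily_parquet(csv_paths, out_dir, batch_id, time_col="timestamp"):
    """Load one batch of CSV files and write one Parquet file per day.

    The directory layout (out_dir/<day>/part-<batch_id>.parquet) is an
    assumption made for this sketch.
    """
    batch = pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)
    for day, part in partition_by_day(batch, time_col).items():
        day_dir = os.path.join(out_dir, day)
        os.makedirs(day_dir, exist_ok=True)
        # Requires pyarrow to be installed as the Parquet engine.
        part.to_parquet(
            os.path.join(day_dir, "part-{}.parquet".format(batch_id)),
            engine="pyarrow",
            index=False,
        )
```

Including `batch_id` in the file name keeps later iterations from overwriting files that earlier batches wrote into the same day directory.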