Hi Corey,

Can you provide the code (or a simplified version thereof) that shows
how you're using Plasma?

- Wes

On Tue, Jul 10, 2018 at 11:45 AM, Corey Nolet <cjno...@gmail.com> wrote:
> I'm on a system with 12 TB of memory and am attempting to use PyArrow's
> Plasma client to convert a series of CSV files (via pandas) into a Parquet store.
>
> I've got a little over 20k CSV files to process, which are about 1-2 GB each.
> I'm loading 500 to 1000 files at a time.
>
> In each iteration, I'm loading a series of files, partitioning them by a
> time field into separate dataframes, then writing parquet files in
> directories for each day.
>
> The problem I'm having is that the Plasma client and server appear to lock
> up after about 2-3 iterations. They lock up to the point where I can't even
> Ctrl+C the server. I can stop the notebook, but re-running the code just
> locks up again when interacting with Jupyter. There are no errors in my
> logs to indicate that anything is wrong.
>
> To make sure I wasn't just being impatient, in case some background
> services needed time to finish, I let the code run overnight, and it was
> still in the same state when I came in to work this morning. I'm running
> the Plasma server with a 4 TB memory limit.
>
> In an attempt to proactively free the object IDs that I no longer need, I
> also tried using the client.release() function, but I can't figure out how
> to make it work properly. It crashes my Jupyter kernel every time I try.
>
> I'm using PyArrow 0.9.0.
>
> Thanks in advance.
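For reference, here is a minimal sketch of the batch workflow described above (load a batch of CSV files, partition the rows by a time field, write Parquet files into one directory per day), using pandas only and leaving Plasma out entirely so the conversion step can be checked in isolation. This is not Corey's actual code: the function names (`batched`, `partition_by_day`, `write_day_partitions`, `convert`), the `<out_root>/YYYY-MM-DD/` directory layout, and the per-batch file naming are hypothetical. `DataFrame.to_parquet` also needs a Parquet engine such as pyarrow installed.

```python
import os
from typing import Dict, Iterator, List

import pandas as pd


def batched(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `batch_size` items."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


def partition_by_day(df: pd.DataFrame, time_col: str) -> Dict[str, pd.DataFrame]:
    """Split `df` into one sub-frame per calendar day of `time_col`.

    Keys are 'YYYY-MM-DD' strings. Rows with a missing timestamp are dropped,
    because groupby skips NaN keys by default.
    """
    days = pd.to_datetime(df[time_col]).dt.strftime("%Y-%m-%d")
    return {day: group for day, group in df.groupby(days)}


def write_day_partitions(parts: Dict[str, pd.DataFrame], out_root: str,
                         file_tag: str) -> List[str]:
    """Write each day's rows to <out_root>/<day>/<file_tag>.parquet (hypothetical layout)."""
    written = []
    for day, frame in parts.items():
        day_dir = os.path.join(out_root, day)
        os.makedirs(day_dir, exist_ok=True)
        path = os.path.join(day_dir, f"{file_tag}.parquet")
        frame.to_parquet(path, index=False)  # requires pyarrow or fastparquet
        written.append(path)
    return written


def convert(csv_paths: List[str], out_root: str, time_col: str,
            batch_size: int = 500) -> None:
    """Process the CSV files batch by batch, writing per-day Parquet files."""
    for batch_no, batch in enumerate(batched(csv_paths, batch_size)):
        df = pd.concat([pd.read_csv(p) for p in batch], ignore_index=True)
        parts = partition_by_day(df, time_col)
        # A distinct tag per batch means later batches add files to a day's
        # directory instead of overwriting files written by earlier batches.
        write_day_partitions(parts, out_root, file_tag=f"batch-{batch_no:05d}")
        # Drop references so this batch's memory can be reclaimed before
        # the next batch is read.
        del df, parts


# Example call (paths and column name are hypothetical):
# import glob
# convert(sorted(glob.glob("/data/csv/*.csv")), "/data/parquet", time_col="timestamp")
```

Running something like this without Plasma, with the same batch sizes, may help narrow down whether the lock-up comes from the Plasma store or from the surrounding pandas/Jupyter workload.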
