On 03/07/2019 18.37, Israel Brewster wrote:
> I have a script that benefits greatly from multiprocessing (it’s generating a 
> bunch of images from data). Of course, as expected each process uses a chunk 
> of memory, and the more processes there are, the more memory used. The amount 
> used per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 
> GB, depending on the amount of data being processed (usually closer to 10GB, 
> the 40/50 is fairly rare). This puts me in a position of needing to balance 
> the number of processes with memory usage, such that I maximize resource 
> utilization (running one process at a time would simply take WAY too long) 
> while not overloading RAM (which at best would slow things down due to swap). 
>
> Obviously this process will be run on a machine with lots of RAM, but as I 
> don’t know how large the datasets that will be fed to it are, I wanted to see 
> if I could build some intelligence into the program such that it doesn’t 
> overload the memory. A couple of approaches I thought of:
>
> 1) Determine the total amount of RAM in the machine (how?), assume an average 
> of 10GB per process, and only launch as many processes as calculated to fit. 
> Easy, but would run the risk of under-utilizing the processing capabilities 
> and taking longer to run if most of the processes were using significantly 
> less than 10GB
>
> 2) Somehow monitor the memory usage of the various processes, and if one 
> process needs a lot, pause the others until that one is complete. Of course, 
> I’m not sure if this is even possible.
>
> 3) Other approaches?
>

Are you familiar with Dask? <https://docs.dask.org/en/latest/>

I don't know it myself other than through hearsay, but I have a feeling
it may have a ready-to-go solution to your problem. You'd have to look
into dask in more detail than I have...
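
Purely from a quick skim of the dask.distributed docs (so treat this as
untested hearsay), the entry point seems to be something like the
following -- the 10GB limit, the worker count, and make_image()/datasets
are placeholders for your own numbers and code:

from dask.distributed import Client, LocalCluster

def make_image(dataset):
    ...   # your existing per-dataset image generation

if __name__ == "__main__":
    # memory_limit is per worker; as I understand it the scheduler pauses
    # or restarts workers that blow past it instead of letting the box swap.
    cluster = LocalCluster(n_workers=4, threads_per_worker=1,
                           memory_limit="10GB")
    client = Client(cluster)
    futures = client.map(make_image, datasets)   # datasets = your inputs
    results = client.gather(futures)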

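If you'd rather roll your own, your approach 1 is only a few lines with
psutil (pip install psutil). Untested sketch -- the 10 GB figure and
generate_image()/datasets are placeholders:

import multiprocessing
import psutil

PER_PROCESS_BYTES = 10 * 1024**3            # assumed average footprint

def generate_image(dataset):
    ...                                     # your existing per-dataset work

if __name__ == "__main__":
    total = psutil.virtual_memory().total   # physical RAM on the box
    by_ram = max(1, total // PER_PROCESS_BYTES)
    by_cpu = multiprocessing.cpu_count()
    nprocs = int(min(by_ram, by_cpu))       # don't exceed either limit
    with multiprocessing.Pool(processes=nprocs) as pool:
        pool.map(generate_image, datasets)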

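Your approach 2 is possible too, at least on the "pause" side: psutil's
Process objects have suspend()/resume() (SIGSTOP/SIGCONT under the hood),
so a watchdog in the parent can park workers when free RAM gets tight.
Rough, untested sketch -- the thresholds, the poll interval, and
worker_pids are made up:

import time
import psutil

LOW_WATER  = 8 * 1024**3    # suspend others when less than this is available
HIGH_WATER = 20 * 1024**3   # resume once this much is available again

def babysit(worker_pids):
    procs = [psutil.Process(pid) for pid in worker_pids]
    paused = []
    while any(p.is_running() for p in procs):
        avail = psutil.virtual_memory().available
        if avail < LOW_WATER and not paused:
            # keep the hungriest worker running, pause the rest
            running = sorted((p for p in procs if p.is_running()),
                             key=lambda p: p.memory_info().rss)
            for p in running[:-1]:
                p.suspend()
                paused.append(p)
        elif avail > HIGH_WATER and paused:
            for p in paused:
                if p.is_running():
                    p.resume()
            paused = []
        time.sleep(5)

Suspending doesn't free anything already allocated, of course; it just
stops the paused workers from growing while the big one finishes.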
-- 
https://mail.python.org/mailman/listinfo/python-list
