For reference, here is the Julia 0.4.3 code for addprocs():

# The main function for adding worker processes.
# `manager` is of type ClusterManager. The respective managers are responsible
# for launching the workers. All keyword arguments (plus a few default values)
# are available as a dictionary to the `launch` methods
#
# Only one addprocs can be in progress at any time
#
const worker_lock = ReentrantLock()
function addprocs(manager::ClusterManager; kwargs...)
    lock(worker_lock)
    try
        addprocs_locked(manager::ClusterManager; kwargs...)
    finally
        unlock(worker_lock)
    end
end



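I guess that confirms what's going on: every addprocs call takes the same worker_lock, so only one can be in progress at a time and my @async calls just queue up behind each other... hah.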
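To see the effect of that lock without a cluster, here is a toy version, written for current Julia 1.x rather than 0.4.3. The names `fake_addprocs`, `run_requests` and `demo_lock` are made up for illustration: `fake_addprocs` only takes a shared lock the way addprocs takes worker_lock, then sleeps to stand in for waiting in the SGE queue. It is a sketch of the locking behaviour under those assumptions, not of what addprocs_sge or SGE actually do.

```julia
# Hypothetical stand-in for worker_lock in the 0.4.3 addprocs above.
const demo_lock = ReentrantLock()
const active = Ref(0)   # tasks currently inside the "critical section"
const peak   = Ref(0)   # largest value `active` ever reached

# Hypothetical stand-in for addprocs: optionally take the shared lock,
# then sleep to simulate waiting for the scheduler to hand out a node.
function fake_addprocs(delay; use_lock::Bool = true)
    use_lock && lock(demo_lock)
    try
        active[] += 1
        peak[] = max(peak[], active[])
        sleep(delay)    # yields to other tasks, like waiting on the queue
        active[] -= 1
    finally
        use_lock && unlock(demo_lock)
    end
end

# Fire off n requests with @async, as in the quoted loop, and report the
# peak number of simultaneous requests plus the total wall-clock time.
function run_requests(n, delay; use_lock::Bool = true)
    active[] = 0
    peak[] = 0
    t0 = time()
    @sync for j in 1:n
        @async fake_addprocs(delay; use_lock = use_lock)
    end
    return peak[], time() - t0
end

peak_locked, t_locked = run_requests(5, 0.2)
peak_free, t_free = run_requests(5, 0.2; use_lock = false)
println("with lock:    peak = $peak_locked, elapsed about $(round(t_locked, digits = 1)) s")
println("without lock: peak = $peak_free, elapsed about $(round(t_free, digits = 1)) s")
```

With the lock, at most one request is ever in flight and the five 0.2 s waits add up to roughly a second; without it, all five requests are outstanding at once and the waits overlap. That is the same difference between the two code snippets in the quoted message.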




On Monday, October 24, 2016 at 1:40:22 PM UTC-4, Ryan Gardner wrote:
>
> I'm trying to write code for Sun Grid Engine (SGE), although I think the 
> general idea applies to any addprocs.  I would like to be able to request a 
> gazillion nodes, and start using each shortly after it becomes available.
>
> An example of what I want is roughly this code:
>
>    for j = 1:1000000
>       @async begin
>          new_worker = addprocs_sge(1)  # request to add one Sun Grid Engine process
>          worker_init(new_worker)       # brings new worker into the work
>       end
>    end
>
> The problem with this code is that addprocs (and thus addprocs_sge) seems 
> to be something like one big critical section, so the @async is effectively 
> non-existent, and the procs get added serially.  The big problem with this 
> is that it might take a day before I get even the first worker (when 
> addprocs_sge returns).  With this code, I would wait that day and only then 
> would I start the request for the second worker, which might take 
> approximately another day, and so on.  I want to get all my requests in the 
> queue, right from the start, so once I get the first worker, I'm also next 
> in line for the second, third, ...
>
>
>
> The alternative code below effectively gets the requests for all workers 
> into the queue right from the start:
>
>    new_workers = addprocs_sge(1000000)  # request to add 1,000,000 Sun Grid Engine processes
>    worker_init(new_workers)
>
> but the problem with it is that I don't get any work done until all 
> 1,000,000 processes become available because the call to addprocs doesn't 
> return until it has everything (even though nodes on sge start to become 
> owned and blocked by me while it's trying to collect the whole million).
>
> Is there a way around this?  (I'm using Julia 0.4.3.  I would love to 
> upgrade, but I use a large amount of code that I don't control and isn't 
> going to be updated any time soon.  At the same time, I'd be interested in 
> solutions related to other versions regardless.)  Thanks.
>
> Ryan
>
>
