For reference, here is the 0.4.3 code for addprocs():

# The main function for adding worker processes.
# `manager` is of type ClusterManager. The respective managers are responsible
# for launching the workers. All keyword arguments (plus a few default values)
# are available as a dictionary to the `launch` methods
#
# Only one addprocs can be in progress at any time
#
const worker_lock = ReentrantLock()
function addprocs(manager::ClusterManager; kwargs...)
    lock(worker_lock)
    try
        addprocs_locked(manager::ClusterManager; kwargs...)
    finally
        unlock(worker_lock)
    end
end

guess that confirms what's going on... hah.

On Monday, October 24, 2016 at 1:40:22 PM UTC-4, Ryan Gardner wrote:
>
> I'm trying to write code for sun grid engine (sge) although I think the
> general idea applies to any addprocs. I would like to be able to request a
> gazillion nodes, and start using each shortly after it becomes available.
>
> An example of what I want is roughly this code:
>
> for j=1:1000000
>     @async begin
>         new_worker = addprocs_sge(1); #request to add one sun grid engine process
>         worker_init(new_worker) #brings new worker into the work
>     end
> end
>
> The problem with this code is that addprocs (and thus addprocs_sge) seems
> to be something like one big critical section, so the @async is effectively
> non-existent, and the procs get added serially. The big problem with this
> is that it might take a day before I get even the first worker (when
> addprocs_sge returns). With this code, I would wait that day and only then
> would I start the request for the second worker, which might take
> approximately another day, and so on. I want to get all my requests in the
> queue, right from the start, so once I get the first worker, I'm also next
> in line for the second, third, ...
>
> The alternative code below effectively gets the request for all workers in
> the queue right from the start
>
> new_workers = addprocs(1000000) #request to add 1,000,000 sun grid engine processes
> worker_init(new_workers)
>
> but the problem with it is that I don't get any work done until all
> 1,000,000 processes become available because the call to addprocs doesn't
> return until it has everything (even though nodes on sge start to become
> owned and blocked by me while it's trying to collect the whole million).
>
> Is there a way around this? (I'm using Julia 0.4.3. I would love to
> upgrade, but I use a large amount of code that I don't control and isn't
> going to be updated any time soon. At the same time, I'd be interested in
> solutions related to other versions regardless.) Thanks.
>
> Ryan
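
One possible way around the lock, as a rough sketch and not a tested SGE recipe: make a single large addprocs call in a background task, then poll workers() for ids that were not there before and initialize each one as it shows up. This only helps if the cluster manager registers workers one at a time while addprocs is still running, which I have not verified for addprocs_sge on 0.4.3. The names start_as_available, launch, init and poll are made up for illustration. The `using Distributed` line is only needed on Julia 0.7 and later; on 0.4.x these functions live in Base, so drop it there.

```julia
using Distributed  # Julia 0.7+ only; remove this line on 0.4.x

# Hypothetical helper: run `launch()` (some addprocs call) in the background
# and call `init(pid)` on every worker id that appears while it is running.
# Returns the ids that were picked up, in the order they were seen.
function start_as_available(launch, init; poll=1.0)
    seen = Set(workers())        # ids that existed before we started
    launcher = @async launch()   # the one big addprocs request
    started = Int[]
    while true
        # Read the flag before scanning, so the final scan happens after
        # the launcher has finished and cannot miss late arrivals.
        finished = istaskdone(launcher)
        for pid in setdiff(workers(), seen)
            push!(seen, pid)
            push!(started, pid)
            @async init(pid)     # do not block the polling loop
        end
        finished && break
        sleep(poll)              # yields so the launcher task can progress
    end
    wait(launcher)               # rethrows if the launch itself failed
    return started
end
```

With SGE this would be called as something like start_as_available(() -> addprocs_sge(1000000), worker_init), assuming worker_init accepts a single worker id. Polling is crude, but it avoids touching worker_lock or the internals of addprocs_locked.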