I have a Markov Chain Monte Carlo algorithm.

Unlike most MCMC algorithms, each state is expensive to compute, and each 
sample can be generated independently.  Because evaluating the Markov chain 
itself is very cheap, the algorithm should be almost embarrassingly 
parallel.

My current implementation is shown below.  The idea is that 
propose_parallel_tl generates a stack of boxes.  The main loop pops items 
off this stack and, when it becomes empty, dispatches all the workers 
to replenish it.

Unfortunately, this assumes that each worker takes roughly the same amount of 
time.  Additionally, the [fetch(s) for s in spawns] means that the main 
process waits on each worker in turn, so nothing is returned until every one 
of them has finished, I presume.

How can I get around this?  I effectively want all my workers to be 
working, with just a single process going through and taking new proposals 
from the workers.  Since evaluating the chain itself is quite cheap, it 
would be a waste to dedicate a physical core entirely to this (I'm not 
sure whether the operating system would sort this out for me, though).
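
Something like the sketch below is roughly what I have in mind (just a 
sketch, assuming a recent Julia where RemoteChannel and remote_do live in 
the Distributed standard library; keep_proposing, start_proposers and the 
channel capacity are made up, and proposebox_tl would need to be defined 
on every worker):

using Distributed   # in recent Julia; addprocs(n) to start the workers

# Each worker loops forever, pushing finished proposals into a shared channel
@everywhere function keep_proposing(f, Y, X, ch; args...)
  while true
    put!(ch, proposebox_tl(f, Y, X; args...))
  end
end

function start_proposers(f, Y, X; args...)
  ch = RemoteChannel(() -> Channel{Any}(2*nworkers()))  # buffered proposal queue
  for p in workers()
    remote_do(keep_proposing, p, f, Y, X, ch; args...)  # fire and forget
  end
  return ch
end

# The consumer blocks only until *some* worker has a proposal ready
propose_parallel_tl(ch) = take!(ch)

Because the channel is buffered, the workers block in put! once it is full, 
so they only ever run a little ahead of the consumer, and the take! in the 
main loop returns as soon as any worker has something ready rather than 
waiting for all of them.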

# Propose boxes in parallel
function propose_parallel_tl(f,Y,X,stack; ncores = 1, args...)
  ncores = min(ncores, nprocs())
  println("Using $ncores cores")
  if isempty(stack)
    # Refill: spawn one proposal per core, then block until all have finished
    spawns = [@spawn proposebox_tl(f,Y,X;args...) for i = 1:ncores]
    boxes = [fetch(s) for s in spawns]
    push!(stack,boxes...)
  end
  return pop!(stack)
end


function pre_tlmh{D <: Domain}(f::Callable, Y, X::D, niters; args...)
  boxes = D[]
  stack = (D,Float64,Float64)[]  # (box, logq, prevolfrac) tuples, for parallelism
  box, logq, prevolfrac = propose_parallel_tl(f,Y,X,stack; args...)

  push!(boxes,box)

  naccepted = 0; nsteps = 0
  while nsteps < niters - 1
    nextbox, nextlogq, prevolfrac = propose_parallel_tl(f,Y,X,stack; args...)

    nextlogp = logmeasure(nextbox) + log(prevolfrac)

    ...

    nsteps += 1
  end
  boxes
end

