On Sat, Aug 16, 2014 at 6:12 PM, Mitchell Wyle <[email protected]> wrote:

> Here are some other ideas to consider for flexibly and dynamically
> adding / removing servers:
>
> Consider implementing what Hadoop calls "speculative execution," where
> you send the same job to two or more servers and the first to complete
> the job wins.
Look at --halt. Using that you can make the first "failing" job win.
Maybe we should extend it to include a value for
'first-succeeding-job-wins'. (See the first sketch below for a
workaround using the current behavior.)

> Consider using aggressive timeouts for each job -- keep the jobs
> small and schedule very many of them to run; don't wait long for an
> individual one to be considered a failure.

Look at --timeout %. This is useful if you know your jobs take
approximately the same amount of time, but you do not know how long
that is in seconds. So --timeout 300% will kill any job that takes
more than three times the median runtime (see the second sketch
below).

> Consider "heart beats" of some kind where parallel on remote servers
> respond to the parallel dispatching jobs that they are available

The only way I can imagine implementing heartbeats is with some sort
of daemon monitoring the servers. As discussed elsewhere that could be
implemented, but it needs further discussion (a rough third sketch is
below).
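First sketch, the 'first-failing-job-wins' workaround. This is not an
official recipe, just an illustration: it assumes --halt 2 (kill all
running jobs as soon as one fails) and inverts the exit status with
the shell's '!', so the first job to *succeed* is the one that
triggers the halt. The server names and run_job are placeholders:

  # Speculative execution: run the same job on three servers.
  # '!' inverts the exit status, so the first job to succeed looks
  # like a failure, and --halt 2 then kills the remaining duplicates.
  parallel -j0 --halt 2 '! ssh {} run_job' ::: server1 server2 server3

Note that the exit status of parallel is inverted, too: it will
report a failure even though the job succeeded.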
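Second sketch, the dynamic timeout in a minimal example (process_file
and the *.dat inputs are placeholders):

  # Kill any job that runs more than 3 times the median runtime.
  parallel --timeout 300% process_file {} ::: *.dat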
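Third sketch, a rough idea of what the heartbeat daemon could look
like until it has been discussed properly. It assumes the documented
behavior that the file given to --slf is re-read when it changes; the
host names and run_job are again placeholders:

  # Heartbeat daemon sketch: every 60 s probe each server over ssh
  # and rewrite the login file with only the servers that answered.
  (
    while true; do
      > alive.slf.tmp
      for host in server1 server2 server3; do
        ssh -o ConnectTimeout=5 "$host" true && echo "$host" >> alive.slf.tmp
      done
      mv alive.slf.tmp alive.slf   # replace atomically
      sleep 60
    done
  ) &
  # parallel picks up changes to alive.slf as servers come and go.
  parallel --slf alive.slf run_job {} ::: inputs/*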
/Ole