On Sat, Aug 16, 2014 at 6:12 PM, Mitchell Wyle <[email protected]> wrote:

> Here are some other ideas to consider for flexibly and dynamically
> adding / removing servers:
>
> Consider implementing what Hadoop calls "speculative execution," where
> you send the same job to two or more servers and the first to complete
> the job wins.
Look at --halt. Using that you can make the first "failing" job win.
Maybe we should extend it to include a value for
'first-succeeding-job-wins'. (See the first sketch below for a
workaround using the current behavior.)

> Consider using aggressive timeouts for each job -- keep the jobs
> small and schedule very many of them to run; don't wait long for an
> individual one to be considered a failure.

Look at --timeout %. This is useful if you know your jobs take
approximately the same amount of time, but you do not know how long
that is in seconds. So --timeout 300% will kill any job that takes
more than three times the median runtime (see the second sketch
below).

> Consider "heart beats" of some kind where parallel on remote servers
> respond to the parallel dispatching jobs that they are available

The only way I can imagine implementing heartbeats is with some sort
of daemon monitoring the servers. As discussed elsewhere that could be
implemented, but it needs further discussion (a rough third sketch is
below).
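First sketch, the 'first-failing-job-wins' workaround. This is not an
official recipe, just an illustration: it assumes --halt 2 (kill all
running jobs as soon as one fails) and inverts the exit status with
the shell's '!', so the first job to *succeed* is the one that
triggers the halt. The server names and run_job are placeholders:

  # Speculative execution: run the same job on three servers.
  # '!' inverts the exit status, so the first job to succeed looks
  # like a failure, and --halt 2 then kills the remaining duplicates.
  parallel -j0 --halt 2 '! ssh {} run_job' ::: server1 server2 server3

Note that the exit status of parallel is inverted, too: it will
report a failure even though the job succeeded.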
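Second sketch, the dynamic timeout in a minimal example (process_file
and the *.dat inputs are placeholders):

  # Kill any job that runs more than 3 times the median runtime.
  parallel --timeout 300% process_file {} ::: *.dat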
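Third sketch, a rough idea of what the heartbeat daemon could look
like until it has been discussed properly. It assumes the documented
behavior that the file given to --slf is re-read when it changes; the
host names and run_job are again placeholders:

  # Heartbeat daemon sketch: every 60 s probe each server over ssh
  # and rewrite the login file with only the servers that answered.
  (
    while true; do
      > alive.slf.tmp
      for host in server1 server2 server3; do
        ssh -o ConnectTimeout=5 "$host" true && echo "$host" >> alive.slf.tmp
      done
      mv alive.slf.tmp alive.slf   # replace atomically
      sleep 60
    done
  ) &
  # parallel picks up changes to alive.slf as servers come and go.
  parallel --slf alive.slf run_job {} ::: inputs/*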
/Ole