Hi,

On 25 April 2013 12:19, Ole Tange <[email protected]> wrote:

> On Wed, Apr 24, 2013 at 9:25 PM, Ozgur Akgun <[email protected]> wrote:
>
> > I want to be able to say something like `parallel --timeout (fastest * 2)`
> > and get the same output.
>
> I have been pondering if I could somehow make a '--timeout 5%'. It should:
>
> 1. Run the first 3 jobs to completion (no --timeout)
> 2. Compute the average and standard deviation for all completed jobs
> 3. Adjust --timeout based on the new average, standard deviation and user
> input
> 4. Go to 2 until all jobs are finished
>

I like this idea. I was actually thinking of using the median (or the average
+ X standard deviations) as the winning criterion instead of proximity to the
winner. However, using a multiple of the fastest runtime has one great
property: it is very easy to calculate.
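Just to make that comparison concrete, here is a rough Python sketch (the
function and parameter names are my own, nothing from parallel itself) of how
the three candidate criteria could be computed from the runtimes of the jobs
that have completed so far:

    import statistics

    def candidate_timeouts(completed_runtimes, multiplier=2.0, x_stddevs=2.0):
        # completed_runtimes: runtimes (in seconds) of jobs finished so far.
        # Multiple of the fastest runtime: only needs the current minimum.
        fastest_based = min(completed_runtimes) * multiplier
        # Median of the completed runtimes: needs all runtimes kept around.
        median_based = statistics.median(completed_runtimes)
        # Average + X standard deviations: needs at least running sums,
        # and at least two samples before stdev is defined.
        mean = statistics.mean(completed_runtimes)
        stddev = statistics.stdev(completed_runtimes) if len(completed_runtimes) > 1 else 0.0
        stddev_based = mean + x_stddevs * stddev
        return fastest_based, median_based, stddev_based

The fastest-based criterion only has to remember the current minimum, while
the median needs the full list and the average/stddev at least running sums;
that is what makes the fastest-based one so cheap to keep up to date.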

Also, I limited my original question to the case where number of jobs =
number of jobslots, but your algorithm above doesn't have that limitation.
Accomplishing this would be great; however, the order in which the jobs are
started has a huge impact on the total runtime, which disturbs me a bit.

Going back to my original question, and extending it to the case where
number of jobs > number of jobslots, what would you think about something
like the following?

I would keep the existing --timeout unchanged and add a new option,
--dynamic-timeout, which takes a percentage as you suggest.

I guess the following could work as an implementation. Given '-jX
--dynamic-timeout 200% --timeout 3600':
1. Set current_timeout = timeout.
2. Run the first X jobs.
3. After every job that doesn't time out, update current_timeout if needed.
[1,2]
4. Run new jobs as older jobs finish.

[1] This is probably obvious, but "if needed" basically means:
`candidate_timeout = job_time * percentage; if (candidate_timeout <
current_timeout) { current_timeout = candidate_timeout }`, where job_time is
the time taken by the job that just finished.
[2] Updating current_timeout will also need to update any timer attached to
the jobs that are already running. This might be tricky to implement. But
consider an extreme case where job 1 takes 100 seconds to complete, and
another job down the line takes only 5 seconds. As soon as that 5-second job
finishes, the implementation should be clever enough to kill job 1, which
will exceed the new 10-second timeout (200% of 5 seconds) long before the
original 3600-second timeout fires. A rough sketch of this bookkeeping
follows below.
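To make [1] and [2] concrete, here is a rough Python sketch of the
bookkeeping I have in mind. It is not parallel code: the Job objects with
start_time/kill() are purely hypothetical stand-ins for whatever parallel
uses internally.

    import time

    class DynamicTimeout:
        # Given '-jX --dynamic-timeout 200% --timeout 3600':
        #   timeout    = 3600 (the ordinary --timeout value)
        #   percentage = 2.0  (i.e. 200%)
        def __init__(self, timeout, percentage):
            self.current_timeout = timeout
            self.percentage = percentage

        def job_finished(self, job_time, running_jobs):
            # [1] Tighten the timeout if this job was fast enough.
            candidate = job_time * self.percentage
            if candidate < self.current_timeout:
                self.current_timeout = candidate
                # [2] Re-check the jobs that are already running: anything
                # that has run longer than the new timeout is killed, e.g.
                # the 100-second job once a 5-second job has finished.
                now = time.time()
                for job in running_jobs:
                    if now - job.start_time > self.current_timeout:
                        job.kill()

Newly started jobs would then simply be handed self.current_timeout as their
timeout, and jobs still within the new limit keep running with their timers
shortened accordingly.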

If this doesn't sound too insane, I am happy to have a go at it / help
anyone who wants to. I'll need some pointers as to where it should be done
though.

- Ozgur.
