Now that we have reduced the amount of work available on computers, a
problem I brought up earlier is substantially more likely to occur on
computers that spend much of the time disconnected.
If the computer has multiple CPUs and a mix of projects with different
deadlines, it is now very likely to reach the end of the disconnected
period with only one task left to work on, even though at the beginning of
the period it had enough work to get through it. The situation arises when
one task has enough work to completely fill a single CPU for the
disconnected period and the remaining CPUs are filled with shorter tasks.
Example: 2 CPUs, disconnected period = 20 hours (both the actual and the
specified disconnected period).
Project CPDN Task 1, time remaining 200 hours.
WORK FETCH returns the following
SETI Task 2, 1 hour.
SETI Task 3, 1 hour.
SETI Task 4, 1 hour.
SETI Task 5, 1 hour.
SETI Task 6, 1 hour.
SETI Task 7, 1 hour.
SETI Task 8, 1 hour.
SETI Task 9, 1 hour.
SETI Task 10, 1 hour.
SETI Task 11, 1 hour.
SETI Task 12, 1 hour.
SETI Task 13, 1 hour.
SETI Task 14, 1 hour.
SETI Task 15, 1 hour.
SETI Task 16, 1 hour.
SETI Task 17, 1 hour.
SETI Task 18, 1 hour.
SETI Task 19, 1 hour.
SETI Task 20, 1 hour.
SETI Task 21, 1 hour.
Since CPDN has been running by itself for a while, and some other project
got the 20 hours yesterday, SETI gets both CPUs for the first 10 hours.
CPDN therefore gets no time for the first 10 hours and then runs alone for
the last 10, leaving one CPU completely idle. I am watching something
similar happen on a couple of my machines now.
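The arithmetic in the example can be checked with a small simulation. This
is only a sketch: the function name and the one-hour time step are my own
assumptions for illustration and are not taken from the client code. Tasks
are listed in the order the scheduler prefers them, so SETI comes first here
to match the scenario described.

```python
def simulate_idle_hours(tasks, ncpus, period_hours):
    """Step through the disconnected period one hour at a time.

    tasks: list of (name, remaining_hours), in the order the scheduler
           prefers them (hypothetical representation).
    Returns the number of idle CPU-hours inside the period.
    """
    remaining = [hours for _, hours in tasks]
    idle = 0
    for _ in range(period_hours):
        running = 0
        # Give each CPU to the first unfinished task in preference order.
        for i, left in enumerate(remaining):
            if running == ncpus:
                break
            if left > 0:
                remaining[i] = left - 1
                running += 1
        idle += ncpus - running
    return idle


seti = [("SETI Task %d" % n, 1) for n in range(2, 22)]  # 20 one-hour tasks
cpdn = [("CPDN Task 1", 200)]

# SETI preferred first: both CPUs run SETI for 10 hours, then CPDN runs alone.
print(simulate_idle_hours(seti + cpdn, ncpus=2, period_hours=20))  # prints 10
```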
There are two possible solutions, both of which involve checking how much
time is covered by all CPUs if the selection is least remaining time first.
1) Fetch more work if the computer will have idle CPUs if the tasks are
run in least remaining time first order.
2) Add a CPU scheduling test that kicks in when the computer would have
idle CPUs if the tasks were run in least remaining time first order, and
that runs the tasks in longest remaining time first order instead. (NOTE
that high priority would still take precedence here, as always.)
I would tend to favor #1.
In either case, the code would do a simple packing of the tasks ordered by
least remaining CPU time. For purposes of the simulation, the pieces of a
multi CPU task would not need to start at the same time, as the client is
only interested in the totals per CPU. Yes, starting the multi CPU task
would stop N-1 tasks at that point. The assumption is that those tasks
would then restart after the multi CPU task is completed.
Pseudo code for #1:

    Sort the tasks by remaining estimated CPU time.
    For each task
        Select the lowest N CPU slots in the array
        Add the CPU time remaining / CPU to each of the selected CPU slots
    For each CPU slot
        If the time in the CPU slot < the disconnected period
            Add (the disconnected period - time in CPU slot) to work fetch time
    If work fetch time > 0, attempt to fetch this amount of work.
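
A minimal Python sketch of the pseudo code above, to make the packing
concrete. The function name and the (remaining time, instances used) tuple
are hypothetical and do not correspond to anything in the client. I am also
assuming that "CPU time remaining / CPU" means the remaining time divided
evenly across the N slots the task uses.

```python
import heapq


def work_fetch_shortfall(tasks, ninstances, period):
    """Pack tasks least-remaining-time-first and total the idle time.

    tasks:      list of (remaining_time, instances_used) tuples
                (hypothetical representation of the task queue).
    ninstances: number of instances of the resource (CPUs or GPUs).
    period:     the disconnected period, in the same units as remaining_time.
    Returns the amount of additional work to try to fetch.
    """
    slots = [0.0] * ninstances
    for remaining, n in sorted(tasks, key=lambda t: t[0]):
        n = min(n, ninstances)
        # Select the N least-loaded slots.
        lowest = heapq.nsmallest(n, range(ninstances), key=lambda i: slots[i])
        # Assumption: the remaining time is split evenly across those slots.
        for i in lowest:
            slots[i] += remaining / n
    # Every slot that runs dry before the period ends adds to the shortfall.
    return sum(max(0.0, period - used) for used in slots)


# The example from this message: one 200 hour task plus twenty 1 hour tasks.
tasks = [(200.0, 1)] + [(1.0, 1)] * 20
print(work_fetch_shortfall(tasks, ninstances=2, period=20.0))  # prints 10.0
```

For the example, each slot receives ten of the 1 hour tasks, the 200 hour
task lands on one of them, and the other slot is left 10 hours short, which
is the amount the sketch reports.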
Note that this only needs to run for resources where there is more than one
instance of the resource, and it would work for GPUs as well as CPUs.
However, more people have multiple CPUs than have multiple GPUs (although
this is changing with the increase in multiple monitor setups).
jm7