Re: [boinc_dev] BOINC's Quota system needs change

Raistmer Sun, 28 Mar 2010 13:21:47 -0700

Unfortunately, binding quota to device type (instead of device instance) 
will not solve the current issues with multi-GPU hosts.
Such hosts (or hosts with a multi-core GPU) can do correct computations on 
one GPU (or GPU core) and incorrect ones on another (for example, constantly 
throwing the -9 overflow in the SETI project).
IMHO there is no need to implement full-scale scheduling algorithms (I 
suppose this is what you called modeling) on a per-device basis.
All that would be needed is one additional field in the structure that 
describes a device.
When work is assigned to a device, BOINC knows which particular device each 
task goes to. The client (not the server) could then check the outcome of 
that result (computational error or not) and update the corresponding field 
in the structure for that device.
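
A minimal sketch of that bookkeeping, in Python for brevity. The names 
DeviceRecord, record_outcome and error_code are hypothetical and are not 
BOINC's actual structures; it also assumes a task outcome can be reduced to a 
single code where 0 means "no computational error".

```python
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    """Hypothetical per-device record (not BOINC's real device struct)."""
    device_id: int
    tasks_done: int = 0
    compute_errors: int = 0  # the proposed additional field

    def record_outcome(self, error_code: int) -> None:
        # Assumption: 0 means the task finished without a computational error.
        self.tasks_done += 1
        if error_code != 0:
            self.compute_errors += 1


devices = {0: DeviceRecord(0), 1: DeviceRecord(1)}

# The client knows which device each task ran on, so it can attribute
# every outcome to that device instead of to the host as a whole.
for device_id, error_code in [(0, 0), (1, -9), (0, 0), (1, -9)]:
    devices[device_id].record_outcome(error_code)
```

With these made-up outcomes, device 0 ends with no errors while device 1 has 
two errors in two tasks, which a per-host counter would not distinguish.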
Sure, this can't catch invalid results: invalid status is known only after 
validation, i.e. the server has to be involved.
But such a simplified mechanism could catch computational errors (in 
particular, SETI's -9 overflow or the CUDA-specific -1, not implemented).
Unfortunately, there are complications: an overflow can be thrown for a 
completely valid result too, but here the rate of such errors could play 
some role...
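
One way the error rate could be used, as a sketch only: judge a device by the 
fraction of failed tasks, and only after enough tasks have run, so that an 
occasional legitimate overflow does not condemn it. The function name and both 
thresholds are assumptions chosen for illustration.

```python
def device_looks_faulty(errors: int, total: int,
                        max_error_rate: float = 0.5,
                        min_samples: int = 10) -> bool:
    """Return True if the device's error rate is suspiciously high.

    Both thresholds are illustrative assumptions, not BOINC values.
    """
    if total < min_samples:
        # Too few tasks to tell a bad device from a few bad work units.
        return False
    return errors / total > max_error_rate
```

For example, 2 errors in 20 tasks would be tolerated under these thresholds, 
while 15 errors in 20 tasks would flag the device.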
As a bigger extension, the BOINC client could attach an additional field 
with the device ID when reporting a result to the server.
On the next request, the server could tell the client the updated good/bad 
ratio for each device ID. Devices with poor good/bad ratios could be 
disabled for some period of time (something like a device-wide backoff in 
computations). Server-side changes are required here, but again, there is no 
need to do full-scale scheduling on a per-device basis. Actually, scheduling 
should not be touched at all. The BOINC client could just disable or enable 
devices according to their good/bad ratio. This would only decrease the 
number of devices available for scheduling, and AFAIK BOINC already has to 
deal with that situation: for example, the number of available devices 
changes when the user starts a "no-GPU" app.


----- Original Message ----- 
From: "David Anderson" <da...@ssl.berkeley.edu>
To: "Raistmer" <raist...@mail.ru>
Cc: <boinc_dev@ssl.berkeley.edu>
Sent: Sunday, March 28, 2010 11:23 PM
Subject: Re: [boinc_dev] BOINC's Quota system needs change


> The new system (see updated doc:
> http://boinc.berkeley.edu/trac/wiki/CreditNew)
> will have separate quotas and error rates per resource type
> (CPU, NVIDIA, ATI).
>
> Maintaining these separately for each GPU would require
> modeling multiple GPUs separately,
> rather than as N instances of the same thing as is currently done.
> This would be a sweeping change, and won't get done in the near term.
>
> -- David
>
> Raistmer wrote:
>> If a host's task quota is computed the old way, a host that does valid 
>> CPU computations but invalid GPU ones will pollute the database and 
>> waste project resources indefinitely.
>> A GPU is usually much faster than a CPU, so many invalid tasks can be 
>> returned per single valid one.
>> Moreover, even if CPU/GPU quota separation is introduced, there are 
>> still multi-GPU hosts that can pollute the database at an even higher 
>> rate by doing correct computations on one GPU and invalid ones on another.
>> The current quota system fits only the single-host, single-device 
>> approach and apparently should be changed.
>> Right now I have no good idea what the replacement could be, but this 
>> question definitely deserves consideration.
>>
>> One possible solution could be to track the good/bad result ratio per 
>> hardware device (not per host) and inhibit work fetch for the whole host 
>> if one of its devices has too poor a good/bad ratio. Or issue an 
>> instruction to the BOINC client to block the affected device from 
>> receiving work (this could be a more graceful approach).
>> More ideas?
>>
>> _______________________________________________
>> boinc_dev mailing list
>> boinc_dev@ssl.berkeley.edu
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.
> 

