> I'll make sure that in the new design:
>
> 1) error rate and max jobs/day are maintained for each
> (host, app version), rather than for the host as a whole

This will not help with app_info, where the app version can be set
arbitrarily. Again, look at my scenario. There are two devices in the host:
same host, same app, same CUDA version, same driver. One works OK
constantly, the other does not (but it's not simply broken - that's the
difficulty of the situation). You will end up maintaining a separate queue
for each device on each host, and with completely overloaded project
servers handling tons of information that isn't actually needed.
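To make the granularity problem concrete, here is a minimal C++ sketch of
the bookkeeping David proposes (all names are hypothetical; this is not
actual BOINC scheduler code). Both of my GPUs map to the same record, so
the server cannot tell them apart without going per-device:

    #include <map>
    #include <utility>

    struct ReliabilityRecord {
        double error_rate;      // recent fraction of invalid results
        int max_jobs_per_day;   // current daily quota
    };

    // one record per (host ID, app version ID); with an anonymous-platform
    // app_info.xml the version is whatever the client chooses to report
    std::map<std::pair<int, int>, ReliabilityRecord> reliability;

    ReliabilityRecord& lookup(int host_id, int app_version_id) {
        // results from the healthy 9600GSO and the flaky 9400GT arrive
        // with identical keys, so both devices share this one record
        return reliability[{host_id, app_version_id}];
    }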
> 2) max jobs/day is maintained more conservatively, so that
> if 1 out of N GPUs is returning bad results,
> the host will get few GPU jobs (e.g. 1 per day)

Then you have a good chance of decreasing overall performance under the new
scenario (even going to adaptive replication) instead of increasing it, and
it will not solve the fundamental problem of adaptive replication: each
individual returned result will have a MUCH, really MUCH lower confidence
level compared with redundancy of 2. If a project can rely only on the set
of results as a whole, maybe that's not so bad. But if each separate result
is needed (and I think for SETI that's the case, because even a persistence
check that uses the set of results will probably fail on a single incorrect
result until we have each point of sky observed truly _many_ times - tens
of times, maybe), such a decrease in confidence level will ruin trust in
the project's results. IMO we don't need many but untrusted results; they
will have no value at all...

> -- David
>
> Raistmer wrote:
>>> Consider a possible scenario where the children are allowed to use the
>>> host for gaming when they have finished their homework, and the games
>>> leave the GPU in a bad state. Such a host could transition from
>>> reliable to unreliable daily, and hundreds of corrupted results could
>>> be assimilated each time. If the host were turned off at bedtime, it
>>> would be in reliable condition when turned on the next day.
>>>
>>> The daily quota is no protection for scenarios like that if the host is
>>> also doing CPU work for the same project. All it takes is one good CPU
>>> result for each 49 bad GPU results to keep a daily quota of 100 at max.
>>> --
>>> Joe
>>
>> I've seen just the same on one of my PCs with dual GPUs.
>> The 9400GT goes mad from time to time (maybe system overheating, maybe
>> some other reason) and starts to produce legal but bad results.
>> And this case is even worse (with regard to quota) than the one
>> described above, because the same host has another, relatively fast GPU
>> - a 9600GSO - that continues to produce correct results.
>> It plus the CPU can surely keep the quota far from zero... and even a
>> possible separation of CPU and GPU quotas would not help in this case...
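To put rough numbers on the confidence argument above (my simplification:
each host returns a legal-but-bad result with probability p, hosts are
independent, and the figures are illustrative only):

    #include <cstdio>

    int main() {
        double p = 0.05;            // chance of a legal-but-bad result
        // redundancy of 2: a bad result is accepted only if a second,
        // independently computed bad result also validates against it,
        // so p*p is an upper bound
        double p_redundant = p * p; // ~0.0025
        // adaptive replication: a single result from a "reliable" host
        // is accepted as-is
        double p_adaptive = p;      // 0.05 - twenty times worse here
        printf("redundant: %g, adaptive: %g\n", p_redundant, p_adaptive);
        return 0;
    }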
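Joe's quota arithmetic can also be checked directly. This simulation
assumes my understanding of the current quota rule (an invalid result
decrements the host's daily quota by one, a valid result doubles it,
capped at the project limit):

    #include <algorithm>
    #include <cstdio>

    int main() {
        const int cap = 100;   // project's daily result quota
        int quota = cap;
        for (int cycle = 1; cycle <= 5; cycle++) {
            for (int i = 0; i < 49; i++)      // 49 bad GPU results
                quota = std::max(1, quota - 1);
            quota = std::min(cap, quota * 2); // 1 good CPU result
            printf("after cycle %d: quota = %d\n", cycle, quota);
        }
        // prints 100 every cycle: 98% bad results never dent the quota
        return 0;
    }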