I'll look into that one and come up with a fix.

On Thu, Jun 4, 2015 at 7:31 AM, Teemu Mannermaa <[email protected]> wrote:

> Hello,
>
> I've been tracking down a strange scheduler problem where it fails to
> find a valid app version for a host at random times only to succeed the
> next round. Obviously there's a valid app version for the host.
>
> I added some additional logging (attached as it helps understand all
> failure conditions of version select) and found out that the if at
> https://github.com/BOINC/boinc/blob/master/sched/sched_version.cpp#L845
> seems to be the one that fails even though there's no BAVP yet. Somehow
> r ends up negative here, as shown in the log:
>   "[version] Not selected, AV#36 r*45.66 GFLOP <= Best AV 0.00 GFLOP
> (r=-1.391884, n=1)"
> and so the app version is never selected, even if it is the only one
> otherwise valid for the host. :(
>
> This relates to the <version_select_random_factor> option as
> explained in
> https://boinc.berkeley.edu/trac/wiki/ProjectOptions#Appversionselection.
> We had no value for that before, so the default at
> https://github.com/BOINC/boinc/blob/master/sched/sched_config.cpp#L94
> gets used (1.0, which already seems to contradict the documented default
> of 0.1). I'm not sure what range of values rand_normal() at
> https://github.com/BOINC/boinc/blob/master/lib/util.cpp#L571 can
> return, but my guess is they can be negative (and less than -1).
>
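
A minimal sketch of why the choice of default matters here. It assumes r is
computed as 1 + factor * rand_normal() and that rand_normal() returns a
standard normal deviate; both are assumptions inferred from the log line
above, not checked against the BOINC source. The names scaled_r and
negative_fraction are hypothetical helpers for illustration only.

```cpp
#include <random>

// Hypothetical stand-in for the scheduler's scaling step (an assumption,
// not the actual BOINC code): r = 1 + factor * z, where z is one
// standard normal draw.
double scaled_r(double factor, double z) {
    return 1.0 + factor * z;
}

// Fraction of n standard normal draws for which r comes out negative.
// A fixed seed keeps the experiment repeatable.
double negative_fraction(double factor, int n, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    int negative = 0;
    for (int i = 0; i < n; i++) {
        if (scaled_r(factor, normal(gen)) < 0.0) negative++;
    }
    return static_cast<double>(negative) / n;
}
```

Under these assumptions, a factor of 1.0 makes r negative whenever the draw
falls below -1, which for a standard normal happens roughly 16% of the time.
The logged r=-1.391884 would correspond to a draw of about -2.39. With a
factor of 0.1 the draw would have to fall below -10, which is practically
impossible.
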
> I'm not sure how to fix this as I don't quite understand the logic
> behind this code. I've worked around it for now by setting
> <version_select_random_factor> explicitly to 0.1, and there have been
> no negative r incidents since. We might as well disable it, as the
> estimation discrepancies (especially between a new OpenCL app and an old
> CUDA/Stream app) can be on the order of 10 or 100, not a mere 1.
> --
> Teemu Mannermaa
> System Specialist
>
> "Anything is possible but probabilities vary."
>
>
> _______________________________________________
> boinc_projects mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_projects
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
>
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
