Yes, it does happen with 6.12.  I believe what is happening is that a high
priority task is first in the priority order, and a multi-CPU task that
takes all of the CPUs on the device is second.  There are at least two
possible fixes:

1)  Have the function that orders the tasks by priority sort all of the
tasks, using whatever criteria are needed to order them.  The function
that picks the tasks to run would then skip over any task that needs more
resources than are available.  This would also fix RAM overuse and other
resource allocation issues that are not just about the count.  This option
also lets tasks change the resources they need on the fly, if we decide to
support that (it should not be that hard to implement).

2)  Have the function that orders the tasks by priority skip over tasks
that need more resources than are available.  This is possibly slightly
easier to implement in the short term, but probably less useful overall.
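
To make option 1 concrete, here is a minimal sketch.  It is not the BOINC
client code: Task, Resources and pick_tasks are hypothetical names made up
for illustration, and "larger priority value runs first" is an assumed
convention.  It only shows the idea of sorting everything first and then
skipping, rather than stopping at, a task that does not fit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical task record; not BOINC's actual data structures.
struct Task {
    std::string name;
    int priority;    // assumed convention: larger value runs first
    double ncpus;    // CPUs the task needs
    double ram_mb;   // RAM the task needs
};

// Hypothetical pool of resources still free on the host.
struct Resources {
    double ncpus;
    double ram_mb;
};

// Sort all tasks by priority, then walk the sorted list and skip any task
// that needs more than what is currently left.
std::vector<Task> pick_tasks(std::vector<Task> tasks, Resources avail) {
    assert(avail.ncpus >= 0 && avail.ram_mb >= 0);

    // stable_sort keeps the incoming order among tasks of equal priority.
    std::stable_sort(tasks.begin(), tasks.end(),
        [](const Task& a, const Task& b) { return a.priority > b.priority; });

    std::vector<Task> chosen;
    for (const Task& t : tasks) {
        // Skip (do not stop at) a task that does not fit.
        if (t.ncpus > avail.ncpus || t.ram_mb > avail.ram_mb) continue;
        avail.ncpus -= t.ncpus;
        avail.ram_mb -= t.ram_mb;
        chosen.push_back(t);
    }
    return chosen;
}
```

With 4 CPUs, one high priority single-CPU task, one 4-CPU task and two
ordinary single-CPU tasks, this picks the high priority task, skips the
4-CPU task, and starts both single-CPU tasks instead of leaving CPUs idle.
A real implementation would presumably also need some way to keep the
multi-CPU task from being skipped indefinitely.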

jm7


                                                                           
From:     David Anderson <da...@ssl.berkeley.edu>
Sent by:  <boinc_dev-bounces...@ssl.berkeley.edu>
Date:     11/01/2010 06:05 PM
To:       boinc_dev@ssl.berkeley.edu
Subject:  Re: [boinc_dev] Initial scheduling checkin




I checked in a fix for 1).

Does 2) happen with 6.12?
If not, let's just wait for 6.12.

-- David

On 01-Nov-2010 7:18 AM, Richard Haselgrove wrote:
> I agree with John - this is a major change, and will need extensive
> testing.
>
> David, may I ask what your current expectation is for the timeline for
> the new scheduler? Specifically, are you going to attempt to incorporate
> it into v6.12, or would it be better to get all the 'notices' angst out
> of the way via a public release (and debug if necessary after BETA
> testing by the public at large) first, and then we can concentrate all
> resources on functionality?
>
> I'm concerned that there seem to be a couple of recently-reported issues
> which might slip through the cracks.
>
> 1) In v6.12.4, the thrashing of GPU tasks into and out of GPU memory,
> because there seems to be no 'Task Switch Interval' inhibition on the new
> GPU scheduling by debt.
>
> 2) In v6.10.58, the idle CPUs apparently caused by the scheduler
> incorrectly handling the triple mixture of High Priority CPU /
> Multithread / ordinary priority single CPU tasks.
> (from http://boinc.berkeley.edu/dev/forum_thread.php?id=6138)
>
> If the new scheduler is going to be put into v6.12 (which will inevitably
> delay that release a bit), could those fixes (when ready) be backported
> into v6.10, please?
>
> Or if the new scheduler has to wait for v6.14, perhaps we should
> concentrate on getting v6.12 finished first?
>
>
>
>> This needs to be tested thoroughly with some very long term simulations
>> involving several years of simulation time.  Anything that involves the
>> concept of recent for work fetch will break resource share over the long
>> term when used in conjunction with CPDN.
>>
>> jm7
>>
>>
>>
>> Message: 1
>> Date: Fri, 29 Oct 2010 16:41:35 -0700
>> From: boinc...@ssl.berkeley.edu
>> Subject: [boinc_cvs] r22608 - in trunk/boinc: . api client lib sched
>> To: boinc_...@ssl.berkeley.edu
>> Message-ID:<201010292341.o9tnfzuh013...@mail.ssl.berkeley.edu>
>> Content-Type: text/plain; charset=UTF-8
>>
>> Author: davea
>> Date: 2010-10-29 16:41:34 -0700 (Fri, 29 Oct 2010)
>> New Revision: 22608
>>
>> Modified:
>>    trunk/boinc/api/boinc_api.cpp
>>    trunk/boinc/checkin_notes
>>    trunk/boinc/client/client_types.cpp
>>    trunk/boinc/client/cpu_sched.cpp
>>    trunk/boinc/client/net_stats.cpp
>>    trunk/boinc/client/work_fetch.h
>>    trunk/boinc/lib/util.cpp
>>    trunk/boinc/lib/util.h
>>    trunk/boinc/sched/credit.cpp
>>    trunk/boinc/sched/update_stats.cpp
>> Log:
>> - client: small initial checkin for new scheduling system.
>>     Keep track of per-project recent estimated credit
>>
>>
>> Modified: trunk/boinc/api/boinc_api.cpp
>> ===================================================================
>> --- trunk/boinc/api/boinc_api.cpp          2010-10-29 18:58:26 UTC (rev
>> 22607)
>> +++ trunk/boinc/api/boinc_api.cpp          2010-10-29 23:41:34 UTC (rev
>> 22608)
>> @@ -835,9 +835,9 @@
>> #else
>>          strcpy(abspath, path);
>> #endif
>> -        argv[0] = GRAPHICS_APP_FILENAME;
>> +        argv[0] = (char*)GRAPHICS_APP_FILENAME;
>>          if (fullscreen) {
>> -            argv[1] = "--fullscreen";
>> +            argv[1] = (char*)"--fullscreen";
>>              argv[2] = 0;
>>              argc = 2;
>>          } else {
>>
>> Modified: trunk/boinc/checkin_notes
>> ===================================================================
>> --- trunk/boinc/checkin_notes        2010-10-29 18:58:26 UTC (rev 22607)
>> +++ trunk/boinc/checkin_notes        2010-10-29 23:41:34 UTC (rev 22608)
>> @@ -7660,3 +7660,20 @@
>>                          client_msgs.cpp
>>              clientgui/
>>                          NoticeListCtrl.cpp
>> +
>> +David  29 Oct 2010
>> +    - client: small initial checkin for new scheduling system.
>> +        Keep track of per-project recent estimated credit
>> +
>> +    api/
>> +        boinc_api.cpp
>> +    client/
>> +        client_types.cpp
>> +        cpu_sched.cpp
>> +        net_stats.cpp
>> +        work_fetch.h
>> +    lib/
>> +        util.cpp,h
>> +    sched/
>> +        credit.cpp
>> +        update_stats.cpp
>>
>> Modified: trunk/boinc/client/client_types.cpp
>> ===================================================================
>> --- trunk/boinc/client/client_types.cpp          2010-10-29 18:58:26 UTC
>> (rev 22607)
>> +++ trunk/boinc/client/client_types.cpp          2010-10-29 23:41:34 UTC
>> (rev 22608)
>> @@ -202,6 +202,8 @@
>>          if (parse_bool(buf, "dont_request_more_work", dont_request_more_work)) continue;
>>          if (parse_bool(buf, "detach_when_done", detach_when_done)) continue;
>>          if (parse_bool(buf, "ended", ended)) continue;
>> +        if (parse_double(buf, "<rec>", pwf.rec)) continue;
>> +        if (parse_double(buf, "<rec_time>", pwf.rec_time)) continue;
>>          if (parse_double(buf, "<short_term_debt>", cpu_pwf.short_term_debt)) continue;
>>          if (parse_double(buf, "<long_term_debt>", cpu_pwf.long_term_debt)) continue;
>>          if (parse_double(buf, "<cpu_backoff_interval>", cpu_pwf.backoff_interval)) continue;
>> @@ -275,6 +277,8 @@
>>          "<master_fetch_failures>%d</master_fetch_failures>\n"
>>          "<min_rpc_time>%f</min_rpc_time>\n"
>>          "<next_rpc_time>%f</next_rpc_time>\n"
>> +        "<rec>%f</rec>\n"
>> +        "<rec_time>%f</rec_time>\n"
>>          "<short_term_debt>%f</short_term_debt>\n"
>>          "<long_term_debt>%f</long_term_debt>\n"
>>          "<cpu_backoff_interval>%f</cpu_backoff_interval>\n"
>> @@ -314,6 +318,8 @@
>>          master_fetch_failures,
>>          min_rpc_time,
>>          next_rpc_time,
>> +        pwf.rec,
>> +        pwf.rec_time,
>>          cpu_pwf.short_term_debt,
>>          cpu_pwf.long_term_debt, cpu_pwf.backoff_interval,
>> cpu_pwf.backoff_time,
>>          cuda_pwf.short_term_debt, cuda_pwf.long_term_debt,
>>
>> Modified: trunk/boinc/client/cpu_sched.cpp
>> ===================================================================
>> --- trunk/boinc/client/cpu_sched.cpp             2010-10-29 18:58:26 UTC
>> (rev 22607)
>> +++ trunk/boinc/client/cpu_sched.cpp             2010-10-29 23:41:34 UTC
>> (rev 22608)
>> @@ -514,6 +514,33 @@
>>      debt_interval_start = now;
>> }
>>
>> +#define REC_HALF_LIFE (30*86400)
>> +
>> +// update REC (recent estimated credit)
>> +//
>> +static void update_rec() {
>> +    double f = gstate.host_info.p_fpops;
>> +
>> +    for (unsigned int i=0; i<gstate.projects.size(); i++) {
>> +        PROJECT* p = gstate.projects[i];
>> +        double x = p->cpu_pwf.secs_this_debt_interval * f;
>> +        if (gstate.host_info.have_cuda()) {
>> +            x += p->cuda_pwf.secs_this_debt_interval * f * cuda_work_fetch.relative_speed;
>> +        }
>> +        if (gstate.host_info.have_ati()) {
>> +            x += p->ati_pwf.secs_this_debt_interval * f * ati_work_fetch.relative_speed;
>> +        }
>> +        update_average(
>> +            gstate.now,
>> +            gstate.debt_interval_start,
>> +            x,
>> +            REC_HALF_LIFE,
>> +            p->pwf.rec,
>> +            p->pwf.rec_time
>> +        );
>> +    }
>> +}
>> +
>> // adjust project debts (short, long-term)
>> //
>> void CLIENT_STATE::adjust_debts() {
>> @@ -551,6 +578,8 @@
>>          work_fetch.accumulate_inst_sec(atp, elapsed_time);
>>      }
>>
>> +    update_rec();
>> +
>>      cpu_work_fetch.update_long_term_debts();
>>      cpu_work_fetch.update_short_term_debts();
>>      if (host_info.have_cuda()) {
>>
>> Modified: trunk/boinc/client/net_stats.cpp
>> ===================================================================
>> --- trunk/boinc/client/net_stats.cpp             2010-10-29 18:58:26 UTC
>> (rev 22607)
>> +++ trunk/boinc/client/net_stats.cpp             2010-10-29 23:41:34 UTC
>> (rev 22608)
>> @@ -71,6 +71,7 @@
>>      }
>>      double start_time = gstate.now - dt;
>>      update_average(
>> +        gstate.now,
>>          start_time,
>>          nbytes,
>>          NET_RATE_HALF_LIFE,
>>
>> Modified: trunk/boinc/client/work_fetch.h
>> ===================================================================
>> --- trunk/boinc/client/work_fetch.h        2010-10-29 18:58:26 UTC (rev
>> 22607)
>> +++ trunk/boinc/client/work_fetch.h        2010-10-29 23:41:34 UTC (rev
>> 22608)
>> @@ -237,6 +237,10 @@
>>      bool can_fetch_work;
>>      bool compute_can_fetch_work(PROJECT*);
>>      bool has_runnable_jobs;
>> +    double rec;
>> +        // recent estimated credit
>> +    double rec_time;
>> +        // when it was last updated
>>      PROJECT_WORK_FETCH() {
>>          memset(this, 0, sizeof(*this));
>>      }
>>
>> Modified: trunk/boinc/lib/util.cpp
>> ===================================================================
>> --- trunk/boinc/lib/util.cpp         2010-10-29 18:58:26 UTC (rev 22607)
>> +++ trunk/boinc/lib/util.cpp         2010-10-29 23:41:34 UTC (rev 22608)
>> @@ -234,6 +234,7 @@
>> // html/inc/credit.inc
>> //
>> void update_average(
>> +    double now,
>>      double work_start_time,       // when new work was started
>>                                      // (or zero if no new work)
>>      double work,                    // amount of new work
>> @@ -241,8 +242,6 @@
>>      double&  avg,                    // average work per day (in and out)
>>      double&  avg_time                // when average was last computed
>> ) {
>> -    double now = dtime();
>> -
>>      if (avg_time) {
>>          // If an average R already exists, imagine that the new work was done
>>          // entirely between avg_time and now.
>>
>> Modified: trunk/boinc/lib/util.h
>> ===================================================================
>> --- trunk/boinc/lib/util.h           2010-10-29 18:58:26 UTC (rev 22607)
>> +++ trunk/boinc/lib/util.h           2010-10-29 23:41:34 UTC (rev 22608)
>> @@ -61,7 +61,7 @@
>> extern double linux_cpu_time(int pid);
>> #endif
>>
>> -extern void update_average(double, double, double, double&, double&);
>> +extern void update_average(double, double, double, double, double&,
>> double&);
>>
>> extern int boinc_calling_thread_cpu_time(double&);
>>
>>
>> Modified: trunk/boinc/sched/credit.cpp
>> ===================================================================
>> --- trunk/boinc/sched/credit.cpp           2010-10-29 18:58:26 UTC (rev
>> 22607)
>> +++ trunk/boinc/sched/credit.cpp           2010-10-29 23:41:34 UTC (rev
>> 22608)
>> @@ -55,10 +55,12 @@
>>      DB_TEAM team;
>>      int retval;
>>      char buf[256];
>> +    double now = dtime();
>>
>>      // first, process the host
>>
>>      update_average(
>> +        now,
>>          start_time, credit, CREDIT_HALF_LIFE,
>>          host.expavg_credit, host.expavg_time
>>      );
>> @@ -76,6 +78,7 @@
>>      }
>>
>>      update_average(
>> +        now,
>>          start_time, credit, CREDIT_HALF_LIFE,
>>          user.expavg_credit, user.expavg_time
>>      );
>> @@ -103,6 +106,7 @@
>>              return retval;
>>          }
>>          update_average(
>> +            now,
>>              start_time, credit, CREDIT_HALF_LIFE,
>>              team.expavg_credit, team.expavg_time
>>          );
>> @@ -799,6 +803,7 @@
>> int write_modified_app_versions(vector<DB_APP_VERSION>&  app_versions) {
>>      unsigned int i, j;
>>      int retval = 0;
>> +    double now = dtime();
>>
>>      if (config.debug_credit&&  app_versions.size()) {
>>          log_messages.printf(MSG_NORMAL,
>> @@ -827,6 +832,7 @@
>>              }
>>              for (j=0; j<av.credit_samples.size(); j++) {
>>                  update_average(
>> +                    now,
>>                      av.credit_times[j], av.credit_samples[j],
>> CREDIT_HALF_LIFE,
>>                      av.expavg_credit, av.expavg_time
>>                  );
>>
>> Modified: trunk/boinc/sched/update_stats.cpp
>> ===================================================================
>> --- trunk/boinc/sched/update_stats.cpp           2010-10-29 18:58:26 UTC
>> (rev 22607)
>> +++ trunk/boinc/sched/update_stats.cpp           2010-10-29 23:41:34 UTC
>> (rev 22608)
>> @@ -54,6 +54,7 @@
>>      DB_USER user;
>>      int retval;
>>      char buf[256];
>> +    double now = dtime();
>>
>>      while (1) {
>>          retval = user.enumerate("where expavg_credit>0.1");
>> @@ -66,7 +67,9 @@
>>          }
>>
>>          if (user.expavg_time>  update_time_cutoff) continue;
>> -        update_average(0, 0, CREDIT_HALF_LIFE, user.expavg_credit,
>> user.expavg_time);
>> +        update_average(
>> +            now, 0, 0, CREDIT_HALF_LIFE, user.expavg_credit,
>> user.expavg_time
>> +        );
>>          sprintf( buf, "expavg_credit=%f, expavg_time=%f",
>>              user.expavg_credit, user.expavg_time
>>          );
>> @@ -84,6 +87,7 @@
>>      DB_HOST host;
>>      int retval;
>>      char buf[256];
>> +    double now = dtime();
>>
>>      while (1) {
>>          retval = host.enumerate("where expavg_credit>0.1");
>> @@ -96,7 +100,9 @@
>>          }
>>
>>          if (host.expavg_time>  update_time_cutoff) continue;
>> -        update_average(0, 0, CREDIT_HALF_LIFE, host.expavg_credit,
>> host.expavg_time);
>> +        update_average(
>> +            now, 0, 0, CREDIT_HALF_LIFE, host.expavg_credit,
>> host.expavg_time
>> +        );
>>          sprintf(
>>              buf,"expavg_credit=%f, expavg_time=%f",
>>              host.expavg_credit, host.expavg_time
>> @@ -142,6 +148,7 @@
>>      DB_TEAM team;
>>      int retval;
>>      char buf[256];
>> +    double now = dtime();
>>
>>      while (1) {
>>          retval = team.enumerate("where expavg_credit>0.1");
>> @@ -163,7 +170,10 @@
>>              continue;
>>          }
>>          if (team.expavg_time<  update_time_cutoff) {
>> -            update_average(0, 0, CREDIT_HALF_LIFE, team.expavg_credit,
>> team.expavg_time);
>> +            update_average(
>> +                now, 0, 0, CREDIT_HALF_LIFE, team.expavg_credit,
>> +                team.expavg_time
>> +            );
>>          }
>>          sprintf(
>>              buf, "expavg_credit=%f, expavg_time=%f, nusers=%d",
>>
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> boinc_cvs mailing list
>> boinc_...@ssl.berkeley.edu
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_cvs
>>
>>
>> End of boinc_cvs Digest, Vol 71, Issue 48
>> *****************************************
>>
>>
>>
>> _______________________________________________
>> boinc_dev mailing list
>> boinc_dev@ssl.berkeley.edu
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.
>>
>
>