What are the units of estimated time (I'm assuming that's what et stands for)?
I have debug credit on but don't see anything fishy. I did, however, fix the
problem with the variance going negative.
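For reference, one common way a running variance ends up slightly negative is
computing it as mean-of-squares minus square-of-mean, where rounding error can
exceed the true variance when the values are as small as these. Below is a
minimal sketch of Welford's online update, which only ever adds non-negative
terms to its accumulator. It is an illustration only: the class name
RunningStats is hypothetical, and this is not the BOINC server code, nor
necessarily the fix that was applied.

```python
class RunningStats:
    """Welford's online mean/variance (hypothetical helper, not BOINC code)."""

    def __init__(self):
        self.n = 0        # number of samples seen
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # delta and (x - self.mean) share a sign, so this term is never negative
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance; 0.0 when no samples have been recorded
        return self.m2 / self.n if self.n else 0.0


stats = RunningStats()
for sample in [1.0, 2.0, 3.0, 4.0]:
    stats.update(sample)
print(stats.mean, stats.variance())
```

Because the accumulator is a sum of non-negative products, the reported
variance stays at or above zero even for samples near 1e-11.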
Getting rid of the entries with negative variance seemed to fix the WUs getting
immediately timed out (at least on DNA@Home), but on MilkyWay@Home it looks
like the problem is still happening with the GPU applications. On DNA@Home,
credit still seems to be jumping around quite a bit, though. It also
seems to differ a bit between versions of the application (even
though the work done is the same, with the 64-bit applications running a bit
faster).
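A rough sketch of why a tiny et_avg could produce near-immediate timeouts. It
rests on two assumptions that are not confirmed anywhere in this thread: that
et_avg is elapsed seconds per estimated FLOP (so the projected speed is
1/et_avg), and that the client aborts a task once elapsed time exceeds
rsc_fpops_bound divided by that projected speed. The function names are
hypothetical. The inputs are the rsc_fpops_bound and the two et_avg values
quoted further down; they come from separate queries and may not belong to the
same host or project, so the pairing is purely illustrative.

```python
def projected_flops(et_avg):
    """Assumed: et_avg is seconds per estimated FLOP, so speed = 1 / et_avg."""
    if et_avg <= 0:
        raise ValueError("et_avg must be positive")
    return 1.0 / et_avg


def elapsed_time_limit(rsc_fpops_bound, et_avg):
    """Assumed client rule: time limit (seconds) = fpops bound / projected FLOPS."""
    return rsc_fpops_bound / projected_flops(et_avg)


RSC_FPOPS_BOUND = 2.32172068444737e16  # from the quoted workunit query

# Row with et_avg = 4.96e-17 (the one that also had a negative et_var)
limit_suspect = elapsed_time_limit(RSC_FPOPS_BOUND, 4.96156582191146e-17)

# Row with et_avg = 5.43e-10
limit_other = elapsed_time_limit(RSC_FPOPS_BOUND, 5.42546789377713e-10)

print(f"suspect row limit: {limit_suspect:.3f} s")
print(f"other row limit:   {limit_other:.3e} s")
```

Under these assumptions the first limit works out to a little over one second
and the second to several months, which would be consistent with tasks being
aborted after a couple of seconds on hosts with an implausibly small et_avg.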
Here are a few samples of what we have in the database for host_app_version:
mysql> select pfc_n, pfc_avg, et_n, et_avg, et_var, et_q, turnaround_n,
turnaround_avg, turnaround_var, turnaround_q, consecutive_valid from
host_app_version where pfc_n > 0;
+-------+---------------------+------+----------------------+----------------------+----------------------+--------------+----------------------+----------------------+-------------------+-------------------+
| pfc_n | pfc_avg | et_n | et_avg | et_var | et_q | turnaround_n | turnaround_avg | turnaround_var | turnaround_q | consecutive_valid |
+-------+---------------------+------+----------------------+----------------------+----------------------+--------------+----------------------+----------------------+-------------------+-------------------+
| 1 | 0.0608558504137879 | 1 | 1.5243745290758e-11 | 0 | 0 | 1 | 152891 | 0 | 0 | 0 |
| 1 | 2.58209918342134 | 1 | 1.10946655144921e-09 | 0 | 0 | 1 | 162239 | 0 | 0 | 0 |
| 2 | 0.0427719111949127 | 2 | 9.18903602096224e-12 | 7.83036536159823e-54 | 1.56607307231965e-53 | 2 | 7866 | 0 | 0 | 0 |
| 13 | 0.0485200335062071 | 13 | 9.43560019904897e-12 | 2.41793047141943e-24 | 3.14330961284526e-23 | 13 | 6547.30769230769 | 5269785.59763313 | 68507212.7692307 | 0 |
| 4 | 0.0476863039948864 | 4 | 1.24448883455494e-11 | 3.82694981236738e-24 | 1.53077992494695e-23 | 4 | 2936.5 | 272152.25 | 1088609 | 2 |
| 491 | 0.0269815387497421 | 491 | 2.69815387497423e-12 | 1.21744253650842e-25 | 3.73559997358907e-24 | 491 | 1382.11124143616 | 58934.6673100876 | 20782800.9473685 | 85 |
| 6 | 0.0411280680726881 | 6 | 1.3830948838265e-11 | 5.77790802864275e-24 | 3.46674481718565e-23 | 6 | 15007.1666666667 | 62621528.4722222 | 375729170.833333 | 2 |
| 1 | 2.26565394039762 | 1 | 1.57044468513331e-09 | 0 | 0 | 1 | 152482 | 0 | 0 | 1 |
| 6 | 0.0463229739240934 | 6 | 1.60544527347133e-11 | 1.2619495066352e-23 | 7.57169703981121e-23 | 6 | 45190.3333333333 | 6212011902.55556 | 37272071415.3333 | 2 |
| 6 | 0.0398122344654779 | 6 | 1.3950955107839e-11 | 1.07252109051859e-24 | 6.43512654311157e-24 | 6 | 12516.3333333333 | 109456015.555556 | 656736093.333333 | 0 |
| 4 | 0.0347176130281241 | 4 | 3.4717613028124e-12 | 4.12948777229023e-28 | 1.65179510891609e-27 | 4 | 1768.25 | 3704.68750000011 | 14818.7500000005 | 4 |
| 2 | 0.0590930818482324 | 2 | 7.07885876306951e-11 | 4.4529207780773e-23 | 8.9058415561546e-23 | 2 | 62291 | 23280625 | 46561250 | 0 |
| 5 | 0.0464941332492785 | 5 | 1.13156978089522e-11 | 6.41335358333943e-24 | 3.20667679166972e-23 | 5 | 15211.8 | 30983968.56 | 154919842.8 | 0 |
| 1 | 12.2900239078247 | 1 | 1.22900239078247e-09 | 0 | 0 | 1 | 91695 | 0 | 0 | 1 |
| 1 | 1.82911811915752 | 1 | 5.89075825250098e-10 | 0 | 0 | 1 | 91844 | 0 | 0 | 0 |
| 9 | 0.0289599172359925 | 9 | 2.33205301981538e-12 | 1.92213552886199e-25 | 1.72992197597579e-24 | 9 | 9420.55555555553 | 56886749.1358023 | 511980742.222221 | 0 |
| 4 | 0.0613410885213004 | 4 | 1.74933059711131e-11 | 1.21730599530587e-23 | 4.8692239812235e-23 | 4 | 60028.25 | 2080159344.6875 | 8320637378.75 | 0 |
| 2 | 2.35841110666499 | 2 | 1.00162190515027e-09 | 5.51265703910173e-22 | 1.10253140782035e-21 | 2 | 64723 | 8185321 | 16370642 | 0 |
| 5 | 3.67006941329779 | 5 | 1.55856294009462e-09 | 2.48037138283298e-22 | 1.24018569141649e-21 | 5 | 91421.8000000002 | 311832297.360004 | 1559161486.80002 | 1 |
| 25 | 0.0642310856516222 | 25 | 4.34122401117199e-18 | 3.26249249142758e-33 | 2.11491062878555e-23 | 25 | 0.000387041332263157 | 0.000823596056140079 | 15971940464.6317 | 0 |
| 3 | 0.0553984697432635 | 3 | 1.35118218886009e-11 | 1.88356776751884e-24 | 5.65070330255653e-24 | 3 | 192374.333333333 | 5540325024.22222 | 16620975072.6667 | 1 |
| 1 | 0.0598170364655694 | 1 | 5.98170364655694e-12 | 0 | 0 | 9 | 614772.111111111 | 46729777600.0988 | 420567998400.889 | 1 |
| 2 | 2.40935570399402 | 2 | 1.06014587530039e-09 | 1.85511209862074e-21 | 3.71022419724149e-21 | 2 | 124098.5 | 94799432.25 | 189598864.5 | 0 |
| 2 | 3.75383876744165 | 2 | 1.6517348099531e-09 | 2.99517737806749e-21 | 5.99035475613498e-21 | 2 | 127580 | 29343889 | 58687778 | 1 |
| 8 | 0.0578282387907833 | 8 | 6.3612824715241e-12 | 1.46948745955661e-24 | 1.17558996764529e-23 | 8 | 44661.375 | 606227179.484374 | 4849817435.875 | 0 |
| 2 | 2.23312330116474 | 2 | 9.84069931342639e-10 | 5.90889370820386e-24 | 1.18177874164077e-23 | 2 | 95942 | 103022500 | 206045000 | 0 |
| 3 | 3.63719484324392 | 3 | 1.60291363852544e-09 | 1.64648084029495e-24 | 4.93944252088485e-24 | 3 | 107642.333333333 | 54688129.5555556 | 164064388.666667 | 1 |
| 21 | 0.0642256881516386 | 21 | 1.09036856455564e-15 | 8.8214656876127e-29 | 2.20594786579723e-23 | 21 | 505.659194736842 | 25563595.0825748 | 10440376104.9474 | 0 |
| 6 | 2.34050329264044 | 6 | 9.89736523845484e-10 | 4.37458601659855e-21 | 2.62475160995913e-20 | 6 | 95674.5 | 980319184.916667 | 5881915109.5 | 4 |
| 2 | 3.77813413626504 | 2 | 1.59777679398117e-09 | 4.73409884425066e-21 | 9.46819768850132e-21 | 2 | 152731 | 93702400 | 187404800 | 1 |
| 11 | 0.0609584177049988 | 11 | 6.44268317703125e-12 | 6.98168165232986e-25 | 7.67984981756284e-24 | 11 | 19811.7272727273 | 419002387.107439 | 4609026258.18183 | 1 |
| 4 | 2.26442477619328 | 4 | 9.56990350219497e-10 | 2.33114941628846e-22 | 9.32459766515382e-22 | 4 | 55192 | 389648138.500001 | 1558592554 | 4 |
| 2 | 3.65648558160343 | 2 | 1.54530256606437e-09 | 3.17969413640606e-23 | 6.35938827281211e-23 | 2 | 94488 | 149426176 | 298852352 | 1 |
| 6 | 0.06014566257186 | 6 | 6.354684343455e-12 | 2.76845041181775e-24 | 1.66107024709065e-23 | 6 | 22074.8333333333 | 65456529.8055557 | 392739178.833334 | 0 |
| 1 | 0.0431140601815484 | 1 | 1.15377789324284e-11 | 0 | 0 | 1 | 60165 | 0 | 0 | 0 |
| 1 | 2.1059399217778 | 1 | 8.06129407346789e-10 | 0 | 0 | 1 | 83878 | 0 | 0 | 0 |
| 3 | 0.0593007303668922 | 3 | 2.26996326588174e-11 | 7.28273644399769e-25 | 2.18482093319931e-24 | 3 | 23793.6666666667 | 12274360.8888889 | 36823082.6666667 | 0 |
| 1 | 1.6277956282417 | 1 | 6.20468128385158e-10 | 0 | 0 | 2 | 434207.5 | 66045145056.25 | 132090290112.5 | 0 |
| 3 | 0.0477833757390128 | 3 | 9.10608265023302e-12 | 8.17238714328239e-25 | 2.45171614298472e-24 | 3 | 16690 | 313709316.666667 | 941127950 | 0 |
| 1 | 3.64975787228264 | 1 | 1.5681029150687e-09 | 0 | 0 | 4 | 584137 | 34387457907 | 137549831628 | 1 |
| 4 | 0.0519220785054065 | 4 | 5.58174922345779e-12 | 6.21576421159636e-25 | 2.48630568463854e-24 | 4 | 160889.5 | 13596732816.25 | 54386931265.0001 | 0 |
| 1 | 0.0456064727581774 | 1 | 8.12880675803266e-12 | 0 | 0 | 1 | 42213 | 0 | 0 | 0 |
| 2 | 0.0511535917725667 | 2 | 3.91354527306662e-12 | 7.66442002789915e-26 | 1.53288400557983e-25 | 2 | 50329.5 | 700793256.25 | 1401586512.5 | 0 |
| 2 | 0.0580980639826211 | 2 | 5.80980639826211e-12 | 7.06853806299458e-27 | 1.41370761259892e-26 | 2 | 31031 | 6466849 | 12933698 | 0 |
| 1 | 0.0606483710293593 | 1 | 1.76557193484095e-11 | 0 | 0 | 1 | 318206 | 0 | 0 | 0 |
| 1 | 2.46555788144639 | 1 | 1.19627095720665e-09 | 0 | 0 | 2 | 723793.5 | 1062336242.25 | 2124672484.5 | 0 |
| 2 | 0.035869244246474 | 2 | 3.5869244246474e-12 | 1.3471788987623e-28 | 2.69435779752459e-28 | 2 | 81553 | 19044 | 38088 | 0 |
| 1 | 3.61083165032698 | 1 | 1.69230451915117e-09 | 0 | 0 | 1 | 682445 | 0 | 0 | 1 |
| 3 | 0.0553657138954992 | 3 | 1.3884535526865e-11 | 1.46954109844377e-23 | 4.40862329533132e-23 | 3 | 110462 | 13759663184.6667 | 41278989554 | 1 |
| 1 | 0.0248152588777772 | 1 | 2.48152588777772e-12 | 0 | 0 | 1 | 1283 | 0 | 0 | 0 |
| 520 | 0.0241197254291424 | 520 | 2.41197254291426e-12 | 4.65667059095906e-27 | 9.46979186110223e-24 | 521 | 1476.4846785565 | 161431234.687471 | 449044068502.524 | 88 |
And the app_version table:
mysql> select plan_class, platformid, pfc_n, pfc_avg, pfc_scale, expavg_credit,
expavg_time from app_version;
+-------------+------------+--------+--------------------+--------------------+------------------+----------------+
| plan_class | platformid | pfc_n | pfc_avg | pfc_scale | expavg_credit | expavg_time |
+-------------+------------+--------+--------------------+--------------------+------------------+----------------+
| ati13ati | 2 | 0 | 0 | 0 | 0 | 0 |
| ati13ati | 1 | 0 | 0 | 0 | 0 | 0 |
| | 6 | 0 | 0 | 0 | 0 | 0 |
| sse2 | 1 | 5183 | 1.85749578234338 | 0.503725434514578 | 25226.0714193459 | 1304806990.634 |
| | 16 | 2291 | 1.55319630231679 | 0.602459002671409 | 11556.9668607176 | 1304806969.92 |
| sse2 | 3 | 172 | 2.13004908052475 | 0.440368467359475 | 1550.72958210654 | 1304806488.414 |
| | 5 | 94 | 3.68178975961765 | 0 | 1670.5982438278 | 1304806017.214 |
| | 17 | 2 | 3.01483121473874 | 0 | 70.4898827326305 | 1304789315.745 |
| | 1 | 4280 | 3.52354851422042 | 0.265673550786121 | 22663.7425133748 | 1304807011.149 |
| | 4 | 515 | 1.8807907132518 | 0.497335472702601 | 3175.12668588108 | 1304806918.037 |
| sse2 | 4 | 506 | 1.89408156907962 | 0.494072198515848 | 2981.01655596329 | 1304806980.21 |
| | 3 | 160 | 2.44289583533288 | 0.383488010634257 | 1453.44498906773 | 1304806644.165 |
| cuda_opencl | 1 | 20276 | 12.150426232501 | 0.0769823181204677 | 128392.781710289 | 1304807031.767 |
| cuda_opencl | 4 | 2180 | 12.9333397864016 | 0.0723930237730462 | 30448.440729627 | 1304806969.922 |
| mt | 3 | 3627 | 0.0318466991417629 | 1.33462430200817 | 3024.85449954595 | 1304806765.874 |
| mt | 1 | 46671 | 0.0485419340069251 | 0.875701729493197 | 38641.0973243242 | 1304807026.351 |
| mt | 2 | 64215 | 0.0379434702153851 | 1.10886393595113 | 53288.5661698342 | 1304807026.352 |
| mt | 16 | 25102 | 0.0454158291762786 | 0.935924636672358 | 20532.6724309367 | 1304807026.351 |
| mt | 4 | 9959 | 0.037694977137984 | 1.12760517716179 | 8152.90755061967 | 1304806906.111 |
| | 3 | 0 | 0 | 0 | 0 | 0 |
| ati14 | 1 | 0 | 0 | 0 | 0 | 0 |
| ati14 | 4 | 0 | 0 | 0 | 0 | 0 |
| ati14 | 1 | 277933 | 0.0363949710428762 | 25.6784053535731 | 1197420.25439924 | 1304807031.768 |
| ati14 | 4 | 3860 | 0.0371031306160493 | 25.2217496509322 | 15677.4480390175 | 1304807031.769 |
+-------------+------------+--------+--------------------+--------------------+------------------+----------------+
24 rows in set (0.00 sec)
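As a possible way to scan host_app_version rows like the samples above, here
is a small sketch that flags entries with a negative et_var or an et_avg far
below the median of the sample. The function name, the dictionary layout, and
the 1e-3 cutoff are all assumptions chosen for illustration; CPU and GPU app
versions legitimately differ in et_avg, so in practice the comparison would
probably need to be done per app version. The seven rows are copied from the
listing above.

```python
from statistics import median


def flag_suspect_rows(rows, ratio=1e-3):
    """Return rows with negative et_var, or et_avg below ratio * median et_avg.

    Hypothetical helper: the cutoff ratio is an arbitrary illustrative choice.
    """
    med = median(r["et_avg"] for r in rows)
    return [r for r in rows if r["et_var"] < 0 or r["et_avg"] < med * ratio]


# (et_n, et_avg, et_var) taken from seven of the sample rows
rows = [
    {"et_n": 13, "et_avg": 9.43560019904897e-12, "et_var": 2.41793047141943e-24},
    {"et_n": 491, "et_avg": 2.69815387497423e-12, "et_var": 1.21744253650842e-25},
    {"et_n": 4, "et_avg": 1.24448883455494e-11, "et_var": 3.82694981236738e-24},
    {"et_n": 6, "et_avg": 1.3830948838265e-11, "et_var": 5.77790802864275e-24},
    {"et_n": 8, "et_avg": 6.3612824715241e-12, "et_var": 1.46948745955661e-24},
    {"et_n": 25, "et_avg": 4.34122401117199e-18, "et_var": 3.26249249142758e-33},
    {"et_n": 21, "et_avg": 1.09036856455564e-15, "et_var": 8.8214656876127e-29},
]

suspects = flag_suspect_rows(rows)
for r in suspects:
    print(r["et_n"], r["et_avg"])
```

With this sample, the rows with et_n of 25 and 21 are the ones flagged; their
et_avg values sit several orders of magnitude below the others even though
their pfc_avg values look ordinary.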
On May 7, 2011, at 6:14 PM, David Anderson wrote:
> Please set <debug_credit/> in your config;
> it will print some log messages that may clarify this.
> -- David
>
> On 04-May-2011 2:22 AM, Travis Desell wrote:
>> Looking in the database, it seems like the computer that had this problem (and
>> there are others) has really low values for et_avg in the database:
>>
>> mysql> select pfc_n, et_n, et_avg, et_var, et_q from host_app_version where
>> host_id = 1017 or host_id = 2270;
>> +-------+------+----------------------+-----------------------+----------------------+
>> | pfc_n | et_n | et_avg | et_var | et_q |
>> +-------+------+----------------------+-----------------------+----------------------+
>> | 15 | 15 | 5.42546789377713e-10 | 3.81303780501951e-21 | 5.71955670752926e-20 |
>> | 26 | 26 | 4.96156582191146e-17 | -2.79897154084492e-31 | 1.46049779705622e-22 |
>> +-------+------+----------------------+-----------------------+----------------------+
>>
>> Could that be causing the problem?
>>
>> Why would the estimated time be so far off?
>>
>> --Travis
>>
>>
>> On May 4, 2011, at 1:18 AM, Travis Desell wrote:
>>
>>> Even more problems. I've been using the new credit policy on DNA@Home as
>>> well, and users are getting errors along these lines:
>>>
>>>
>>> 5/3/2011 10:54:42 PM DNA@Home Aborting task test_update3_0_3866_0_0: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_4009_0_0: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_4006_0_1: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_4005_0_1: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_4004_0_1: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_3964_0_0: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_3953_0_1: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_3937_0_1: exceeded elapsed time limit 0.010911
>>> 5/3/2011 11:03:27 PM DNA@Home Aborting task test_update3_0_3916_0_0: exceeded elapsed time limit 0.010911
>>>
>>> For some reason, they're using some weird time limits. Any fixes for this?
>>>
>>>
>>> On May 1, 2011, at 3:12 PM, Travis Desell wrote:
>>>
>>>> Just recently we've been having an error where for some of our nbody
>>>> workunits the client errors out with the message:
>>>>
>>>> Maximum elapsed time exceeded
>>>>
>>>> After just a couple seconds.
>>>>
>>>> For example:
>>>>
>>>> http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=20284800
>>>>
>>>> However, in the workunit the rsc_fpops_bound is set:
>>>>
>>>> mysql> select rsc_fpops_bound, rsc_fpops_est, rsc_memory_bound,
>>>> rsc_disk_bound from workunit where id = 13655742;
>>>> +----------------------+-----------------+------------------+----------------+
>>>> | rsc_fpops_bound | rsc_fpops_est | rsc_memory_bound | rsc_disk_bound |
>>>> +----------------------+-----------------+------------------+----------------+
>>>> | 2.32172068444737e+16 | 232172068444737 | 500000000 | 52428800 |
>>>> +----------------------+-----------------+------------------+----------------+
>>>> 1 row in set (0.00 sec)
>>>>
>>>>
>>>> Any reason why this could be happening?
>>>>
>>>>
>>>> ----------------------------------------------------------------------------------------------------------
>>>> Travis Desell <deselt @ cs.rpi.edu> 1-518-867-1054
>>>> Adjunct Professor & Postdoctoral Research Assistant
>>>> Rensselaer Polytechnic Institute, 110 8th Street, Troy NY 12180, USA
>>>> http://www.cs.rpi.edu/~deselt/
>>>> MilkyWay@Home ( http://milkyway.cs.rpi.edu/ )
>>>> DNA@Home ( http://dnahome.cs.rpi.edu/ )
>>>> Worldwide Computing Laboratory ( http://wcl.cs.rpi.edu/ )
>>>> ----------------------------------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> boinc_dev mailing list
>>>> [email protected]
>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>> To unsubscribe, visit the above URL and
>>>> (near bottom of page) enter your email address.
----------------------------------------------------------------------------------------------------------
Travis Desell <deselt @ cs.rpi.edu>
1-518-867-1054
Adjunct Professor & Postdoctoral Research Assistant
Rensselaer Polytechnic Institute, 110 8th Street, Troy NY 12180, USA
http://www.cs.rpi.edu/~deselt/
MilkyWay@Home ( http://milkyway.cs.rpi.edu/ )
DNA@Home ( http://dnahome.cs.rpi.edu/ )
Worldwide Computing Laboratory ( http://wcl.cs.rpi.edu/ )
----------------------------------------------------------------------------------------------------------