On Tue, Feb 9, 2010 at 8:53 PM, David Anderson <da...@ssl.berkeley.edu> wrote:
> There are no GPU-only apps.
> They all use some CPU.
> I guess you could say use at most 1% of processors
> (although that would still allow a 1-CPU app to run)

The <ncpus> flag is there first and foremost to run more work than you
actually have CPUs in your system. The focus of this function should go
back to that. Right now its use is confusing, and it's also being adopted
by some who think they know better than you, to set a strict number of
CPUs, instead of using the "On multiprocessors" preference
setting.

It's also used by some to tell BOINC to focus primarily on using all
GPUs in the system and (nigh on) no CPUs. Even with <ncpus>0</ncpus>
set at this time (no CPUs), BOINC will use part of one CPU to cater
for the GPUs. It is able to do so since GPU apps are always
started by a CPU, but the CPU doesn't do much more than translate the
task to kernels and transfer those to the GPU's memory banks, plus
write whatever their outcome is back to disk.

Whether or not it causes the divide-by-zero problem Richard came
across is something that needs to be tested. Nothing against Richard,
but he only saw it on one of his systems and he's the only one who saw
it thus far and only in the latest BOINC. Is he the only tester left
to BOINC? Can he reproduce that same error with all the previous BOINC
versions? Can it have been a fluke?

Thus far the only problem we've ever seen with <ncpus>0</ncpus> is
people complaining that BOINC stopped using their CPUs after upgrading
from BOINC 6.2 to BOINC 6.4 or above. It's possible that some Linux
distros come with a BOINC version with a pre-installed cc_config.xml
with this flag set to zero as well. But that needs investigation.

Does that give evidence the divide by zero error was never there? No.
But it doesn't give conclusive evidence that it was there either. For
all we know it was a cosmic ray that hit Richard's disk platter in the
exact position where that entry sat in his client_state.xml file. ;-)

I have seen (very localized) data transfer and disk-writing corruption
do strange things to entries in the client_state.xml file, while not
otherwise corrupting the file itself. Perhaps we need a
better check at BOINC start-up that the contents of the present
client_state.xml file are roughly the same as those in the last
backup we made, client_state_prev.xml? Right now we write too
quickly to the backup file, without a sanity check, thereby possibly
corrupting both.
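
To make that idea concrete, here is a rough sketch in Python (not BOINC's own C++) of what such a check could look like. Everything in it is my assumption: the function names are made up, I assume the state file can be read by a standard XML parser, and the "roughly the same" test is just a crude count of top-level elements with an arbitrary 50% threshold.

```python
import shutil
import xml.etree.ElementTree as ET


def top_level_tags(path):
    """Return the tags of the root's direct children, or None if unreadable."""
    try:
        return [child.tag for child in ET.parse(path).getroot()]
    except (ET.ParseError, OSError):
        return None


def safe_to_back_up(current, backup, min_ratio=0.5):
    """Hypothetical sanity check: is `current` fit to replace `backup`?"""
    cur = top_level_tags(current)
    if cur is None:
        return False  # current file is missing or does not parse
    prev = top_level_tags(backup)
    if prev is None:
        return True  # no usable backup to compare against
    # Crude "somewhat the same" test: current must not have shrunk too much.
    return len(cur) >= min_ratio * len(prev)


def rotate_backup(current, backup):
    """Overwrite the backup only when the current file passes the check."""
    if not safe_to_back_up(current, backup):
        return False  # leave the old backup untouched
    shutil.copyfile(current, backup)
    return True
```

The point is only the order of operations: check first, then overwrite the backup. A damaged current file then never replaces the last good copy.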

Back to the <ncpus> flag and its meaning. As I told you and Rom in
private, between BOINC 5 and 6 you changed how BOINC recognizes that
the service installation was used, by going from
ENABLEPROTECTEDAPPLICATIONEXECUTION to
ENABLEPROTECTEDAPPLICATIONEXECUTION2, with BOINC 6 ignoring the
ENABLEPROTECTEDAPPLICATIONEXECUTION entry in the registry.

Perhaps you need something similar for test flags that seem to
over-complicate things at this time. How can you easily reset their
use? By making them anew. Almost all flags say in a way what they do:
<memtest_debug> is used for memory debug tests, while
<sched_op_debug> is used for scheduler debug operations. So why can't
we rename <ncpus> to something that immediately tells us what its use
is, and use that one from the next BOINC major version onwards,
ignoring the previous entry?
E.g. <test_only_ncpus>, <test_nr_cpus> or something similar.
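
As an illustration of the "new name, old entry ignored" behaviour, here is a small Python sketch, not actual client code. The flag name test_only_ncpus is just my suggestion from above, read_cpu_override is a made-up helper, and I'm assuming the flag sits under the options section of cc_config.xml the way ncpus does now.

```python
import xml.etree.ElementTree as ET

NEW_FLAG = "test_only_ncpus"  # hypothetical replacement name
OLD_FLAG = "ncpus"            # deliberately never read below


def read_cpu_override(xml_text):
    """Return the CPU override from the new flag, or None if it is not set.

    An old-style entry left in the file has no effect, so stale settings
    from a previous major version are silently dropped.
    """
    root = ET.fromstring(xml_text)
    node = root.find(f"./options/{NEW_FLAG}")
    if node is None or node.text is None:
        return None
    try:
        return int(node.text.strip())
    except ValueError:
        return None  # malformed value: treat as not set
```

A config that only carries the old entry yields None here, so anyone still on a forgotten zero setting falls back to their normal preferences after upgrading.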

Sorry for the wall of text. I had to be thorough in expressing my views.

-- 
-- Jord.