On Tue, Feb 9, 2010 at 8:53 PM, David Anderson <da...@ssl.berkeley.edu> wrote:
> There are no GPU-only apps.
> They all use some CPU.
> I guess you could say use at most 1% of processors
> (although that would still allow a 1-CPU app to run)

The <ncpus> flag is there first and foremost to let you run more work than you actually have CPUs in your system. The focus of this flag should go back to that purpose. Right now its use is confusing. It's being adopted by some who think they know better than you to set a strict number of CPUs, instead of using the "On multiprocessors" preference setting. It's also used by some to tell BOINC to focus primarily on using all GPUs in the system and (nigh on) no CPUs.

Even with <ncpus>0</ncpus> set at this time (no CPUs), BOINC will use part of one CPU to cater for the GPUs. It can do so because GPU apps are always started by a CPU, but the CPU doesn't do much more than translate the task into kernels, transfer those to the GPU's memory banks, and write whatever the outcome is back to disk.

Whether or not it gives the divide-by-zero problem Richard came across is something that needs to be tested. Nothing against Richard, but he only saw it on one of his systems, he's the only one who has seen it thus far, and only in the latest BOINC. Is he the only tester left to BOINC? Can he reproduce the same error with all the previous BOINC versions? Could it have been a fluke?

Thus far the only problem we've ever seen with <ncpus>0</ncpus> is people complaining that BOINC stopped using their CPUs after upgrading from BOINC 6.2 to BOINC 6.4 or above. It's possible that some Linux distros also ship a BOINC version with a pre-installed cc_config.xml that has this flag set to zero, but that needs investigation.

Does that give evidence that the divide-by-zero error was never there? No. But it doesn't give conclusive evidence that it was there either. For all we know it was a cosmic ray that hit Richard's disk platter in the exact position where that entry sat in his client_state.xml file. ;-) I have seen (very localized) data transfer and disk-writing corruption do strange things to entries in the client_state.xml file, while not otherwise corrupting the file itself.

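One way to guard against that kind of damage spreading is to validate the state file before it is allowed to overwrite the backup. What follows is only a rough Python sketch, not how the BOINC client does it: the function names looks_sane and rotate_backup are made up for illustration, and it assumes the state file is well-formed XML with a <client_state> root element, which may not hold for every file the client writes.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def looks_sane(path, expected_root="client_state"):
    """Return True if the file parses as XML and has the expected root tag.

    Assumption: the state file is strict XML with this root element.
    """
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError):
        return False
    return root.tag == expected_root


def rotate_backup(state_path, backup_path):
    """Refresh the backup only when the current state file looks sane.

    Returns True if the backup was replaced, False if it was left untouched.
    """
    if not looks_sane(state_path):
        return False  # keep the last known-good backup
    tmp_path = backup_path + ".tmp"
    shutil.copy2(state_path, tmp_path)  # copy first, then swap in one step
    os.replace(tmp_path, backup_path)
    return True
```

Note that this only catches a file that no longer parses. A single altered value inside an otherwise valid file, like the case described above, would still pass; catching that would mean comparing individual entries against the previous backup.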
Perhaps we need a better check at BOINC start-up of whether the contents of the present client_state.xml file are more or less the same as those in the last backup we made in client_state_prev.xml? Right now we write to the backup file too quickly, without a sanity check, thereby possibly corrupting both.

Back to the <ncpus> flag and its meaning. As I told you and Rom in private, between BOINC 5 and 6 you changed how BOINC recognizes that the service installation was used, by going from ENABLEPROTECTEDAPPLICATIONEXECUTION to ENABLEPROTECTEDAPPLICATIONEXECUTION2, with BOINC 6 ignoring the ENABLEPROTECTEDAPPLICATIONEXECUTION entry in the registry. Perhaps you need something similar for test flags that seem to over-complicate things at this time. How can you easily reset their use? By making them anew.

Almost all flags say in some way what they do: <memtest_debug> is used for memory debug tests, while <sched_op_debug> is used for scheduler debug operations. So why can't we rename <ncpus> to something that immediately tells us what it is for, and use that name from the next major BOINC version onwards, with that version ignoring the previous entry? E.g. <test_only_ncpus>, <test_nr_cpus> or something similar.

Sorry for the wall of text. I had to be thorough in expressing my views.

-- 
Jord.
_______________________________________________
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.