A possible fix for the constantly changing "Client ID" problem, which creates thousands of IDs on VM systems but on most hosts only shows up at each update ...
The one project this seems to affect directly is Einstein@Home, but any project with a large user base has the “Client ID flux problem” ...

The current state of things

1. Each time you update or upgrade a BOINC Client, a new Client ID gets generated. This usually happens fewer than 5 times per year, so not much database cleaning is needed. Project users mostly do the cleanup themselves through the “Merge CPUs” section of the project website, which tracks how many PCs the user has registered to that project.

2. Running BOINC in fixed and stable VMs does not cause noticeably more Client ID generation.

3. In “Cloud” computing, each VM that gets assigned a BOINC image generates a new Client ID every time. This is where the massive database bloat seems to be. A lot of DB user record cleaning and maintenance is involved, because when this happens it happens in the millions.

The solution

TURN THE “Client ID” into some kind of tuple ...

1. A less volatile Client ID structure would be something like this:

Client_ID_tuple (“DE00KK”, md5(CPUID_record_all_bits), md5(GPU_ID), md5(OS-String), md5(BOINC-build-id), md5(Client_ID(old)))

-- “DE00KK” to “DE99KK” is for version numbers; this is standard Morse Code notation.

-- Depending on what the xml syntax will permit, each tuple element could be separated by “|” or “;” or “,” ...

-- 3 x MD5s would not consume too many more bits than the SHA hash used now.

-- It is expected that on VMs ... all three of these fields will not change each time a BOINC image is moved from one VM to another.

-- The old Client_ID generation code would only need to run when one updates to a new BOINC client version. The other signature elements in the tuple would not change, though on VMs 2 of them might. Some database logic could spot the mismatch and merge the matching records.
-- The bytes consumed here are hopefully no more than 3x the current Client_ID, but since this may save millions of regenerated Client_IDs from being generated and stored, that is a modest cost.

2. Dragons

a. CPUID is an x86 dependency, so ARM and other CPU designs will need some other kind of CPU fingerprinting. The methods used will vary by CPU type, so feel free to standardize on something sane and workable.

b. On different OSes, the “OS-String” may live in places other than the Linux / BSD / OSX / Windows locations. However, even QNX and Minix3 put this string in a place that is within the POSIX standards. Some edge code may be required on each OS.

c. BOINC-build-id is the only component that is standardized here. It will only change with each update.

d. GPU_ID will need some preset values to indicate that no usable GPU is available. Otherwise, use the standardized string that identifies the GPU within its API. There will be some edge code issues here. On VMs, direct use of the GPU may be mostly impossible, but the outcome will hopefully always be “No GPU found”.

e. Some VMs may forbid use of the x86 CPUID instruction, so a preset value will be needed to indicate this. This may be the one easy way to hint that a BOINC client is running in a VM.

f. Some way will have to be found to change the Client_ID(old) a lot less often. It should change with each update, but otherwise no more than 4x per year. Taking an md5() snapshot of it will reduce the bits required to store it.

I don’t know if this will fix the problem, but it may at least get part of the way.

MP DNS @ H
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
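
To make the tuple idea a bit more concrete, here is a rough Python sketch of how such a Client_ID_tuple could be built and how two tuples could be compared for merging. This is not BOINC code. The function names, the "|" separator choice, the preset strings for "no GPU" and "CPUID blocked", and the merge threshold of 3 matching fields are all assumptions made up for illustration, and the sample input strings are invented.

```python
import hashlib

VERSION_TAG = "DE00KK"               # version marker from the proposal
NO_GPU = "NO_GPU_FOUND"              # hypothetical preset: no usable GPU
CPUID_BLOCKED = "CPUID_UNAVAILABLE"  # hypothetical preset: CPUID forbidden (VM hint)


def md5_hex(text):
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_client_id_tuple(cpu_info, gpu_id, os_string, build_id,
                         old_client_id, sep="|"):
    """Build the proposed tuple as one separator-joined string.

    cpu_info or gpu_id may be None, in which case the preset
    placeholder is hashed instead.
    """
    cpu = cpu_info if cpu_info is not None else CPUID_BLOCKED
    gpu = gpu_id if gpu_id is not None else NO_GPU
    fields = [cpu, gpu, os_string, build_id, old_client_id]
    return sep.join([VERSION_TAG] + [md5_hex(f) for f in fields])


def count_matching_fields(tuple_a, tuple_b, sep="|"):
    """Count hashed fields that agree, skipping the version tag at index 0."""
    a = tuple_a.split(sep)[1:]
    b = tuple_b.split(sep)[1:]
    return sum(1 for x, y in zip(a, b) if x == y)


def probably_same_host(tuple_a, tuple_b, min_matches=3, sep="|"):
    """Assumed merge rule: treat records as one host if enough fields agree."""
    return count_matching_fields(tuple_a, tuple_b, sep) >= min_matches


if __name__ == "__main__":
    # Same machine before and after a client update: build id and old ID differ.
    before = make_client_id_tuple("x86 cpuid dump A", None, "Linux 5.10",
                                  "build-1", "old-id-1")
    after = make_client_id_tuple("x86 cpuid dump A", None, "Linux 5.10",
                                 "build-2", "old-id-2")
    print(count_matching_fields(before, after))
    print(probably_same_host(before, after))
```

With a rule like this, a record whose CPU, GPU and OS hashes are unchanged would be a merge candidate even though the BOINC-build-id and Client_ID(old) hashes differ. Where the threshold should sit, and whether the comparison belongs in the client or the project database, is left open here.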
