A possible way around the endlessly changing "Client ID" problem, which on VM 
systems creates thousands of IDs, although for most hosts the ID only changes 
with each update ... 

The one project this seems to affect directly: Einstein@Home
-- but any project with a large user base has the “Client ID flux problem” ...


The current state of things

1. Each time you update or upgrade a BOINC Client, a new Client ID gets 
generated. This usually happens fewer than 5 times per year, so not much 
database cleaning is needed. Project users mostly do that cleaning themselves 
with the “Merge CPUs” section of the project website, which tracks how many 
PCs the user has registered to that project. 

2. Running BOINC in fixed and stable VMs does not cause any significant extra 
Client ID generation. 

3. In “Cloud” computing, each VM that gets assigned a BOINC image generates a 
new Client ID every time. This is where the massive database bloat seems to 
be. A lot of DB user record cleaning and maintenance is involved, because when 
this happens it happens in the millions. 


The solution

TURN THE “Client ID” into some kind of tuple ...

1. A less volatile Client ID structure would be something like this

-- Client_ID_tuple (“DE00KK”, md5(CPUID_record_all_bits), md5(GPU_ID), 
md5(OS-String), md5(BOINC-build-id), md5(Client_ID(old)))

-- “DE00KK” to “DE99KK” is for version numbers; this is standard Morse Code 
notation. 

-- Depending on what the xml syntax will permit, each tuple element could be 
separated by “|” or “;” or “,” ...

-- 3 x MD5s would not consume many more bits than the SHA hash used now

-- It is expected that on VMs ... all three of these fields will not change 
each time a BOINC image is moved from one VM to another. 

-- The old Client_ID generation code would only need to run each time one 
updates to a new BOINC client version. On a fixed host the other signature 
elements in the tuple would not change; on VMs perhaps only 2 of them might 
change. Some database logic could spot the mismatch and merge the proper 
records. 

-- The bytes consumed here are hopefully no more than 3x the current 
Client_ID, but as this may save millions of regenerated Client_IDs from being 
generated and stored, this is a modest cost. 
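
To make the tuple idea above a bit more concrete, here is a rough Python 
sketch of how such a tuple could be built and compared. It is only an 
illustration under assumptions: the function names, the “|” separator, and 
the "3 matching fields" merge threshold are all made up for this example, and 
the input strings are placeholders rather than real CPUID or BOINC values.

```python
import hashlib

SEPARATOR = "|"         # assumption: "|" is acceptable in the XML field
VERSION_TAG = "DE00KK"  # tuple format version field from the proposal


def md5_hex(value: str) -> str:
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def make_client_id_tuple(cpu_id, gpu_id, os_string, build_id, old_client_id,
                         version_tag=VERSION_TAG):
    """Join the version tag and the five hashed components into one string."""
    hashed = [md5_hex(part) for part in
              (cpu_id, gpu_id, os_string, build_id, old_client_id)]
    return SEPARATOR.join([version_tag] + hashed)


def matching_fields(tuple_a: str, tuple_b: str) -> int:
    """Count hashed fields that agree between two tuples of the same version."""
    a, b = tuple_a.split(SEPARATOR), tuple_b.split(SEPARATOR)
    if a[0] != b[0] or len(a) != len(b):
        return 0  # different tuple format versions are not compared here
    return sum(1 for x, y in zip(a[1:], b[1:]) if x == y)


def should_merge(tuple_a: str, tuple_b: str, min_matches: int = 3) -> bool:
    """Hypothetical merge rule: enough fields agree to treat as one host."""
    return matching_fields(tuple_a, tuple_b) >= min_matches


# Example: the same machine before and after a client update (placeholders).
before = make_client_id_tuple("cpu-A", "No GPU found", "Linux x86_64",
                              "build-1", "old-id-1")
after = make_client_id_tuple("cpu-A", "No GPU found", "Linux x86_64",
                             "build-2", "old-id-2")
print(matching_fields(before, after), should_merge(before, after))
```

With hex-encoded MD5s each tuple string here is 171 characters (a 6 character 
tag, five 32 character digests, and five separators). In the example only the 
build id and the old Client ID differ, so 3 fields still match and the two 
records would be flagged for merging. A real server-side rule would need more 
care, for instance when many cloud VMs share identical hardware and OS images.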

2. Dragons

a. CPUID is an x86 dependency, so for ARM or other CPU designs some other kind 
of CPU fingerprinting will be required. The methods used for each CPU type will 
vary, so feel free to standardize on something sane and workable.

b. On different OSes, the “OS-String” may be put in places different from the 
Linux / BSD / OSX / Windows locations. However, even QNX and Minix3 put this 
string in a place that is within the standards of POSIX. Some edge code may be 
required on each OS.

c. BOINC-build-id is the only component that is standardized here. It will only 
change each update. 

d. GPU_ID will need to have some preset values to indicate that no usable GPU 
is available. Otherwise use the standardized string that identifies the GPU 
within its API. There will be some edge code issues here.

On VMs, direct use of the GPU may be mostly impossible -- but the outcome will 
hopefully always be “No GPU found”.

e. Some VMs may forbid x86 CPUID instruction use, so a preset value will be 
needed to indicate this. This may be the one easy way to hint that a BOINC 
client is running in a VM.

f. Some way will have to be found to change the Client_ID(old) a lot less 
often. With each update it should change, but changes should otherwise happen 
no more than 4x per year. Taking an md5() snapshot of it will reduce the bits 
required to store it. 
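
The preset values mentioned in (d) and (e) could look something like the 
following Python sketch. Everything here is an assumption for illustration: 
the helper names and the "CPUID unavailable" string are invented, the raw 
CPU and GPU values are expected to come from platform-specific probes that 
are not shown, and the OS string is taken from Python's platform.uname() 
only as a stand-in for whatever the client would really read.

```python
import platform

NO_GPU = "No GPU found"                   # preset named in the proposal
CPU_ID_UNAVAILABLE = "CPUID unavailable"  # hypothetical preset value


def os_string() -> str:
    """Build an OS string from uname fields, leaving out the host name."""
    u = platform.uname()
    # The node (host name) is skipped on purpose: it tends to differ
    # between otherwise identical VM instances.
    return " ".join([u.system, u.release, u.machine])


def cpu_fingerprint(raw_cpuid=None) -> str:
    """Return the probed CPU string, or the preset if the probe got nothing."""
    return raw_cpuid if raw_cpuid else CPU_ID_UNAVAILABLE


def gpu_fingerprint(api_names=()) -> str:
    """Join the GPU names reported by the GPU API, or return the preset."""
    names = sorted(name for name in api_names if name)
    return ";".join(names) if names else NO_GPU


def vm_hint(cpu_fp: str) -> bool:
    """Weak hint only: a blocked CPUID may suggest the client is in a VM."""
    return cpu_fp == CPU_ID_UNAVAILABLE
```

Sorting the GPU names keeps the resulting string stable if the API reports 
the devices in a different order from one run to the next.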



I don’t know if this will fix the problem, but it may at least get part of the 
way.



MP

DNS @ H

