Each EMR needs a unique person identifier code; I think nobody will object to that. To 
allow exchange of EMRs (portability), these UPICs have to be not only unique but also 
always allocated by the same algorithm. Obviously, the chances of duplicates 
will still be high if you don't use a clearinghouse for those UPICs once the 
population is large enough.
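
To put a rough number on "large enough", here is a minimal sketch of the classic birthday-problem approximation. It assumes codes are handed out independently and uniformly at random from a fixed code space, which is a simplification and not how any real allocation scheme works; the function name and parameters are hypothetical.

```python
import math


def duplicate_probability(population, code_space):
    """Approximate P(at least one duplicate code) for `population` people.

    Assumption: every code is drawn independently and uniformly from
    `code_space` possible values (birthday-problem approximation).
    """
    if population < 2:
        return 0.0
    exponent = -population * (population - 1) / (2.0 * code_space)
    # 1 - exp(x), computed via expm1 to stay accurate for tiny probabilities
    return -math.expm1(exponent)
```

Under these assumptions, a million people sharing a billion possible codes already gives an estimate indistinguishable from 1, so uncoordinated allocation runs into duplicates long before the code space is exhausted.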

Now, some countries, like the Scandinavian ones, already have unique person identifier 
codes, which every single inhabitant, be it native, immigrant or temporary 
resident, is allocated either at birth or on arrival in the country. Initially 
I thought these national UPICs could be used as a base for an international one (just 
by appending the two-letter domain code, for example), but then what happens to those 
who move, e.g., from Norway to Sweden? The problem is the lack of persistence. You could use 
the first code a person was ever allocated, but people would forget the older 
codes.

I searched the net, Usenet archives, statistical yearbooks, ... and could not find a 
good system yet. (Hey, maybe I just did not stumble over it. Does anybody know a good 
one?)

Ergo, we need a (P)UPIC, a persistent unique personal identification code. Maybe we 
have to accept something less than perfect, something like a PPUPIC, a persistent 
pseudo-unique p.i.c. This would be a code "as unique as possible" (= duplicates 
unlikely but possible) that can be constructed out of information any patient in any 
country would know. What information should we use? It should be information that most 
patients would know and that at the same time is discriminating enough to help build 
up the "uniqueness".

* sex: (at date of birth, changes disregarded)
* date of birth: in some countries still a problem, but a good candidate
* country of birth: name of the country at time of birth  
* city of birth: again, some may not know, but a good discriminator
* name initials: name given at birth, later changes disregarded
* initials of parents' first names: if known

An early proposal in the GNUMed project was the following layout, by
character (Unicode!) position:
* 1..8      date of birth (yyyymmdd)
* 9         gender (c [m|f|?])
* 10..14    initials (ccccc [first2 + middle + last2])
* 15..16    mother's initials (cc [first + maiden | uu (unknown)])
* 17..18    country of birth (cc [country code])
* 19..20    city of birth (cc [first two letters])

My ppupic would then be "19630224mhophemsdeme": looks long and ugly, but can be 
reconstructed anywhere at any time out of persistent information known to me.
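
A minimal sketch of how such a 20-character code could be assembled from the fields listed above. Several things here are assumptions rather than part of the proposal: the function and parameter names, lower-casing everything, and padding any missing or too-short component with "u" (borrowed from the "uu" convention for unknown mother's initials). The sample person in the usage note is invented.

```python
from datetime import date

UNKNOWN = "u"  # assumption: extends the "uu" (unknown) convention to all fields


def _prefix(text, length):
    """Lower-cased first `length` characters, padded with UNKNOWN if short."""
    cleaned = (text or "").strip().lower()
    return cleaned[:length].ljust(length, UNKNOWN)


def build_ppupic(born, gender, first, middle, last,
                 mother_first, mother_maiden, country_code, city):
    """Assemble the code: date, gender, initials, mother, country, city."""
    sex = (gender or "").strip().lower()
    if sex not in ("m", "f"):
        sex = "?"  # anything else is recorded as unknown
    return "".join([
        f"{born.year:04d}{born.month:02d}{born.day:02d}",        # 1..8
        sex,                                                     # 9
        _prefix(first, 2), _prefix(middle, 1), _prefix(last, 2),  # 10..14
        _prefix(mother_first, 1), _prefix(mother_maiden, 1),      # 15..16
        _prefix(country_code, 2),                                # 17..18
        _prefix(city, 2),                                        # 19..20
    ])
```

For an invented person, `build_ppupic(date(1970, 5, 17), "f", "Anna", "Maria", "Schmidt", "Lena", "Koch", "de", "Berlin")` yields `"19700517fanmsclkdebe"`. Even this small sketch has to decide what to do about missing middle names, transliteration and case, which are exactly the kind of definitions the layout leaves open.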

I checked this against a database I have from a European study on
colorectal carcinoma, for which I happened to write the database & statistics
package. In the 289,000 entries, although fairly incomplete regarding
some of the details above, there was not a single duplicate using this
simple algorithm. It could be a starting point, but it is not good enough. One of the 
main criticisms of this code concerned the mother's initials, as the maiden name is 
unknown or not disclosable often enough in some countries. Looking closer at the 
proposal, there is still a nightmare of difficult definitions "under the hood", despite 
the apparent simplicity.
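
A duplicate check like the one described above could be run along these lines. This is only a sketch: it assumes the codes have already been generated for every record, and the function name is hypothetical, not taken from the original statistics package.

```python
from collections import Counter


def find_duplicates(codes):
    """Return {code: count} for every code that occurs more than once."""
    counts = Counter(codes)
    return {code: n for code, n in counts.items() if n > 1}
```

An empty result means no two records share a code. Note that incomplete records all fall back to the same "unknown" placeholders, so the sparser the data, the more likely such a check is to report collisions.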

Once again, I ask the list for help, proposals, and criticism.

Horst
