Re: [GENERAL] Primary keys for companies and people

John D. Burger Tue, 07 Feb 2006 13:27:47 -0800

Leif B. Kristensen wrote:

Still, I'm struggling with the basic concept of /identity/, eg. is the
William Smith born to John Smith and Jane Doe in 1733, the same William
Smith who marries Mary Jones in the same parish in 1758? You may never
really know. Still, collecting such disparate "facts" under the same ID
number, thus taking the identity more or less for granted, is the modus
operandi of computer genealogy. Thus, one of the major objectives of
genealogy research, the assertion of identity, becomes totally hidden
the moment that you decide to cluster disparate evidence about what may
actually have been totally different persons, under a single ID number.

We have a similar issue in a database in which we are integrating
multiple geographic gazetteers, e.g., USGS, NGA, Wordnet. We cannot be
sure that source A's Foobar City is the same as source B's. Our
approach is to =always= import them as separate entities, and use a
table of equivalences that gets filled out using various heuristics.
For example, if each source indicates that its Foobar City is in Baz
County, and we decide to equate the counties, we may equate the cities.
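
A minimal sketch of the equivalence-table idea, using Python's sqlite3
module so that it runs standalone. The table names (place, equivalence),
the columns, and the sample rows are all hypothetical and are not taken
from the actual gazetteer database. The heuristic shown is only the one
from the example above: cities with the same name whose counties have
already been equated.

```python
import sqlite3

# Hypothetical schema: every gazetteer record is imported as its own row;
# claimed identities live in a separate table and never overwrite a record.
SCHEMA = """
CREATE TABLE place (
    id        INTEGER PRIMARY KEY,
    source    TEXT NOT NULL,               -- e.g. 'USGS', 'NGA'
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL,               -- 'city' or 'county'
    parent_id INTEGER REFERENCES place(id) -- containing county, if any
);
CREATE TABLE equivalence (
    a_id   INTEGER NOT NULL REFERENCES place(id),
    b_id   INTEGER NOT NULL REFERENCES place(id),
    reason TEXT NOT NULL,                  -- which heuristic made the claim
    PRIMARY KEY (a_id, b_id),
    CHECK (a_id < b_id)                    -- store each pair once, in one order
);
"""


def equate_cities_in_equated_counties(conn):
    """Equate same-named cities whose counties are already equated."""
    # Two-argument min()/max() are SQLite scalar functions; in PostgreSQL
    # the equivalents would be LEAST()/GREATEST() with ON CONFLICT DO NOTHING.
    conn.execute("""
        INSERT OR IGNORE INTO equivalence (a_id, b_id, reason)
        SELECT min(c1.id, c2.id), max(c1.id, c2.id),
               'same name, equated counties'
        FROM equivalence e
        JOIN place c1 ON c1.parent_id = e.a_id AND c1.kind = 'city'
        JOIN place c2 ON c2.parent_id = e.b_id AND c2.kind = 'city'
        WHERE c1.name = c2.name
    """)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO place VALUES (?, ?, ?, ?, ?)", [
        (1, "USGS", "Baz County", "county", None),
        (2, "NGA", "Baz County", "county", None),
        (3, "USGS", "Foobar City", "city", 1),
        (4, "NGA", "Foobar City", "city", 2),
        (5, "NGA", "Quux Town", "city", 2),
    ])
    # The decision to equate the two counties is made first, by whatever means.
    conn.execute("INSERT INTO equivalence VALUES (1, 2, 'decided by hand')")
    equate_cities_in_equated_counties(conn)
    return conn.execute(
        "SELECT a_id, b_id FROM equivalence ORDER BY a_id").fetchall()
```

Because the source rows are never merged, a wrong guess can be undone by
deleting a row from equivalence, and the reason column records which
heuristic asserted each identity.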

The alternative is of course to collect each cluster of evidence under a
separate ID, but then the handling of a "person" becomes a programmer's
nightmare.

Our intent is to have views and/or special versions of the database
that collapse equivalent entities, but I must confess that we have not
done much along these lines - I hope it is not too nightmarish.
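
One way such a collapsed version might be derived, sketched in Python
under the assumption that equivalences are available as plain pairs of
IDs: a union-find pass assigns every entity the smallest ID in its
cluster as a canonical ID. The function name and the choice of the
smallest ID as representative are illustrative only, not something the
project is known to do.

```python
def canonical_ids(ids, pairs):
    """Map each entity ID to the smallest ID in its equivalence cluster."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Always attach the larger root under the smaller one, so the
            # root of every cluster is its minimum ID.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    return {i: find(i) for i in ids}


# Entities 3, 4 and 5 end up in one cluster even though 3 and 5 were
# never equated directly.
mapping = canonical_ids([1, 2, 3, 4, 5], [(1, 2), (3, 4), (4, 5)])
```

The resulting mapping could be loaded into a lookup table and joined
against in a view, which leaves the separately imported entities
untouched.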


- John D. Burger
  MITRE


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

              http://archives.postgresql.org
