Re: [GENERAL] Primary Keys

2016-04-29 Thread Adrian Klaver

On 04/29/2016 11:07 AM, Dustin Kempter wrote:

Hi all,
Is there a query I can run that will scan through all the tables of a
database and give me a list of all tables without a primary key? Im not
having any luck with this.


Two options:

First
http://www.postgresql.org/docs/9.5/interactive/catalog-pg-class.html

select * from pg_class where relhaspkey ='f' and relkind ='r' and 
relname not like 'pg_%';



*NOTE* from above link:
"
relhaspkey  boolTrue if the table has (or once had) a primary 
key
"
So it may not totally reflect current reality.


Second

http://www.postgresql.org/docs/9.5/interactive/information-schema.html

http://www.postgresql.org/docs/9.5/interactive/infoschema-table-constraints.html

http://www.postgresql.org/docs/9.5/interactive/infoschema-tables.html

I restricted the below to exclude system and information_schema tables:

select * from information_schema.tables where table_catalog = 'test' and 
table_schema !='pg_catalog' and table_schema != 'information_schema' 
and table_name not in(select table_name from 
information_schema.table_constraints where constraint_type = 'PRIMARY KEY');



*NOTE* The information returned is dependent on the privileges of the 
user running the query, so if you want to see everything run as a superuser.




Thanks in advance!





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Primary Keys

2016-04-29 Thread John McKown
On Fri, Apr 29, 2016 at 1:20 PM, Melvin Davidson 
wrote:

>
>
> On Fri, Apr 29, 2016 at 2:07 PM, Dustin Kempter <
> dust...@consistentstate.com> wrote:
>
>> Hi all,
>> Is there a query I can run that will scan through all the tables of a
>> database and give me a list of all tables without a primary key? Im not
>> having any luck with this.
>>
>> Thanks in advance!
>
>

>
>> Please, ALWAYS provide Postgresql version & O/S, regardless of whether
> you think it is pertinet.
>
>
> Now try this:
>
> SELECT n.nspname, c.relname as table
>   FROM pg_class c
> JOIN pg_namespace n ON (n.oid =c.relnamespace )
>  WHERE relkind = 'r' AND
>relname NOT LIKE 'pg_%' AND
>relname NOT LIKE 'sql_%' AND
>relhaspkey = FALSE
> ORDER BY n.nspname, c.relname;
>

​That gives a list of all tables in all schemas in the database. But this
needs to be refined to those without a primary key. Using the -E switch, I
looked at the output from \di+ , which will list ​whether the table
has a primary key or not, that command is implemented via multiple SELECT
statements which I haven't reviewed yet.



>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>



-- 
The unfacts, did we have them, are too imprecisely few to warrant our
certitude.

Maranatha! <><
John McKown


Re: [GENERAL] Primary Keys

2016-04-29 Thread Melvin Davidson
On Fri, Apr 29, 2016 at 2:07 PM, Dustin Kempter  wrote:

> Hi all,
> Is there a query I can run that will scan through all the tables of a
> database and give me a list of all tables without a primary key? Im not
> having any luck with this.
>
> Thanks in advance!
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Please, ALWAYS provide Postgresql version & O/S, regardless of whether you
think it is pertinet.


Now try this:

SELECT n.nspname, c.relname as table
  FROM pg_class c
JOIN pg_namespace n ON (n.oid =c.relnamespace )
 WHERE relkind = 'r' AND
   relname NOT LIKE 'pg_%' AND
   relname NOT LIKE 'sql_%' AND
   relhaspkey = FALSE
ORDER BY n.nspname, c.relname;

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


[GENERAL] Primary Keys

2016-04-29 Thread Dustin Kempter

Hi all,
Is there a query I can run that will scan through all the tables of a 
database and give me a list of all tables without a primary key? Im not 
having any luck with this.


Thanks in advance!


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] primary keys

2009-09-12 Thread Grant Maxwell

Hi Folks

I'm looking for a bit of advice regarding alpha primary keys.

I have a table (designed by someone else) that has a numeric primary  
key and also a unique non-null email address field.


The use of the primary key is causing me some headaches in that in   
multiple database server environment each server allocates a unique  
number from a range
(and that works fine) but when the table is replicated (master-master- 
master) the exception handling is a bit tricky because each database  
server may have
records that are duplicate at the email address field - with a  
different primary key number.


I don't know why it was done this way but it seems to me that the  
email addresses are unique, non null and could be used as the primary  
key. This would make the

replication much faster and simpler.

Does anyone out there think the change (number to email address as  
primary key) would be a bad idea ?


your thoughts would be appreciated.

regards
Grant

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] primary keys

2009-09-12 Thread Merlin Moncure
On Sat, Sep 12, 2009 at 12:35 PM, Grant Maxwell
grant.maxw...@maxan.com.au wrote:
 Hi Folks

 I'm looking for a bit of advice regarding alpha primary keys.

 I have a table (designed by someone else) that has a numeric primary key and
 also a unique non-null email address field.

 The use of the primary key is causing me some headaches in that in  multiple
 database server environment each server allocates a unique number from a
 range
 (and that works fine) but when the table is replicated
 (master-master-master) the exception handling is a bit tricky because each
 database server may have
 records that are duplicate at the email address field - with a different
 primary key number.

 I don't know why it was done this way but it seems to me that the email
 addresses are unique, non null and could be used as the primary key. This
 would make the
 replication much faster and simpler.

 Does anyone out there think the change (number to email address as primary
 key) would be a bad idea ?

 your thoughts would be appreciated.

I think it's a fine idea so long as you understand:
*) the index(es) will be larger
*) referring tables will have to be updated (via RI) if/when the email
address changes
*) lots of people will tell you that you are doing it wrong :-)
*) if you need to specialize p-key further (add another field), it can
be a big change

on the plus side:
*) you get to drop an index because you obviously had to index the key
serparately
*) if you query a referring table and are interested in email address
(and no dependent props), you get to skip a join
*) sorting be email address can be free
*) less data transfer headaches

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] primary keys

2009-09-12 Thread Bill Moran
On Sun, 13 Sep 2009 02:35:02 +1000
Grant Maxwell grant.maxw...@maxan.com.au wrote:

 Hi Folks
 
 I'm looking for a bit of advice regarding alpha primary keys.
 
 I have a table (designed by someone else) that has a numeric primary  
 key and also a unique non-null email address field.
 
 The use of the primary key is causing me some headaches in that in   
 multiple database server environment each server allocates a unique  
 number from a range
 (and that works fine) but when the table is replicated (master-master- 
 master) the exception handling is a bit tricky because each database  
 server may have
 records that are duplicate at the email address field - with a  
 different primary key number.
 
 I don't know why it was done this way but it seems to me that the  
 email addresses are unique, non null and could be used as the primary  
 key. This would make the
 replication much faster and simpler.
 
 Does anyone out there think the change (number to email address as  
 primary key) would be a bad idea ?
 
 your thoughts would be appreciated.

I think you have two issues here, and if you identify them as such, you'll
come up with a better decision.

The first is, Should the email address be my primary key.  And the answer
to that really depends on whether you expect it to change a lot, whether you
have a lot of FK relations and how easy/difficult it will be to manage those.

The second, which may influence the first, is how do I manage conflicts
when data from different sources is merged? which is a huge, complex
question.  One potential answer is to replace your integer surrogate key
with a GUID key, which will be unique from all the different sources.  You
could also use a prefix system to ensure uniqueness.  Another option is
to use the email address itself.  I'm surprised by your approach, as
personally, I've seen a LOT of people who share a single email address
(husband/wife, for example).  It's not horribly common, since getting
addresses is easy, but it happens, and it's something to consider when
setting this up: if two people share an email address and try to add their
records at different places, how will the system handle it?

In any event, if all those factors tell you that you should switch to using
the email address as the PK, I doubt you'll regret your decision.  Just be
sure to take into account the foreign key factor early on and you shouldn't
have too many problems (in my experience, a lot of people are unaware of
ON DELETE CASCADE and ON UPDATE CASCADE).

Hope this helps.

-Bill

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] primary keys

2009-09-12 Thread Merlin Moncure
On Sat, Sep 12, 2009 at 2:11 PM, Bill Moran wmo...@potentialtech.com wrote:
 On Sun, 13 Sep 2009 02:35:02 +1000
 Grant Maxwell grant.maxw...@maxan.com.au wrote:

 Hi Folks

 I'm looking for a bit of advice regarding alpha primary keys.

 I have a table (designed by someone else) that has a numeric primary
 key and also a unique non-null email address field.

 The use of the primary key is causing me some headaches in that in
 multiple database server environment each server allocates a unique
 number from a range
 (and that works fine) but when the table is replicated (master-master-
 master) the exception handling is a bit tricky because each database
 server may have
 records that are duplicate at the email address field - with a
 different primary key number.

 I don't know why it was done this way but it seems to me that the
 email addresses are unique, non null and could be used as the primary
 key. This would make the
 replication much faster and simpler.

 Does anyone out there think the change (number to email address as
 primary key) would be a bad idea ?

 your thoughts would be appreciated.

 I think you have two issues here, and if you identify them as such, you'll
 come up with a better decision.

 The first is, Should the email address be my primary key.  And the answer
 to that really depends on whether you expect it to change a lot, whether you
 have a lot of FK relations and how easy/difficult it will be to manage those.

 The second, which may influence the first, is how do I manage conflicts
 when data from different sources is merged? which is a huge, complex
 question.  One potential answer is to replace your integer surrogate key
 with a GUID key, which will be unique from all the different sources.  You

guid solves the surrogate issue wrt data transfer, but glosses over
what happens when you have duplicates.

 could also use a prefix system to ensure uniqueness.  Another option is
 to use the email address itself.  I'm surprised by your approach, as
 personally, I've seen a LOT of people who share a single email address
 (husband/wife, for example).  It's not horribly common, since getting

that's going to depend on how it's defined in the app. couple of
different approaches:

email _must_ be unique, and we care not to whom it belongs:
email (email primary key)
contact (email references email [...])

we do care option 1:
email (email, memo text, primary key(email, memo))
contact (email, memo, references email(email, memo))

storing info in memo field to distinguish the different users

we do care option 2:
email (email primary key)
contact (contact_id)
email_contact_map
(
  contact_id references contact,
  email references email,
  memo, -- 'person a', 'person b', etc
  primary key(contact_id, email),
)

The point is this: if you (the user) needs to distinguish between
email adress users, that information should be in the
database...allowing multiple entry of email addresses via serial or
guid allows you to sneak by this requirement

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] primary keys

2009-09-12 Thread Grant Maxwell


On 13/09/2009, at 2:46 AM, Tom Lane wrote:


Grant Maxwell grant.maxw...@maxan.com.au writes:

I don't know why it was done this way but it seems to me that the
email addresses are unique, non null and could be used as the primary
key. This would make the replication much faster and simpler.



Does anyone out there think the change (number to email address as
primary key) would be a bad idea ?


One thing that's often considered a useful attribute of a primary  
key is

that it be immutable.  In your application, do users ever change their
email addresses?  If so, what should happen --- is it okay to treat  
that

as effectively a new entry?

This would be ok. The table keeps a list of email addresses and some  
stats on them
users may have multiple addresses but would never alter a specific  
record.



Also, if you have any other tables referencing this one via foreign
keys, you'd have to have them storing the email address instead of
the serial number; it'll be bulkier and address updates will be that
much more expensive.

so text PKs will be less efficient than numeric ?


You can find lots and lots and lots of discussion of this topic if
you search the archives for talk about natural versus surrogate keys.


Good pointer = thanks Tom

regards, tom lane



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] primary keys

2009-09-12 Thread Tom Lane
Grant Maxwell grant.maxw...@maxan.com.au writes:
 I don't know why it was done this way but it seems to me that the  
 email addresses are unique, non null and could be used as the primary  
 key. This would make the replication much faster and simpler.

 Does anyone out there think the change (number to email address as  
 primary key) would be a bad idea ?

One thing that's often considered a useful attribute of a primary key is
that it be immutable.  In your application, do users ever change their
email addresses?  If so, what should happen --- is it okay to treat that
as effectively a new entry?

Also, if you have any other tables referencing this one via foreign
keys, you'd have to have them storing the email address instead of
the serial number; it'll be bulkier and address updates will be that
much more expensive.

You can find lots and lots and lots of discussion of this topic if
you search the archives for talk about natural versus surrogate keys.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] primary keys

2006-04-21 Thread Patrick TJ McPhee
In article [EMAIL PROTECTED],
Klint Gore [EMAIL PROTECTED] wrote:

% works for me on version 8.1.3
% 
% SELECT attname
%  FROM pg_index
%JOIN pg_class ON (indrelid = pg_class.oid)
%JOIN pg_attribute ON (attrelid = pg_class.oid)
%  WHERE indisprimary IS TRUE
%AND attnum = any(indkey)
%AND relname = $tablename;

This will work on 7.4, 8.0, or 8.1

SELECT attname
 FROM pg_index
   JOIN pg_class as c1 ON (indrelid = c1.oid)
   JOIN pg_class as c2 ON (indexrelid = c2.oid)
   JOIN pg_attribute ON (attrelid = c2.oid)
 WHERE indisprimary
   AND c1.relname = $tablename
;

No arrays are hurt by this query.
-- 

Patrick TJ McPhee
North York  Canada
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] primary keys

2006-04-19 Thread Orion Henry
I'm trying to craft a query that will determine what column(s) are the 
primary key for a given table.   I have succeeded but the query is so 
ugly that it borders on silly and cannot work for an arbitrary number of 
tables since indkey is an int2vect and the ANY keyword does not work on 
it. 

Please tell me there's an easier way to do this.  Here is the query for 
tablename $table.


SELECT attname
FROM pg_index
 JOIN pg_class ON (indrelid = pg_class.oid)
 JOIN pg_attribute ON (attrelid = pg_class.oid)
WHERE indisprimary IS TRUE
 AND (attnum = indkey[0] OR attnum = indkey[1] OR attnum = indkey[2])
 AND relname = '$table';

Orion


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] primary keys

2006-04-19 Thread Klint Gore
On Wed, 19 Apr 2006 19:39:45 -0700, Orion Henry [EMAIL PROTECTED] wrote:
 I'm trying to craft a query that will determine what column(s) are the 
 primary key for a given table.   I have succeeded but the query is so 
 ugly that it borders on silly and cannot work for an arbitrary number of 
 tables since indkey is an int2vect and the ANY keyword does not work on 
 it. 
 
 Please tell me there's an easier way to do this.  Here is the query for 
 tablename $table.
 
 SELECT attname
 FROM pg_index
   JOIN pg_class ON (indrelid = pg_class.oid)
   JOIN pg_attribute ON (attrelid = pg_class.oid)
 WHERE indisprimary IS TRUE
   AND (attnum = indkey[0] OR attnum = indkey[1] OR attnum = indkey[2])
   AND relname = '$table';
 
 Orion

works for me on version 8.1.3

SELECT attname
 FROM pg_index
   JOIN pg_class ON (indrelid = pg_class.oid)
   JOIN pg_attribute ON (attrelid = pg_class.oid)
 WHERE indisprimary IS TRUE
   AND attnum = any(indkey)
   AND relname = $tablename;

or on v7 you could try

select pcl.relname,  
   (select array_accum(attname) from pg_attribute where attrelid = 
pco.conrelid and attnum = any(pco.conkey)) as cols
from pg_constraint pco
join pg_class pcl on pcl.oid = pco.conrelid
where pcl.relname = $tablename
and pco.contype = 'p'

klint.

+---+-+
: Klint Gore: Non rhyming:
: EMail   : [EMAIL PROTECTED]   :  slang - the:
: Snail   : A.B.R.I.:  possibilities  :
: Mail  University of New England   :  are useless   :
:   Armidale NSW 2351 Australia : L.J.J.  :
: Fax : +61 2 6772 5376 : :
+---+-+

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Primary keys for companies and people

2006-02-07 Thread John D. Burger

Leif B. Kristensen wrote:


Still, I'm struggling with the basic concept of /identity/, eg. is the
William Smith born to John Smith and Jane Doe in 1733, the same William
Smith who marries Mary Jones in the same parish in 1758? You may never
really know. Still, collecting such disparate facts under the same ID
number, thus taking the identity more or less for granted, is the modus
operandi of computer genealogy. Thus, one of the major objectives of
genealogy research, the assertion of identity, becomes totally hidden
the moment that you decide to cluster disparate evidence about what may
actually have been totally different persons, under a single ID number.


We have a similar issue in a database in which we are integrating 
multiple geographic gazetteers, e.g., USGS, NGA, Wordnet.  We cannot be 
sure that source A's Foobar City is the same as source B's.  Our 
approach is to =always= import them as separate entities, and use a 
table of equivalences that gets filled out using various heuristics.  
For example, if each source indicates that its Foobar City is in Baz 
County, and we decide to equate the counties, we may equate the cities.


The alternative is of course to collect each cluster of evidence under 
a

separate ID, but then the handling of a person becomes a programmer's
nightmare.


Our intent is to have views and/or special versions of the database 
that collapse equivalent entities, but I must confess that we have not 
done much along these lines - I hope it is not too nightmarish.


- John D. Burger
  MITRE


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] Primary keys for companies and people

2006-02-06 Thread Michael Glaesemann


On Feb 4, 2006, at 2:23 , Merlin Moncure wrote:


If you kind determine an easy natural differentiator, invent one:
create table contact
(
  account text, name text, memo text,
  primary key(account, name, memo)
);

The memo field is blank in most cases unlees it's needed.  Suppose you
were filling contact information in your databse and Taking your
second John Smith from an account...your operator says, 'we already
have a john smith for your account, can you give us something to
identify him?'  Put that in the memo field and there you go.


Merlin,

Thanks for the blissfully simple solution! I think this can work well  
for me.


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Primary keys for companies and people

2006-02-03 Thread Merlin Moncure
 I definitely agree with you here, Merlin. Mutability is not the issue
 at hand. May I ask what strategies you use for determining uniqueness
 for people?

Well, that depends on the particular problem at hand.  If you had two
john smiths in your system, how would you distinguish them? If you
assinged an account number in your system and gave it to the person to
refer back to you, this ok...this is not a surrogate key per se, but a
meaningful alias.

However, that technique can't always be applied, take the case of the
'contacts' table for an account.  Since you don't give each contact of
each accunt a number and you don't print a number representing them on
various reports (there would be no reason to), adding a surrogate to
the table adds nothing to your database, it's just meaningless
infromation with no semantic value.  There *must* be a semantic
difference between the two John Smiths or you should be storing one
record in the database, not two.

If you kind determine an easy natural differentiator, invent one:
create table contact
(
  account text, name text, memo text,
  primary key(account, name, memo)
);

The memo field is blank in most cases unlees it's needed.  Suppose you
were filling contact information in your databse and Taking your
second John Smith from an account...your operator says, 'we already
have a john smith for your account, can you give us something to
identify him?'  Put that in the memo field and there you go.

Now your operator is taking information which has value pertaining to
the scope of the problem domain your application exists in.  This is
just one example of how to approach the problem  Now there is no
ambiguiity about which john smith you are dealing with. This may not
be a perfect solution for every application but there is basically has
to be a method of finding semantic unquenes to your data or you have a
hole in your data model.  Glossing over that hole with artificial
information solves nothing.

There are pracitcal reasons to use surrogates but the uniqueness
argument generally holds no water.  By 'generating' uniqueness you are
breaking your data on mathematical terms.  Until you truly understand
the ramifcations of that statement you can't really understand when
the practical cases apply.  Many of the arguments for surrugates based
on mutability and uniqueness are simply illogical.

The performance issue is more complex and leads to the issue of
practicality.  I wouldn't find any fault with a modeler who
benchmarked his app with a surrogate vs. a 5 part key  and chose the
former as long as he truly understood the downside to doing that
(extra joins).

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Primary keys for companies and people

2006-02-02 Thread Michael Glaesemann

Hello, all!

Recently there was quite a bit of discussion regarding surrogate keys  
and natural keys. I'm not interested in discussing the pros and cons  
of surrogate keys. What I'd like to find out are the different  
methods people actually use to uniquely identify companies and people  
*besides* surrogate keys.


I'm currently working on an application that will include contact  
information, so being able to uniquely identify these two entities is  
of interest to me. Right now I'm thinking of uniquely identifying  
companies by telephone number. For example:


create table companies (
company_id integer primary key -- telephone number, not serial
, company_name text not null
);

Of course, the company may have more than one telephone number  
associated with it, so there will also be a table associating  
telephone numbers and companies.


create table companies__telephone_numbers (
company_id integer not null
references companies (company_id)
on update cascade on delete cascade
, telephone_number integer not null
, unique (company_id, telephone_number)
);

There should also be a trigger that will check that the company_id  
matches an existing telephone number associated with the company,  
something like:


create function assert_company_id_telephone_number_exists
returns trigger
language plpgsql as $$
begin
if exists (
select company_id
from companies
except
select company_id
from companies
join companies__telephone_numbers on (company_id = telephone_number)
)
then raise exception 'company_id must match existing company  
telephone number';

end if;
return null;
end;
$$;

For people I'm more or less stumped. I can't think of a combination  
of things that I know I'll be able to get from people that I'll want  
to be able to add to the database. Starting off we'll have at least  
7,000 individuals in the database, and I don't think that just family  
and given names are going to be enough. I don't think we'll be able  
to get telephone numbers for all of them, and definitely aren't going  
to be getting birthdays for all.


I'm very interested to hear what other use in their applications for  
holding people and companies.


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Primary keys for companies and people

2006-02-02 Thread Leif B. Kristensen
On Thursday 02 February 2006 09:05, Michael Glaesemann wrote:

For people I'm more or less stumped. I can't think of a combination
of things that I know I'll be able to get from people that I'll want
to be able to add to the database. Starting off we'll have at least
7,000 individuals in the database, and I don't think that just family
and given names are going to be enough. I don't think we'll be able
to get telephone numbers for all of them, and definitely aren't going
to be getting birthdays for all.

I'm very interested to hear what other use in their applications for
holding people and companies.

I've been thinking long and hard about the same thing myself, in 
developing my genealogy database. For identification of people, there 
seems to be no realistic alternative to an arbitrary ID number.

Still, I'm struggling with the basic concept of /identity/, eg. is the 
William Smith born to John Smith and Jane Doe in 1733, the same William 
Smith who marries Mary Jones in the same parish in 1758? You may never 
really know. Still, collecting such disparate facts under the same ID 
number, thus taking the identity more or less for granted, is the modus 
operandi of computer genealogy. Thus, one of the major objectives of 
genealogy research, the assertion of identity, becomes totally hidden 
the moment that you decide to cluster disparate evidence about what may 
actually have been totally different persons, under a single ID number.

The alternative is of course to collect each cluster of evidence under a 
separate ID, but then the handling of a person becomes a programmer's 
nightmare.

I have been writing about my genealogy data model here: 
url:http://solumslekt.org/forays/blue.php The model has been slightly 
modified since I wrote this; due to what I perceive as 'gotchas' in the 
PostgreSQL implementation of table inheritance, I have dropped the 
'citations' table. Besides, I've dropped some of the surrogate keys, 
and more will follow. I really should update this article soon.

I should perhaps be posting this under another subject, but I feel that 
beneath the surface, Michael's problem and my own are strongly related.
-- 
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Primary keys for companies and people

2006-02-02 Thread David Goodenough
On Thursday 02 February 2006 09:07, Leif B. Kristensen wrote:
 On Thursday 02 February 2006 09:05, Michael Glaesemann wrote:
 For people I'm more or less stumped. I can't think of a combination
 of things that I know I'll be able to get from people that I'll want
 to be able to add to the database. Starting off we'll have at least
 7,000 individuals in the database, and I don't think that just family
 and given names are going to be enough. I don't think we'll be able
 to get telephone numbers for all of them, and definitely aren't going
 to be getting birthdays for all.
 
 I'm very interested to hear what other use in their applications for
 holding people and companies.

 I've been thinking long and hard about the same thing myself, in
 developing my genealogy database. For identification of people, there
 seems to be no realistic alternative to an arbitrary ID number.

 Still, I'm struggling with the basic concept of /identity/, eg. is the
 William Smith born to John Smith and Jane Doe in 1733, the same William
 Smith who marries Mary Jones in the same parish in 1758? You may never
 really know. Still, collecting such disparate facts under the same ID
 number, thus taking the identity more or less for granted, is the modus
 operandi of computer genealogy. Thus, one of the major objectives of
 genealogy research, the assertion of identity, becomes totally hidden
 the moment that you decide to cluster disparate evidence about what may
 actually have been totally different persons, under a single ID number.

 The alternative is of course to collect each cluster of evidence under a
 separate ID, but then the handling of a person becomes a programmer's
 nightmare.

 I have been writing about my genealogy data model here:
 url:http://solumslekt.org/forays/blue.php The model has been slightly
 modified since I wrote this; due to what I perceive as 'gotchas' in the
 PostgreSQL implementation of table inheritance, I have dropped the
 'citations' table. Besides, I've dropped some of the surrogate keys,
 and more will follow. I really should update this article soon.

 I should perhaps be posting this under another subject, but I feel that
 beneath the surface, Michael's problem and my own are strongly related.
There is also the problem that a name can change.  People change names
by deed-poll, and also women can adopt a married name or keep their old
one.  All in all an ID is about the only answer.

David

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Primary keys for companies and people

2006-02-02 Thread Martijn van Oosterhout
On Thu, Feb 02, 2006 at 10:36:54AM +, David Goodenough wrote:
  Still, I'm struggling with the basic concept of /identity/, eg. is the
  William Smith born to John Smith and Jane Doe in 1733, the same William
  Smith who marries Mary Jones in the same parish in 1758? You may never
  really know. Still, collecting such disparate facts under the same ID
  number, thus taking the identity more or less for granted, is the modus
  operandi of computer genealogy. Thus, one of the major objectives of
  genealogy research, the assertion of identity, becomes totally hidden
  the moment that you decide to cluster disparate evidence about what may
  actually have been totally different persons, under a single ID number.
 
  The alternative is of course to collect each cluster of evidence under a
  separate ID, but then the handling of a person becomes a programmer's
  nightmare.

 There is also the problem that a name can change.  People change names
 by deed-poll, and also women can adopt a married name or keep their old
 one.  All in all an ID is about the only answer.

True, the issue being ofcourse that changing a name doesn't change
their identity.

To the GP, your page is an interesting one and raises several
interesting points. In particular the one about the person being the
conclusion of the rest of the database. You essentially have a set of
facts A married B in C on date D and you're trying to correlate
these. In the end it's just a certain amount of guess work, especially
since back then they wern't that particular about spelling as they are
today.

My naive view is that you're basically assigning trust values to each
fact and the chance that two citations refer to the same person. In
principle you'd be able to cross-reference all these citations and
build the structure quasi-automatically. I suppose in practice this is
done by hand.

As for your question, I think you're stuck with having a person ID.
Basically because you need to identify a person somehow. Given you
still have the original citiations, you can split a person into
multiple if the situation appears to not work out.

One thing I find odd though, your person objects have no birthdate or
deathdate. Or birth place either. I would have thought these elements
would be fundamental in determining if two people are the same, given
that they can't change and people are unlikely to forget them.

Put another way, two people with the same birthday in the same place
with similar names are very likely to be the same. If you can
demostrate this is not the case that's another fact. In the end you're
dealing with probabilities, you can never know for sure.

Anyway, hope this helps. It's a subject I've been vaguely interested in
but never really had the time to look into.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [GENERAL] Primary keys for companies and people

2006-02-02 Thread Ted Byers


- Original Message - 
From: Leif B. Kristensen [EMAIL PROTECTED]

To: pgsql-general@postgresql.org
Sent: Thursday, February 02, 2006 4:07 AM
Subject: Re: [GENERAL] Primary keys for companies and people



[snip]
I'm very interested to hear what other use in their applications for
holding people and companies.


I've been thinking long and hard about the same thing myself, in
developing my genealogy database. For identification of people, there
seems to be no realistic alternative to an arbitrary ID number.

Still, I'm struggling with the basic concept of /identity/, eg. is the
William Smith born to John Smith and Jane Doe in 1733, the same William

[snip]

I have long been interested in this issue, and it is one that transcends the 
problem of IDs in IT.  For my second doctorate, I examined this in the 
context of historical investigation, applying numerical classification 
techniques to biographical information that can be extracted from historical 
documents.  It is, I fear, a problem for which only a probabilistic answer 
can be obtained in most historical cases.  For example, there was an 
eleventh century viking king Harold who as a teenager was part of his 
cousin's court, and then found it necessary to flee to Kiev when his cousin 
found hiimself on the losing side of a rebellion.  He then made his way into 
the Byzantine empire and served the emperor as a mercenary through much of 
the mediterranean, finally returning in fame and glory to Norway where he 
found another relative (a nephew IIRC) on the throne, which he inherited 
about a year after his return.  Impagine yourself as a historian trying to 
write his biography.  You'd find various documents all over the western 
world (as known in the viking age) written in a variety of languages, and 
using different names to refer to him.  It isn't an easy task to determine 
which documents refer specifically to him.  And to make things even more 
interesting, many documents refer to a given person only by his official 
title, and in other cases, the eldest son of each generation was given the 
same name as his father.


In my own case, in the time I was at the University of Toronto, I know of 
four other men who had precisely the same name I have.  I know this from 
strange phone calls from faculty I never studied with about assignments and 
examinations for courses I had never taken.  In each case, the professor 
checked again with the university's records department and found the correct 
student.  The last case was particularly disturbing since in that case, 
things were a bit different in that I had taken a graduate course with the 
professor in question, and he stopped me on campus and asked about an 
assignment for a given advanced undergraduate course that I had not taken, 
but my namesake had.  What made this disturbing is that not only did the 
other student carry my name, but he also looked enough like me that our 
professor could mistake me for him on campus!  I can only hope that he is a 
well behaved, law abiding citizen!  The total time period in question was 18 
years.  In general, the problem only gets more challenging as the length, 
and as the age, of the historical period considered increases.


The point is, not only are the combinations of family and given names not 
reliably unique, even certain biological data, such as photographs of the 
human face, not adequately unique.  Even DNA fingerprints, putatively the 
best available biometric technology, are not entirely reliable since even 
that can not distinguish between identical twins, and at the same time, 
there can be, admittedly extremely rare as far as we know, developmental 
anomalies resulting in a person being his own twin (this results from twin 
fetuses merging, with the consequence that the resulting person has some 
organ systems from one of the original fetuses and some from the other). 
For historical questions, I don't believe one can get any better than 
inference based on a balance of probabilities.  A geneologist has no option 
but to become an applied statistician!  For purposes of modern investigation 
or for the purpose of modern business, one may do better through an 
appropriate use of a combination of technologies.  This is a hard problem, 
even with the use of the best available technologies and especially given 
the current problems associated with identity theft.


For software developers in general, and database developers in particular, 
there are several distinct questions to consider.:


1) How does one reliably determine identity to begin with, and then use that 
identity with whatever technology one might use to represent it?


2) How good does this technology, and identification process need to be?  In 
other words, how does the cost of a mistake (esp. in identification) relate 
to the increased cost of using better technology?  In this analysis, one 
needs to consider both the cost of such a mistake to the person identified

Re: [GENERAL] Primary keys for companies and people

2006-02-02 Thread Merlin Moncure
  I should perhaps be posting this under another subject, but I feel that
  beneath the surface, Michael's problem and my own are strongly related.
 There is also the problem that a name can change.  People change names
 by deed-poll, and also women can adopt a married name or keep their old
 one.  All in all an ID is about the only answer.

I'll take the other side of this issue.  The fact that a primary key
is mutable does not make it any less primary.  As long as we can can
count on it to be unique, how often identiying info changes has no
bearing on its selection as a p-key from a relational standpoint.  SQL
gives us RI to help deal with this but in this is not always practical
if for example you you have a changing p-key that cascades to a
million records.  The performance issue has zero meaning in a
conceptual sense however and I think you are trying to grapple things
in conceptual terms.

By assigning a surrogate key to a person, you are simply moving the
guess work from one place to another.  If you can't logically
determine who 'John Smith' is, how can you possibly expect to relate
information to him? (which john smith? why, etc)...you are just hiding
a guess behind a number.  Put into other words, *a record must have
unique identifiying criteria or the table containing it cannot be
expected to give correct results*.  This is, more or less, a
mathematical truth.  The non key attributes of the tuple are now
undefined because they can give two or more different answers to the
same question.  Surrogates do not change this principle, they just
hide it but it still has to be dealt with.

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Primary keys for companies and people

2006-02-02 Thread Michael Glaesemann


On Feb 3, 2006, at 7:25 , Merlin Moncure wrote:

There is also the problem that a name can change.  People change  
names
by deed-poll, and also women can adopt a married name or keep  
their old

one.  All in all an ID is about the only answer.


I'll take the other side of this issue.  The fact that a primary key
is mutable does not make it any less primary.  As long as we can can
count on it to be unique, how often identiying info changes has no
bearing on its selection as a p-key from a relational standpoint.




The performance issue has zero meaning in a
conceptual sense however and I think you are trying to grapple things
in conceptual terms.


I definitely agree with you here, Merlin. Mutability is not the issue  
at hand. May I ask what strategies you use for determining uniqueness  
for people?


Michael Glaesemann
grzm myrealbox com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Primary keys for companies and people

2006-02-02 Thread Mark Dilger

Michael Glaesemann wrote:

Hello, all!

Recently there was quite a bit of discussion regarding surrogate keys  
and natural keys. I'm not interested in discussing the pros and cons  of 
surrogate keys. What I'd like to find out are the different  methods 
people actually use to uniquely identify companies and people  *besides* 
surrogate keys.


I'm currently working on an application that will include contact  
information, so being able to uniquely identify these two entities is  
of interest to me. Right now I'm thinking of uniquely identifying  
companies by telephone number. For example:


create table companies (
company_id integer primary key -- telephone number, not serial
, company_name text not null
);

Of course, the company may have more than one telephone number  
associated with it, so there will also be a table associating  telephone 
numbers and companies.


create table companies__telephone_numbers (
company_id integer not null
references companies (company_id)
on update cascade on delete cascade
, telephone_number integer not null
, unique (company_id, telephone_number)
);

There should also be a trigger that will check that the company_id  
matches an existing telephone number associated with the company,  
something like:


create function assert_company_id_telephone_number_exists
returns trigger
language plpgsql as $$
begin
if exists (
select company_id
from companies
except
select company_id
from companies
join companies__telephone_numbers on (company_id = telephone_number)
)
then raise exception 'company_id must match existing company  telephone 
number';

end if;
return null;
end;
$$;

For people I'm more or less stumped. I can't think of a combination  of 
things that I know I'll be able to get from people that I'll want  to be 
able to add to the database. Starting off we'll have at least  7,000 
individuals in the database, and I don't think that just family  and 
given names are going to be enough. I don't think we'll be able  to get 
telephone numbers for all of them, and definitely aren't going  to be 
getting birthdays for all.


I'm very interested to hear what other use in their applications for  
holding people and companies.


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq



Telephone numbers make bad primary keys because they get recycled.  A phone 
number that belongs to me this year may belong to somebody else next year.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Primary keys and speed

2001-09-09 Thread Rob Brown-Bayliss

Hi, I have not yet seen an answer to the following, can I assume it's
not a problem?

On Thu, 2001-09-06 at 19:58, Rob Brown-Bayliss wrote:
 
 Hello.
 
 I am looking at useing uuid's as primary keys rather than a normal
 sequence of numbers.  
 
 The uuids are long text strings like so:
 
 7559e648-a29c-11d5-952f-00c026a18838
 
 The reason for useing them is that it's almost gaurenteed that 
 imported data from another site is going to have a unique identifier.
 
 But I was wondering if this will impact on the speed of the database.
 
 In the long run the application does not need to be blindingly fast as 99% 
 of the time it is waiting on human interaction. 
 
 Any ideas?
 
-- 

  Rob Brown-Bayliss
 ---==o==---

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl



Re: [GENERAL] Primary keys and speed

2001-09-09 Thread Joe Conway

  I am looking at useing uuid's as primary keys rather than a normal
  sequence of numbers.
 
  The uuids are long text strings like so:
 
  7559e648-a29c-11d5-952f-00c026a18838
 
  The reason for useing them is that it's almost gaurenteed that
  imported data from another site is going to have a unique identifier.
 
  But I was wondering if this will impact on the speed of the database.
 
  In the long run the application does not need to be blindingly fast as
99%
  of the time it is waiting on human interaction.
 
  Any ideas?
 

I have not done *comparative* studies of using these kind of strings versus
sequence numbers, but I have done a fair amount of this sort of thing in one
of the applications I've been working on. On old hardware (dual ppro 200
with 512MB and RAID5) I got lookups times at around 10 milliseconds, from a
table with 20 million records. The key was a 40 byte hex string. The timing
was from running set show_query_stats = true; and then looking at the tail
of the postgres serverlog.

My best advice is to load some sample data in your schema on your hardware
and give it a try.

Hope this helps,

-- Joe




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[GENERAL] Primary keys and speed

2001-09-06 Thread Rob Brown-Bayliss


Hello.

I am looking at useing uuid's as primary keys rather than a normal
sequence of numbers.  

The uuids are long text strings like so:

7559e648-a29c-11d5-952f-00c026a18838

The reason for useing them is that it's almost gaurenteed that 
imported data from another site is going to have a unique identifier.

But I was wondering if this will impact on the speed of the database.

In the long run the application does not need to be blindingly fast as 99% 
of the time it is waiting on human interaction. 

Any ideas?

-- 

  Rob Brown-Bayliss
 ---==o==---

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly