Kasten,

> I have until now used surrogate primary keys on all table like
> so:

<snip>

> Short question: Is this OK re your concerns for using
> surrogates, eg. using a surrogate but making sure that at any
> one time there *would* be a real primary key candidate ?

Yes, this is an example of that.  You have a long text field as part of the 
key, and that would kill you performance-wise if diagnosis was referred to in 
other tables and joined in views.   

Keep in mind, though, that if a table is an "end node"; that is, if its PK is 
not used as an FK by any other table, then worries about the performance of 
keys and size of indexes are unfounded.   In fact, for such tables, the 
surrogate key is a performance drag; it adds a column and an index which are 
not needed.

Now, addressing your table, I would have concerns other than the use of 
primary keys.    I suggest humbly that your data model/business logic may 
need some development:

create table diagnosis (
    pk serial primary key,
    fk_patient integer
        not null
        references patient(pk)
        on update cascade
        on delete cascade,
    narrative text
        not null,
    unique(fk_patient, narrative)
);

This was obviously created so that a patient could have multiple diagnoses.   
However, there is no information in the table to indicate *why* there are 
multiple diagnoses.   And you are using a real key based on a long text 
field; always hazardous, as there are many ways to phrase the same 
information and duplication is likely.   To do it in english, your postulates 
look like:

PATIENT 67 was given a diagnosis of WATER ON THE KNEE.
PATIENT 456 was given a diagnosis of ACUTE HYPOCHONDRIA.

But this is a bit sketchy.   Who made these diagnoses?   When did they make 
them?  Why?  This table could carry a *lot* more information, and should (sql 
is shorthand)

create table diagnosis (
    pk serial primary key,
    fk_patient integer  references patient(pk),
    fk_visit integer references visits(pk),
    fk_complaint integer references complaints(pk)
    fk_staff integer references medical_staff(pk)
    narrative text,
    unique(fk_patient, fk_visit, fk_complaint, fk_staff)
);

Then your postulates become *much* more informative:

PATIENT 67 was given a diagnosis by STAFF MEMBER 12 on his VISIT #3
    in response to NOT BEING ABLE TO WALK of WATER ON THE KNEE
PATIENT 456 was given a diagnosis by STAFF MEMBER 19 on his VISIT #192
    in response to THE CREEPY-CRAWLIES of ACUTE HYPOCHONDRIA

It also allows you to establish a much more useful key; it's reasonable to 
expect that a single staff member on one visit in response to one complaint 
would only give one diagnosis.   Otherwise, you have more than database 
problems.  And it prevents you from having to rely on a flaky long text key.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html

Reply via email to