Everything should have an id, and combination tables like person-gene should 
have the person id + the gene id.
You might look at this tool to make your life easier:
https://github.com/pgmodeler/pgmodeler

First, read up on this:
http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
then, it will be obvious how to use the tool.

-----Original Message-----
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Modulok
Sent: Friday, March 29, 2013 7:39 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] data modeling genes and alleles... help!

List,

I have a data modeling problem. That much, I know. The question is how do I 
model this? (Below.)

I'm making a database which will store pseudo-genetic data. It's a basic 
many-to-many setup::

    create table person(
        name varchar(32) primary key
    );
    create table gene(
        name varchar(32) primary key
    );
    create table person_gene(
        person varchar(32) references person(name),
        gene varchar(32) references gene(name)
    );

And I have data like::

    insert into person(name)
    values
        ('foo')
    ;
    insert into gene(name)
    values
        ('hair'),
        ('eye')
    ;
    insert into person_gene(person, gene)
    values
        ('foo', 'hair'),
        ('foo', 'eye')
    ;

Great. This is important as I need to be able to ask questions like "who 
carries gene 'x'?" as well as "what genes does person 'y' carry?" But then 
things get thorny...

I also need to store the properties of the individual genes (the alleles). This 
is akin to an instance of one of the many gene classes in my application code.
So I make more tables::

    create table hair(
        id serial primary key,
        density float,
        thickness float
    );
    create table eye(
        id serial primary key,
        pupil_type int
    );

How do I store a reference to this data? I'd add a column to the person_gene 
table, but it points to what? I can't reference a column name because they're 
all stored in different tables. I also can't store them in the same table, as 
they all store different data. Do I store the *table name* itself in a column 
of the gene_table? (Smells like a klude.)

A person might not carry all genes. The number of genes in existence is not 
fixed. New ones are introduced infrequently. There may be genes that no one 
carries. (I assume I just make a new table each time a new gene is introduced?)

I thought about just pickling/marshaling the instances of my various gene 
classes and just having a single 'genes' table which has a blob column but I 
hesitate to do that because I want to be able to do queries on aggregate allele 
stats. Things like "how many persons have pupil type 1?", etc.

It's late and I've probably over complicated it. Any pointers or advice on how 
to model this would be greatly appreciated.

Cheers!
-Modulok-


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make 
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to