Re: [GENERAL] Schema design question

2008-03-31 Thread Ben

On Tue, 1 Apr 2008, Andrej Ricnik-Bay wrote:


On 29/03/2008, Ben <[EMAIL PROTECTED]> wrote:

I'm working on a project which requires me to keep track of objects,
 each of which can have an arbitrary number of attributes. Although
 there will be many attributes that an object can have, the data types
 of those attributes won't be all that varried (int, float, text,
 boolean, date, etc.).


And a somewhat unorthodox suggestion for the list ... would it
be worthwhile considering a different storage mechanism all
together, like maybe an LDAP directory type of thing?  The query
language is admittedly very limited.


Hm worthwhile for some, perhaps, but not for this project. This is 
only a small part of a much larger whole, most of which fits quite well 
into SQL. Thanks though! I think I'm leaning towards the 
one-table-with-many-columns approach, as it seems to require the simpliest 
set of constraints while still giving me type-safety.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Schema design question

2008-03-31 Thread Andrej Ricnik-Bay
On 29/03/2008, Ben <[EMAIL PROTECTED]> wrote:
> I'm working on a project which requires me to keep track of objects,
>  each of which can have an arbitrary number of attributes. Although
>  there will be many attributes that an object can have, the data types
>  of those attributes won't be all that varried (int, float, text,
>  boolean, date, etc.).

And a somewhat unorthodox suggestion for the list ... would it
be worthwhile considering a different storage mechanism all
together, like maybe an LDAP directory type of thing?  The query
language is admittedly very limited.

Cheers,
Andrej

-- 
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Schema design question

2008-03-31 Thread Ron Mayer

Ben wrote:
I'm working on a project which requires me to keep track of objects, 
each of which can have an arbitrary number of attributes. Although there 
will be many attributes that an object can have,...
Anyway, this seems like a common problem without a perfect solution, and 
I'm sure people must have hindsight opinions on how they solved it. Your 
thoughts?


If each attribute can only occur once for any object, and if your
queries are based more on exact matches of the values rather than
range queries, you might want to look into the hstore module:
http://www.postgresql.org/docs/current/static/hstore.html

If attributes can occur more than once, you might even want to
serialize the whole object as some text format (yaml, xml, etc)
instead.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Schema design question

2008-03-28 Thread Craig Ringer
Ben wrote:

> create table attrs (id serial primary key, name text);
> create table obj (id serial primary key, name text);
> create table att (oid int references obj.id, aid int references attrs.id,
> value_int int, value_float float, value_text text, value_bool bool,
> value_date date);

I think I saw mention here of the DB using a bitmap in the tuple header
to avoid storing NULL fields.

If that's the case (don't trust my word on it), then combined with a
CHECK constraint that ensures that at most one of your typed fields may
be not null, this option might at least prove to be the most efficient.
However, it won't be fun to query.

Storing them all as text won't be much fun to query, which I'd consider
another argument for the many-types tuple. '2' > '11' =  't',  '002' <>
'2', etc.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] schema design question

2007-02-25 Thread Didier Gasser-Morlay
On Sat, 24 Feb 2007 00:59:02 -0800
[EMAIL PROTECTED] (snacktime) wrote:

> Say you have 8 different data models that are related enough to share
> roughly 70% of the same fields, but the shared fields are not always
> the same.  And also within any given model, some fields can be empty.
> The business logic is that data is pulled from all the data models and
> put into a common format that has all the combined fields, and sent
> over the wire as a transaction.
> 

> 
> Would most of you create a separate tabel for each transaction type?

I would personnally have one single table which gives more opportunity
to add some processing types in the future rather than to have to
revisit all scripts, routines etc to add a new table.

you must however be careful about possible contention on your table:
PK should be an integer and table should be clustered: you always
append at the end of the table and you can select from that table for
past transactions. 

> I'm also curious how others would handle the batch closings.  In the
> past I have created separate tables for open transactions and
> transactions that have been captured/settled.  When a transaction is
> captured it's moved to a different table instead of just having a
> column to mark it as captured.  Normally I would select all the
> transactions to capture, insert them into the captured table, delete
> them from the open transactions table, process the batch, and if the
> batch goes through commit everything.  That narrows down the number of
> things that can go wrong after you have submitted the batch.  The
> alternative would be to just have a column to mark transactions as
> capture and leave them all in one table.  I've always been paranoid
> about doing that because it leaves open the possibility of capturing
> thousands of transactions twice if you have a bug, as opposed to a few
> hundred at most.

I would use a marker field. Moviong all these transactions around seems
like an awful lot of data to move. most certainly if you have a
rollback to do. 
IMHO the more data you move around the more fragmented your DB
becomes and the more you have to look after it. (Note I am not talking
about PostGresSQL, I am too much of a beginner for that, but that's my
experience with Sybase, Firebird and Oracle)  

You probably have several safeguards: your transactions must share a
common field like a batch number you could have a serate table holding
a status for each batch ( processing, processed, rejected etc...),
significant totals of the batch, totals processed,  a timestamp for each
stage etc 

This also gives the opportunity to put in place a simple monitoring
system (always nice to give your users an insight about what happens
in the system and when)

My personnal rule of thumb is that even if disk space is cheap,
bandwidth and I/O are still at a premium. All my collegues not
really playing by this rule have always produced systems which end up
running like dead cows over time.

The you can create a sweeping mechanism that offloads once in a while
processed data to an historical table used for MI, reporting etc... it
all depends on what you intend to do with the data. 
 
> I spent quite a few years working at payment gateways and am now
> creating an open source platform that does the same thing that your
> normal payment gateway does.  It's been a while since I've had the
> chance to look at this problem in a fresh light.  Most of the
> processing code at payment gateways is left pretty much untouched once
> it's working, it's not something you go in and refactor every few
> months even if it's not perfect.

Very interesting project :)

> 
> Would appreciate any feedback.
> 
> Chris
> 
> ---(end of
> broadcast)--- TIP 2: Don't 'kill -9' the
> postmaster
> 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Schema design question

2005-08-17 Thread Matt Miller
On Wed, 2005-08-17 at 13:40 -0700, Bill Moseley wrote:
> a course
> and class share so many columns.  ...and
> I worry about changing a column type on one table and forgetting to
> change it on the other table.

Postgres types might help here.  You could probably create a type that
contains the common columns, and then embed that type where you need it.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Schema design question

2005-08-17 Thread Bill Moseley
On Wed, Aug 17, 2005 at 07:41:20PM +, Matt Miller wrote:

Thanks for responding, Matt:


> create table course (id serial primary key,
>  description varchar);
> create table teacher (id serial primary key,
>   name varchar);
> create table course_teacher (course_id integer not null,
>  teacher_id integer not null);
[...]
> create table class (id serial primary key,
> course_id integer not null,
> teacher_id integer not null,
> starts_on date,
> location varchar);

There may be more than one teacher in the class so instead I'd need
another "class_teacher" link table.

I guess what "bugged" me about this type of layout is that a course
and class share so many columns.  Duplication just looks wrong -- and
I worry about changing a column type on one table and forgetting to
change it on the other table.  Also have to remember to copy all
columns every time a specific class is created.

On the other hand, if I used a single table to represent both types of
entities, then selects are always going to have something like WHERE
type = 'course' added onto the WHERE.  That's extra processing for no
good reason.

> I'm sure there are many ways to get there.  To me, the way I've
> described is the most-direct way to represent the relationships you've
> described.

And thanks very much for you help.

-- 
Bill Moseley
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Schema design question

2005-08-17 Thread Sebastian Hennebrueder
Bill Moseley schrieb:

>On Wed, Aug 17, 2005 at 10:05:39PM +0200, Sebastian Hennebrueder wrote:
>  
>
>>> Con: Column duplication in the two tables -- two tables look a lot alike
>>>  Need to have duplicate link tables (one pointing to each table)
>>> 
>>>
>>>  
>>>
>>They are not duplicated. As you say later in your explanation the course
>>settings may change. So the data may be the same right at the beginning
>>but can differ by the time. => It is not the same data!!
>>
>>
>
>I meant that I would have two tables that look like they hold very
>similar data.  That's not really a problem -- just bugs me to see
>duplication.  Plus, it could introduce errors if I ever modified, say,
>a columns type in one table and not the matching column in the other
>table.
>
>Any opinions on which table layout you would use?
>
>Thanks,
>
>  
>
It is still not the same data. When my name is Sebastian Hennebrueder
and your name is Sebastian Hennebrueder, than we are not the same person.
Class and Course is not the same, they only have accidentally the same
data right at the beginning.
And as they have a reference to each other you should put them in two
separate tables, so that you can create a foreign key relation to
enforce the relation.

I do not now a good online tutorial for database normalisation but just
try Google or ask here.

-- 
Best Regards / Viele Grüße

Sebastian Hennebrueder



http://www.laliluna.de

Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB 

Get support, education and consulting for these technologies - uncomplicated 
and cheap.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Schema design question

2005-08-17 Thread Matt Miller
On Wed, 2005-08-17 at 10:49 -0700, Bill Moseley wrote:
> The parent object is a general course description, and the
> child object is a specific instance of a course
> ...
> tables that represent classes taught at a
> school.  The parent object is a general course ... the
> child object is ... a "class" -- which
> is a course taught at a given time and location.  A course can be
> taught multiple times ... A course (and thus a class) can have
> multiple instructors
> 
> How would you layout the tables for somethings like this?

create table course (id serial primary key,
 description varchar);
create table teacher (id serial primary key,
  name varchar);
create table course_teacher (course_id integer not null,
 teacher_id integer not null);
alter table course_teacher add primary key (course_id, teacher_id);
alter table course_teacher add foreign key (course_id)
   references course
   deferrable initially deferred;
create index course_teacher_teacher_ix on course_teacher (teacher_id);
alter table course_teacher add foreign key (teacher_id)
   references teacher
   deferrable initially deferred;
create table class (id serial primary key,
course_id integer not null,
teacher_id integer not null,
starts_on date,
location varchar);
create index class_course_ix on class (course_id);
alter table class add foreign key (course_id)
  references course
  deferrable initially deferred;
create index class_teacher_ix on class (teacher_id);
alter table class add foreign key (teacher_id)
  references teacher
  deferrable initially deferred;

> A class
> normally uses the course's default instructors, but may be different
> for specific classes instance.

When a class is created the user first specifies course_id.  At that
point the app can look at course_teacher and offer the list of default
teachers.  In case a non-default teacher is desired the app also offers
a lookup into teacher to see all available teachers.  The teacher_id
column of class is thus populated.  Set the "start_on" date and the
"location," and you're done.

> I can think (out loud) of three ways to set this up:

I'm sure there are many ways to get there.  To me, the way I've
described is the most-direct way to represent the relationships you've
described.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly