Right. An important aspect of database design is knowing when to use a
relational system and when not. Sure it looks super elegant to use this
nice one-to-many message to headers arrangement... but then take a look
at why you're doing it: to be able to quickly retrieve the headers most
needed for searches and for quick header requests. These are the headers
which are accessed **many* orders of magnitude* more often than the rest.

If you have a finite number of insanely frequently accessed data, it only
makes sense to keep that data in-row in specifically named columns.

If you have a potentially infinite number of infrequently accessed data,
it makes no sense to name the columns individually (not that anybody's
suggested it) but clearly it should be stored in key/value pairs as you
have suggested. There's just one issue: the data doesn't come at us in
parsed key/value pairs. It comes as a big block of text, and will later
be returned to the user in that exact same big block of text. Why store
it any differently!?


My vision here is a "fastheaders" table which has a message number and
half a dozen columns defined. On some regular basis, or upon message
insertion, the messageblks table is scanned and the first entry of each
new message number, which is the header, is parsed for these fast headers
and they are stored for future searches and quick/short header listings.

Aaron


On Fri, 4 Jul 2003, lou wrote:

> In some email I received from "Matthew T. O'Connor" <matthew@zeut.net> on 
> Fri, 4 Jul 2003
> 03:15:34 -0400, wrote:
>
> > From: "Aaron Stone" <[EMAIL PROTECTED]>
> > > The reason for a new headers table is not so that the headers are taken
> > > out of the messageblks table, but rather because we want some of those
> > > headers to be pulled out and stored in columns of their own.
> > >
> > > The new table would have just a messageid and a few columns named after
> > > the most often used headers. I would expect these all to be indexed, btw.
> > > The actual header itself *should* be stored verbatim and intact as a
> > > chunk of the message, but a "fastheaders" table that holds From, To,
> > > Subject, Date and few others will dramatically speed up many simple IMAP
> > > listing operations as well as your most common searches.
> >
> > Again, I don't think we should have columns specific to particular headers,
> > I think it would be better to have a more generic structure for the
> > message_headers table like:
> > id serial primary key
> > header text (possibly an int referencing a lookup table of headers)
> > header_value text (contains the value of the header)
>
> the database wont get bulky?
> let say avarage 15 headers per message (my case) and 10000 messages.
> 15x10000 and here we go we have 150 000 records + same size doubled indeces ..
> imagine more. obviously it's not a problem of a query to access message_blks 
> and
> header_blks at the same time when a message have to be retrieved, at the same 
> time
> imagine the complexity of this query :)
> Also too many indces aint good..
>
> this structure is _so_ granulated.
>
> It looks nice,
> Lets discuss more things and actually get them researched.
> the optimal solution should be somewhere between:
> performance + genericness + granulation, also dont think how to optimize 
> dbmail
> but the database too, this game has been played by two players.
>
> Anyone else?
>
> > Index both of these columns and it will be quite fast.  By default we would
> > probably only add typical columns such as To, From, Subject, Date etc... but
> > this allows things to be added as needed.  If a particular site / user /
> > mailbox / folder needs to search on a non default header (size for example)
> > then the server can be configured to track this information without needing
> > to change database structure.
>
> header_common
> header_more
>
> but the most commonly used headers (by dbmail) in common and all the others 
> inside
> header_more, less records less indeces.
>
>
> cheers
> -
> _______________________________________________
> Dbmail mailing list
> Dbmail@dbmail.org
> https://mailman.fastxs.nl/mailman/listinfo/dbmail
>

Reply via email to