Finally I come in the discussion :)
I'm all for optimizing headers, BUT:
I'd like to optimize for IMAP. Optimizing for POP3 is no issue, since
POP just downloads the whole message.
I don't really like to optimize for native clients. DBMail's IMAP should
be fast enough to work with normal IMAP clients (webbased or running on
clients). One way of optimizing is to implement IMAP features extensions
that'll help getting faster, like SORT and THREAD.
Currently, I'm not busy with this at all, but if anybody wants to do
some statistical research to find out what headers different clients
FETCH using IMAP, we can see which headers can cache.
By the way, I guess that having one table with cached headers like Aaron
proposed would be the best. I think we should only cache headers that
are used frequently and/or that are always requested *together* by
clients (it's no use if for a certain client program, we get all but one
headers from the cache, and get the last header from parsing the message.
Ilja
Magnus Sundberg wrote:
Well,
I beleive a flag on the messageblock would be a good start, since it is
not that intrusive.
But I am very unsusre about how much performance we would gain by
searching directly in heavily optimized fast-header tables. Is the pain
worth the gain?
Small installations do not need the extra performance.
Large installations can probably get away with more expensive hardware
Huge installations may run out of steam on the best availible hardware,
here the fast-headers might be of some use.
My beleif is that we should go for the easiest implementation, that is
the least intrusive and get a feeling for what we need.
I know that MySQL supports queries with regular expressions. These
queries will probably be quite fast.
I think we should se what happens when we implement server side sorting,
based on clever SQL statements using the messageblocks.
The huge question is: How much do we gain in performance by using
special header tables compared to searching the messageblocks? The gain
might be quite small.
well, that my $0.02
Magnus
Aaron Stone wrote:
I'm thinking in terms of searches primarily, where a query might be:
select distinct(message_id) from fastheaders
where header = 'to' and contents like '%bob%'
You're looking for displaying the message, where the webmail application
probably doesn't want to read in the header blocks of the message and
parse
them just to show the Date, From, Subject, To, CC, Bcc, Fcc, Reply-To
fields.
Originally I suggested using a table that has the header fields as the
columns, thinking that those were the only commonly requested ones,
but it was
either Ilja or Roel at the time who watched for a couple of IMAP
queries and
saw all kinds of headers being requested both for searching and for
listing
messages. So the configurable columns thing means that the admin needs
to be
watching to see what fields his clients are requesting, and to add
those, and
it means quite a bit more complexity in DBMail, which would need to
either
find the header in the header cache table or parse the full headers.
For your webmail application, since you know exactly which header
fields you
want and you're writing your own queries, you can just do this:
select header, contents from fastheaders
where message_id = 82 and (header = 'To' or header = 'From' or ...)
My instinct is that this would still be reasonably fast, completely
flexible
and generic. Even if the query took slightly longer, the much smaller
result
set size and the negligible parsing needed would offset the query speed.
Aaron
""Mark Mackay - Orcon"" <[EMAIL PROTECTED]> said:
Surely for webmail applications, etc you're after a join where you
get the
headers as columns for displaying a message list; to avoid having nested
queries.
E.g. avoid "select * from messages"; then for each message "select *
from
headers where messageid=X"
I know the goal is to have the headers table really flexible so that
additional headers can be added, etc -- but maybe this could be done
differently. A run-time config file where you mapped the column
names/numbers to header names, etc.
That way if a developer wanted to add a new column and start caching
a new
header they could alter the table and simply update the config files,
one-by-one. Or disable the headercache altogether.
I could be missing something here -- is there a query which can turn a
one-to-many join into a single line?
/Mark
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Aaron Stone
Sent: Thursday, 18 March 2004 6:30 p.m.
To: DBMAIL Developers Mailinglist
Subject: Re: [Dbmail-dev] NNTP
Seems terribly complicated, and it would make the messages very
difficult to
dump by hand. I'm also not sure why you want to have two tables; is
that what
you're proposing:
header_labels (index on label)
id label
--------------
1 to
2 from
3 subject
message_headers (index on messageid, labelid, header and labelid,
header)
-----------------------------------------------------
id message_id label_id header
1 1 1 [EMAIL PROTECTED]
2 1 2 [EMAIL PROTECTED]
3 1 3 Hey Bob, it's Joe!
I think that unless your database has miraculously good JOINs, this
is a
nightmare; note that you cannot reassemble by returning rows in
ascending
order of the 'id' column, another column would be needed to keep the
order of
the headers. This is probably the most compact was to store the
headers,
though, but I think it is at the cost of being severely slow and
obtuse.
My proposed header looks like this:
fastheaders (index on messageid, header, contents and header,
contents)
id message_id header contents
1 1 to [EMAIL PROTECTED]
2 1 from [EMAIL PROTECTED]
Suffers from using a lot more space, but would have faster search
times.
Administrators would be able to more easily query the table by hand.
As it is
intended to only be a cache, one could freely zap this table and
rebuild it
from the original headers. Naturally, I'm strongly advocating my
idea ;-)
Aaron
""Ed K."" <[EMAIL PROTECTED]> said:
[snip]
What if a solution was proposed in which the order and the
formatting of
the headers could be preserved. A documented method to
deconstruct and
construct the headers, and working code to be included in
db.c. Then a flag
in the messageblks row that would indicate if the header is
either only,
also, or not in the fastheaders table. i suggest we call
the table pair
message_headers and header_labels.
ed
Security on the internet is impossible without strong, open,
and unhindered encryption.
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev
--
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev