> >>>CREATE TABLE headers ( > >>> id SERIAL PRIMARY KEY, > >>> name TEXT NOT NULL > >>>); > >>>CREATE UNIQUE INDEX headers_name_udx ON headers(name); > >>> > >>>CREATE TABLE header_msg_map ( > >>> id SERIAL8 PRIMARY KEY, > >>> physmessage_id INT NOT NULL, > >>> header_id INT NOT NULL, > >>> value TEXT NOT NULL, > >>> hdr_order INT2 NOT NULL, > >>> FOREIGN KEY (header_id) REFERENCES headers(id) > >>>); > >>>CREATE INDEX header_msg_map_idx ON header_msg_map(physmessage_id, > >>>header_id);
> Also, remember that you add several bytes to the database by adding > the id in both tables (8 bytes each) plus the separate indexes on > both of id. (Btw, I ussume you meant SERIAL8 in both tables). Hmm, I see I was wrong on the SERIAL8. You use id(4) and header_id(4) for the join. That halves the extra size of course. > So unless you can save _on_average_ atleast > (16 bytes + index + table overhead) then it's actually going to increase > database size. And how many header names are 16+ bytes? Not many.. Actually it'd only be 4bytes (double think'o). Doing some calculations: (no numbers for indexes, excluding header_msg_map.id addition and hdr_order and disregarding compression and table overheads) Lets say we have 100 mails like a random I picked in my mailbox now, each has: recieved recieved recieved from to subject date organization message-id mime-version content-type content-transfer-encoding x-priority x-msmail-priority x-mailer x-mimeole importance xrbl-warning x-note x-spam-tests-failed x-rcpt-to -- For 1 message: With one table: 212 bytes for name +index on (physmessage_id,name) With two tables: 196 bytes for headers.name 76 bytes for headers.id +index on (name,id) 4 bytes for header_id +index on (physmessage_id,header_id) So, for one message this is a clear loss. -- For 100 messages: With one table: 21200 bytes for name +index on (physmessage_id,name) With two tables: 196 bytes for headers.name 76 bytes for headers.id +index on (name,id) 400 bytes +index on (physmessage_id,header_id) So this is a very clear gain. This makes a clear case against my earlier suggestion and in favor of yours. So lets go with yours. But I'd remove the id in header_msg_map as this adds a whole lot to each header and is of no real use as far as I can see. There would likely be no need to ever update or delete a single header entry. Deletion would be done referencing physmessage_id instead. This further reduces the database size. hdr_order is a nice addition although I don't think it's of much use unless we remove the raw header, and that would kill backwards-compatability for sure. If you add the indexes I mentioned in the example above and remove the id I think we have a winner. Sean: Sorry for mis-reading and questioning your work. -Dead2 who really needs to stop using strong painkillers
