Michael Monnerie wrote: > select count(*),n.headername, headervalue from dbmail_headervalue v > join dbmail_headername n on n.id=v.headername_id > group by headername_id,n.headername,v.headervalue > order by count(*) desc; > > This query took 802 seconds(!):
That's not too bad actually. > Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-1] 2007-06-19 13:38:23 CEST > DB=testdbmail HOST=[local] SESSTRT=2007-06-19 12:57:22 CEST LOG: Dauer: > 802768.318 ms Anweisung: select > Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-2] count(*),n.headername, > headervalue from dbmail_headervalue v join dbmail_headername n on > n.id=v.headername_id group by > Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-3] > headername_id,n.headername,v.headervalue order by count(*) desc; > > (I've cut out some lines): > > 88599 | X-Sieve | CMU > Sieve 2.2 > 70106 | X-Virus-Scanned | > amavisd-new at goelsen.net > 68771 | X-Virus-Scanned | > amavisd-new at zmi.at > 62133 | X-Virus-Scanned | by > amavisd-new at datamatix.at > 49643 | Precedence | > bulk > 24697 | X-Virus-Checked | > Checked by ClamAV on apache.org > 24537 | X-Spam-Check-By | > apache.org > 23075 | X-MSMail-Priority | > Normal > 19532 | X-Priority | 3 > (Normal) > 19261 | Mailing-List | > contact [EMAIL PROTECTED]; run by ezmlm > 19259 | List-Post | > <mailto:[EMAIL PROTECTED]> > 19258 | Delivered-To | > mailing list [EMAIL PROTECTED] > 19258 | list-help | > <mailto:[EMAIL PROTECTED]> > 19258 | List-Id | > <users.spamassassin.apache.org> > 19258 | list-unsubscribe | > <mailto:[EMAIL PROTECTED]> > 19110 | Delivered-To | > [EMAIL PROTECTED] > 19110 | Sender | > [EMAIL PROTECTED] > 4874 | List-Archive | > <http://sourceforge.net/mailarchive/forum.php?forum=egroupware-german> > 4873 | List-Unsubscribe | > <https://lists.sourceforge.net/lists/listinfo/egroupware-german>, > <mailto:[EMAIL PROTECTED] > 4873 | List-Subscribe | > <https://lists.sourceforge.net/lists/listinfo/egroupware-german>, > <mailto:[EMAIL PROTECTED] > 2473 | List-Subscribe | > <http://www.links2linux.de/cgi-bin/mailman/listinfo/packman>, > <mailto:[EMAIL PROTECTED] > 2466 | List-Unsubscribe | > <http://www.links2linux.de/cgi-bin/mailman/listinfo/packman>, > <mailto:[EMAIL PROTECTED] > 2456 | List-Unsubscribe | > <http://links2linux.de/cgi-bin/mailman/listinfo/packman-adm>, > <mailto:[EMAIL PROTECTED] > 2456 | List-Subscribe | > <http://links2linux.de/cgi-bin/mailman/listinfo/packman-adm>, > <mailto:[EMAIL PROTECTED] > > You can see that especially for mailing lists, > compressing headers can be *very* space saving (those > lines are not even finished on the right side..). > Plus, I've got 160.000 headervalues which are empty strings. Yuk. I think you've proven our point. > > select count(*) from dbmail_messages; > 121212 > select count(*) from dbmail_physmessage; > 156133 > select count(*) from dbmail_headername; > 2386 > select count(*) from dbmail_headervalue; > 4551894 > > So 4,5 million values for 121k messages with only 2400 > header names. Amazeing. > BTW, Is it OK to have more physmessages than messages? No. It's a known bug there are some checks missing from the maintenance code. > >> create table dbmail_headervalue ( >> id char(64) not null primary key, -- sha1 digest of >> headervalue varchar(255) not null default '', > > Can't the headervalue be much longer than 255? Currently we have > headervalue text DEFAULT ''::text NOT NULL Ah, yes. My tables have remnants of the initial setup done in 2.1.5. > > Received: from mailsrv.zmi.at (mailsrv1.zmi.at [212.69.162.198]) > (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) > (Client CN "mailsrv.zmi.at", Issuer "power4u.zmi.at" (not verified)) > by power2u.goelsen.net (Postfix) with ESMTP id 0E7F677A43 > for <[EMAIL PROTECTED]>; Tue, 19 Jun 2007 09:49:21 +0200 (CEST) > > I did a quick look, they are stored in one entry, so probably this field > should be > varchar(30000) or something. Anybody got the correct RFC for reference? No. TEXT is just fine. > > And a > select length(headervalue) from dbmail_headervalue order by > length(headervalue) desc; > shows there are a lot of longer entries: > > length > -------- > 14890 > 9660 > 7985 > 7985 > 7768 > 7768 > 7625 > 7596 > 7596 > 6791 > 6595 > [snip] That's totally absurd. I seriously doubt headervalues that long are valid. They may even indicate problems with the mime-parser. -- ________________________________________________________________ Paul Stevens paul at nfg.nl NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31 The Netherlands________________________________http://www.nfg.nl _______________________________________________ DBmail mailing list DBmail@dbmail.org https://mailman.fastxs.nl/mailman/listinfo/dbmail