Michael Monnerie wrote:

> select count(*),n.headername, headervalue from dbmail_headervalue v 
>  join dbmail_headername n on n.id=v.headername_id 
>  group by headername_id,n.headername,v.headervalue 
>  order by count(*) desc;
> 
> This query took 802 seconds(!):

That's not too bad actually.

> Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-1] 2007-06-19 13:38:23 CEST 
> DB=testdbmail HOST=[local] SESSTRT=2007-06-19 12:57:22 CEST LOG:  Dauer: 
> 802768.318 ms  Anweisung: select
> Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-2]  count(*),n.headername, 
> headervalue from dbmail_headervalue v join dbmail_headername n on 
> n.id=v.headername_id group by
> Jun 19 13:38:23 db.zmi.at postgres[9364]: [6-3]  
> headername_id,n.headername,v.headervalue order by count(*) desc;
> 
> (I've cut out some lines):
> 
> 88599 | X-Sieve                                                         | CMU 
> Sieve 2.2
>  70106 | X-Virus-Scanned                                                 | 
> amavisd-new at goelsen.net
>  68771 | X-Virus-Scanned                                                 | 
> amavisd-new at zmi.at
>  62133 | X-Virus-Scanned                                                 | by 
> amavisd-new at datamatix.at
>  49643 | Precedence                                                      | 
> bulk
>  24697 | X-Virus-Checked                                                 | 
> Checked by ClamAV on apache.org
>  24537 | X-Spam-Check-By                                                 | 
> apache.org
>  23075 | X-MSMail-Priority                                               | 
> Normal
>  19532 | X-Priority                                                      | 3 
> (Normal)
>  19261 | Mailing-List                                                    | 
> contact [EMAIL PROTECTED]; run by ezmlm
>  19259 | List-Post                                                       | 
> <mailto:[EMAIL PROTECTED]>
>  19258 | Delivered-To                                                    | 
> mailing list [EMAIL PROTECTED]
>  19258 | list-help                                                       | 
> <mailto:[EMAIL PROTECTED]>
>  19258 | List-Id                                                         | 
> <users.spamassassin.apache.org>
>  19258 | list-unsubscribe                                                | 
> <mailto:[EMAIL PROTECTED]>
>  19110 | Delivered-To                                                    | 
> [EMAIL PROTECTED]
>  19110 | Sender                                                          | 
> [EMAIL PROTECTED]
>   4874 | List-Archive                                                    | 
> <http://sourceforge.net/mailarchive/forum.php?forum=egroupware-german>
>   4873 | List-Unsubscribe                                                | 
> <https://lists.sourceforge.net/lists/listinfo/egroupware-german>,    
> <mailto:[EMAIL PROTECTED]
>   4873 | List-Subscribe                                                  | 
> <https://lists.sourceforge.net/lists/listinfo/egroupware-german>,    
> <mailto:[EMAIL PROTECTED]
>   2473 | List-Subscribe                                                  | 
> <http://www.links2linux.de/cgi-bin/mailman/listinfo/packman>,        
> <mailto:[EMAIL PROTECTED]
>   2466 | List-Unsubscribe                                                | 
> <http://www.links2linux.de/cgi-bin/mailman/listinfo/packman>,        
> <mailto:[EMAIL PROTECTED]
>  2456 | List-Unsubscribe                                                | 
> <http://links2linux.de/cgi-bin/mailman/listinfo/packman-adm>,        
> <mailto:[EMAIL PROTECTED]
>   2456 | List-Subscribe                                                  | 
> <http://links2linux.de/cgi-bin/mailman/listinfo/packman-adm>,        
> <mailto:[EMAIL PROTECTED]
> 
> You can see that especially for mailing lists,
> compressing headers can be *very* space saving (those
> lines are not even finished on the right side..).
> Plus, I've got 160.000 headervalues which are empty strings.

Yuk. I think you've proven our point.

> 
> select count(*) from dbmail_messages;
>  121212
> select count(*) from dbmail_physmessage;
>  156133
> select count(*) from dbmail_headername;
>   2386
> select count(*) from dbmail_headervalue;
>  4551894
> 
> So 4,5 million values for 121k messages with only 2400 
> header names. Amazeing.
> BTW, Is it OK to have more physmessages than messages?

No. It's a known bug there are some checks missing from the maintenance code.

> 
>> create table dbmail_headervalue (
>>      id char(64) not null primary key, -- sha1 digest of
>>      headervalue varchar(255) not null default '',
> 
> Can't the headervalue be much longer than 255? Currently we have
> headervalue text DEFAULT ''::text NOT NULL

Ah, yes. My tables have remnants of the initial setup done in 2.1.5.

> 
> Received: from mailsrv.zmi.at (mailsrv1.zmi.at [212.69.162.198])
>       (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
>       (Client CN "mailsrv.zmi.at", Issuer "power4u.zmi.at" (not verified))
>       by power2u.goelsen.net (Postfix) with ESMTP id 0E7F677A43
>       for <[EMAIL PROTECTED]>; Tue, 19 Jun 2007 09:49:21 +0200 (CEST)
> 
> I did a quick look, they are stored in one entry, so probably this field 
> should be
> varchar(30000) or something. Anybody got the correct RFC for reference?

No. TEXT is just fine.

> 
> And a
> select length(headervalue) from dbmail_headervalue order by 
> length(headervalue) desc;
> shows there are a lot of longer entries:
> 
>  length
> --------
>   14890
>    9660
>    7985
>    7985
>    7768
>    7768
>    7625
>    7596
>    7596
>    6791
>    6595
> [snip]

That's totally absurd. I seriously doubt headervalues that long are valid. They
may even indicate problems with the mime-parser.
-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
DBmail@dbmail.org
https://mailman.fastxs.nl/mailman/listinfo/dbmail

Reply via email to