On 4/26/06, Paolo Cravero as2594 <[EMAIL PROTECTED]> wrote:
> Should be worth investigating whether keeping message bodies on
> filesystem allows for faster purges. Using the new "quar_loc" field a
> web quarantine management system (WQMS? :-) ) should be able to display
> the file, right?
>
On 4/26/06, Paolo Cravero as2594 <[EMAIL PROTECTED]> wrote:
> Should be worth investigating whether keeping message bodies on
> filesystem allows for faster purges. Using the new "quar_loc" field a
> web quarantine management system (WQMS? :-) ) should be able to display
> the file, right?
>

I think this would be fairly difficult to implement. How would a WQMS
access the file in order to display it? The quarantine management
interface may not necessarily may not have direct access to the file
system. Also consider an environment with several hosts quarantining
to their own local disks. The only usable way I see this happening is
have all amavisd hosts quarantining to an NFS share, where the
quarantine management system has access to as well. But even then it
is a lot of trouble.

Also the cleanup script which deletes the SQL logging records would
first have to do a full sequence scan of the 'msgs' table (using the
'time_num' column as a filter i suppose) and store the list of
filenames stored in the 'quar_loc' column to a temporary file, delete
the SQL records, then use the temporary file as a list of files to be
deleted off the filesystem.

> >> We have a 50GB (fifty GigaB) quarantine table on the production server,
> >> and the last quarantine purge run took 14 hours to complete. All traffic
> >> was queued because of DB locks and timeouts. We clean every second day
> >> for spam >30 days and viruses >15 days, but will probably move to
> >> once-a-week.

I dont know what is going on here, but there should be absolutely no
locks on the database when a cleanup occurs. Nor timeouts. Database
engines aside, the fundamentals of how we are selecting our rows for
deletion would mean that no inserts (from amavisd) should even have
any association looking at the rows we are cleaning out. This is also
supported by the fact your MySQL logs say 'lock_time: 0' for the slow
queries.

>
> Yesterday stats from MySQL's slow-queries log:
> # Query_time: 28676  Lock_time: 0  Rows_sent: 0  Rows_examined: 6452078
> DELETE quarantine FROM quarantine LEFT JOIN msgs USING(mail_id)
>    WHERE msgs.mail_id IS NULL;
> --
> # Query_time: 12868  Lock_time: 0  Rows_sent: 0  Rows_examined: 6428954
> DELETE msgrcpt    FROM msgrcpt    LEFT JOIN msgs USING(mail_id)
>    WHERE msgs.mail_id IS NULL;
> --
> # Query_time: 3225  Lock_time: 0  Rows_sent: 0  Rows_examined: 991368
> DELETE FROM maddr
>    WHERE NOT EXISTS (SELECT sid FROM msgs    WHERE sid=id)
>
> That's a total of 12.4 hours.
>
> Issuing an "EXPLAIN" on those queries the engine shows that no index is
> used, so the whole tablespace is scanned. Running the cleanup hourly
> would just clog the database. But I'm no DB expert, so any input is welcome.

A whole seq scan on the table being manipulated is necessary for the
LEFT JOIN comparison. Basically what we are saying here is "For every
record in the 'msgrcpt' table, check if the same mail_id exists in the
'msgs' table, if it does not, delete the record in the 'msgrcpt'
table". In order for this to work, we must iterate through every
record in the 'msgrcpt' table, hence the full table (or seq) scan.

>
> >> Has anyone done some tests?
> >
> > I'm beginning to apreciate PostgreSQL  :)
>
> I am familiar with MySQL, and I am not the one that should manage the
> antispam filter when fully in production. We might switch to PG if the
> speedup is real and train sysadmins consequently.
>

I would recommend you try the CASCADING DELETE, it cant hurt, and it
does not require any down time.


-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid0709&bid&3057&dat1642
_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/

Reply via email to