On 4/26/06, Paolo Cravero as2594 <[EMAIL PROTECTED]> wrote: > Should be worth investigating whether keeping message bodies on > filesystem allows for faster purges. Using the new "quar_loc" field a > web quarantine management system (WQMS? :-) ) should be able to display > the file, right? > On 4/26/06, Paolo Cravero as2594 <[EMAIL PROTECTED]> wrote: > Should be worth investigating whether keeping message bodies on > filesystem allows for faster purges. Using the new "quar_loc" field a > web quarantine management system (WQMS? :-) ) should be able to display > the file, right? >
I think this would be fairly difficult to implement. How would a WQMS access the file in order to display it? The quarantine management interface may not necessarily may not have direct access to the file system. Also consider an environment with several hosts quarantining to their own local disks. The only usable way I see this happening is have all amavisd hosts quarantining to an NFS share, where the quarantine management system has access to as well. But even then it is a lot of trouble. Also the cleanup script which deletes the SQL logging records would first have to do a full sequence scan of the 'msgs' table (using the 'time_num' column as a filter i suppose) and store the list of filenames stored in the 'quar_loc' column to a temporary file, delete the SQL records, then use the temporary file as a list of files to be deleted off the filesystem. > >> We have a 50GB (fifty GigaB) quarantine table on the production server, > >> and the last quarantine purge run took 14 hours to complete. All traffic > >> was queued because of DB locks and timeouts. We clean every second day > >> for spam >30 days and viruses >15 days, but will probably move to > >> once-a-week. I dont know what is going on here, but there should be absolutely no locks on the database when a cleanup occurs. Nor timeouts. Database engines aside, the fundamentals of how we are selecting our rows for deletion would mean that no inserts (from amavisd) should even have any association looking at the rows we are cleaning out. This is also supported by the fact your MySQL logs say 'lock_time: 0' for the slow queries. > > Yesterday stats from MySQL's slow-queries log: > # Query_time: 28676 Lock_time: 0 Rows_sent: 0 Rows_examined: 6452078 > DELETE quarantine FROM quarantine LEFT JOIN msgs USING(mail_id) > WHERE msgs.mail_id IS NULL; > -- > # Query_time: 12868 Lock_time: 0 Rows_sent: 0 Rows_examined: 6428954 > DELETE msgrcpt FROM msgrcpt LEFT JOIN msgs USING(mail_id) > WHERE msgs.mail_id IS NULL; > -- > # Query_time: 3225 Lock_time: 0 Rows_sent: 0 Rows_examined: 991368 > DELETE FROM maddr > WHERE NOT EXISTS (SELECT sid FROM msgs WHERE sid=id) > > That's a total of 12.4 hours. > > Issuing an "EXPLAIN" on those queries the engine shows that no index is > used, so the whole tablespace is scanned. Running the cleanup hourly > would just clog the database. But I'm no DB expert, so any input is welcome. A whole seq scan on the table being manipulated is necessary for the LEFT JOIN comparison. Basically what we are saying here is "For every record in the 'msgrcpt' table, check if the same mail_id exists in the 'msgs' table, if it does not, delete the record in the 'msgrcpt' table". In order for this to work, we must iterate through every record in the 'msgrcpt' table, hence the full table (or seq) scan. > > >> Has anyone done some tests? > > > > I'm beginning to apreciate PostgreSQL :) > > I am familiar with MySQL, and I am not the one that should manage the > antispam filter when fully in production. We might switch to PG if the > speedup is real and train sysadmins consequently. > I would recommend you try the CASCADING DELETE, it cant hurt, and it does not require any down time. ------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid0709&bid&3057&dat1642 _______________________________________________ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3 AMaViS-HowTos:http://www.amavis.org/howto/