http://adc-archmbox.sourceforge.net/ is a nice tool for this sort of thing too, and it's perl as well.
Ken A
Pacific.Net

Jeff A. Earickson wrote:
Try the following perl script.  Not by me, but modified to some
extent by me.

Jeff Earickson
Colby College

On Mon, 20 Mar 2006, Daniel Senie wrote:

Date: Mon, 20 Mar 2006 16:22:53 -0500
From: Daniel Senie <[EMAIL PROTECTED]>
To: Paul Carpenter <[EMAIL PROTECTED]>,
    Subscribers of Qpopper <qpopper@lists.pensive.org>
Subject: Re: Tools for dealing with mboxes and stale .pop files?

At 03:51 PM 3/20/2006, Paul Carpenter wrote:
Google for a python script called "garbmail".  I use it to scan
mailboxes for mail over a set age and delete just the old messages.

I use garbmail. Sure wish it were in Perl. Lots of things I'd like to tweak. Mostly, I'd like it to do a lot less at a given time (I'd prefer to run it individually for mailboxes, rather than having it sweep for me). I don't particularly enjoy working in Python though. One of these days I'll rewrite what I need in Perl or C I suppose.



> > On Thu, 9 Mar 2006, Randall Gellens wrote:
> >
> > > There are scripts that have been posted in the past that let you
> cull old
> > > mail.  Something else to think about.


------------------------------------------------------------------------

#!/usr/bin/perl --                                           -*-perl-*-
#
#!/opt/perl5.debug/bin/perl --                                       -*-perl-*-
#
# Copyright (c) Information Systems, The Press Association Limited 1993
# Portions Copyright (c) Computer Newspaper Services Limited 1993
# All rights reserved.
# # License to use, copy, modify, and distribute this work and its
# documentation for any purpose and without fee is hereby granted,
# provided that you also ensure modified files carry prominent notices
# stating that you changed the files and the date of any change, ensure
# that the above copyright notice appear in all copies, that both the
# copyright notice and this permission notice appear in supporting
# documentation, and that the name of Computer Newspaper Services not
# be used in advertising or publicity pertaining to distribution or use
# of the work without specific, written prior permission from Computer
# Newspaper Services.
# # By copying, distributing or modifying this work (or any derived work)
# you indicate your acceptance of this license and all its terms and
# conditions.
# # THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT ANY WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO ANY IMPLIED
# WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND
# NONINFRINGEMENT OF THIRD PARTY RIGHTS.  THE ENTIRE RISK AS TO THE QUALITY
# AND PERFORMANCE OF THE SOFTWARE, INCLUDING ANY DUTY TO SUPPORT OR
# MAINTAIN, BELONGS TO THE LICENSEE.  SHOULD ANY PORTION OF THE SOFTWARE
# PROVE DEFECTIVE, THE LICENSEE (NOT THE COPYRIGHT OWNER) ASSUMES THE
# ENTIRE COST OF ALL SERVICING, REPAIR AND CORRECTION.  IN NO EVENT SHALL
# THE COPYRIGHT OWNER BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
# DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
# PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
# THIS SOFTWARE.
#
#
# $Id: expire_mail,v 1.1 1993/06/03 10:43:26 phil Exp $
#

#
# Information Systems Engineering Group
# Phil Male
#

local($_expire_mail_rcsid) = '$Id: expire_mail,v 1.1 1993/06/03 10:43:26 phil 
Exp $';
local($_copyright) = 'Copyright (c) Information Systems, The Press Association 
Limited 1993';

require "getopts.pl";                 # option handling
require "timelocal.pl";                       # time conversion
require "ctime.pl";                   # ctime for pseudo-mailing
require "stat.pl";                    # file status

# Perl mail expire.
# This program removes old messages from system mailboxes.
# It assumes the format of mailboxes to be standard
# sendmail format mail with a blank line followed by a `From ' line
# starting each and every message. Mailbox locking is via flock.
# Works under SunOS.
#
# Options as follows:
# -v                    verbose output
# -V                    display version information and quit
# -d                    debug mode (no change to mailbox)
# -l                    display messages for crontab output
# -z                    do not delete zero length mailboxes
# -t                    do not reset access and modification times on mailbox
# -o                    always open mailbox, never just test modification date
# -M                    append a message detailing deleted messages for the user
# -T                    do not record delivery of mail summary on mailbox date
# -W                    append warning about what would be deleted (imples 
debug)
# -a days               messages whose age is greater than days are expired
# -O days               messages whose age is greater than days are expired
# -u user               only consider messages from user (regexp)
# -S read|old   only consider messages with status `old', or `read'
# -X deleted    only consider messages with X-status `deleted'
# -s subject            only consider messages with subject (regexp)
#
# Based on expire_mail by Steve Mitchell ([EMAIL PROTECTED])
#
# Changes: # "status deleted" added by Jeff Earickson, 10/22/2004
#  skip deletion of PINE/IMAP FOLDER INTERNAL DATA msgs, 11/8/2005


#####
#
# Definitions
#
#####

# site postmaster - XXX change this as required
$postmaster = "[EMAIL PROTECTED]";

# current user
$me = getlogin || (getpwuid($<))[0] || "unknown";
$home = $ENV{'HOME'};

# default mailbox for a user - XXX change this as required
$default_mailbox = $ENV{'MAILBOX'} || "/var/mail/$me";

#----------------------------------------------------------------------
# notice to append to list of deleted messages
#---modified for Colby
$notice = "
Hello,

The messages listed below, which you had previously read and which were
more than $age days old, have been deleted from your system mailbox on Colby's mail server by the system's mail expiry agent. If you accessed
your mailbox via POP (eg, Eudora or Outlook) within the last week then any
email more than 7 days old has been deleted.  For information
about Colby's policy on old email, please see section 14 of the webpage:

   http://www.colby.edu/info.tech/policies/html/email.policy.html

If you are a POP user, this webpage will explain why messages you
thought that you had already deleted are listed below.  Please note the
discussion about the \"Leave Mail on Server\" option; turn this option off
if you do not need it.

For further information about the Mail Expiry Agent see:

   http://www.colby.edu/administration_cs/its/support/email/expiry_agent.cfm

Any message deleted by the Mail Expiry Agent is GONE.  Do not ask for it back.
Save important messages elsewhere.  Do not leave them in your INBOX.

Fight mailbox bloat!  A big INBOX slows down your email reading ability
by consuming more computer resources.  Save messages you want to keep
to alternate folders if you are a webmail user.  Delete unwanted messages.
Turn off the \"Leave Mail on Server\" option if you are a Eudora user.

If you don't want to see this message every day, keep your mailbox
smaller than 50 megabytes!!

Thank You, The Folks in Information Technology Services";

#----------------------------------------------------------------------
# warning about old mail messages
#---modified for Colby
$warning= "
Hello,

The messages listed below, which you have previously read and which have
delivery dates more than 30 days old, should be deleted from your system mailbox on Colby's mail server.

After January 1, 1999, any mail message on the Colby mail server
previously marked as \"read\" by the mailer software, and greater than 30 days old will be automatically deleted from your system mailbox. A copy will *not* be saved anywhere -- the message will be GONE. Unread messages, no matter how old, will not be deleted.

Please see the webpage:

   http://www.colby.edu/info.tech/policies/email.policy.html

for further information about Colby's policy on old email. If you are a Eudora user, this webpage will explain why messages you thought that you had already deleted are listed below. Please note the discussion about the \"Leave Mail on Server\" option; turn this option off if you do not need it.

If you read your mail on colby0, please delete old mail or file it in your personal mail folders. And please read your e-mail on a regular
basis so it doesn't stack up.

Thank You, The Folks in Information Technology Services";

#----------------------------------------------------------------------

# set the umask for temp files
umask( 0700 );

# make stdout unbuffered
select(STDOUT); $| = 1;

$LOCK_EX = 2;                           # lock
$LOCK_UN = 8;                           # unlock
$START_TIME = time;                     # time right now
$SEC_PER_DAY = 24 * 60 * 60;            # seconds in a day
$line_buffer = "";                    # empty line buffer

# month numbers
$mon_num{'Jan'} = 0;
$mon_num{'Feb'} = 1;
$mon_num{'Mar'} = 2;
$mon_num{'Apr'} = 3;
$mon_num{'May'} = 4;
$mon_num{'Jun'} = 5;
$mon_num{'Jul'} = 6;
$mon_num{'Aug'} = 7;
$mon_num{'Sep'} = 8;
$mon_num{'Oct'} = 9;
$mon_num{'Nov'} = 10;
$mon_num{'Dec'} = 11;

#####
#
# Support
#
#####

# line buffer for look-ahead

sub get_line
{
        local( $line ) = "";                  # line to return

        if( ! ($line_buffer eq "") ) {
                $line = $line_buffer;
                $line_buffer = "";
        } else {
                $line = <MBOX>;
        }
        return $line;
}

# read message from mailbox

sub read_message
{
        local( $msg ) = "";                   # message to send back
        local( $prev_blank ) = 1;               # assume previous line blank
        local( $seen_from ) = 0;                # seen a from line
        local( $line ) = "";                  # current line

        # reset some globals
        $msg_status = "";
        $msg_xstatus = "";
        $msg_subject = "";
        $msg_date = "";

        while( $line = &get_line ) {
if( $line =~ /^From\s+([^\s]+)\s+(.*)$/ ) {
                        # if previous line was blank, then legal from line
                        if( $prev_blank ) {
                                # if already seen a legal from line, then this 
is next message
                                if( $seen_from ) {
                                        # pushback this from line
                                        $line_buffer = $line;
                                        return $msg;
                                }
                                $seen_from++;
                                # From line found, extract information
                                ( $msg_from, $msg_date ) = ( $1, $2 );
                                $msg_stamp = &rctime( $msg_date );
                                $msg_age = &days_old( $msg_stamp );
                                #print STDERR "msg_date = $msg_date, msg_stamp = 
$msg_stamp, msg_age = $msg_age\n";
                        }
                } elsif( $line =~ /^[Ss]tatus: ([A-Za-z]+)/ ) {
                        ( $msg_status ) = ( $1 );
                } elsif( $line =~ /^X-[Ss]tatus: ([A-Za-z]+)/ ) {
                        ( $msg_xstatus ) = ( $1 );
                } elsif( $line =~ /^[Ss]ubject: (.*)$/ ) {
                        ( $msg_subject ) = ( $1 );
                }

                # set previous line
                if( $line =~ /^$/ ) {
                        $prev_blank = 1;
                } else {
                        $prev_blank = 0;
                }

                $msg .= $line;
        }

        return $msg;
}

# write a message into a mailbox
sub write_message
{
        print TMPF "@_";
}

# parse the ctime string into a time value
# From line contains local time

sub rctime
{
        local( $pt ) = @_;                      # time to convert
        local( $ct ) = -1;                      # converted time

        if($pt =~ 
/^([A-Za-z]+)\s+([A-Za-z]+)\s+([0-9]+)\s+([0-9:]+)\s+([0-9]+)/ ) {
                ( $day, $mon, $mday, $time, $year ) = ( $1, $2, $3, $4, $5 );
                ( $hour, $min, $sec ) = split( ':', $time );
                if( $year > 1900 ) { $year -= 1900; }
                $ct = &timelocal($sec,$min,$hour,$mday,$mon_num{$mon},$year);
        }
        return $ct;
}

# age in days
sub days_old
{
        local( $agev ) = @_;                    # time to convert

        return( ( $START_TIME - $agev ) / $SEC_PER_DAY );
}

# basename
sub basename
{
        local( $path ) = @_;                    # path to find the base of
        local( $base ) = rindex( $path, "/" );

        if( $base < 0 ) {
                $base = $path;
        } else {
                $base = substr($path, $base + 1);
        }

        return $base;
}

# usage message
sub usage
{
        print STDERR "usage: expire_mail [-vlV] [-zotTMW] [-d] \n";
        print STDERR "{ [-O days] [-u user] [-S read|old] [-s subject] [-X deleted]} 
mailbox...\n";
        exit 0;
}

#####
#
# Main
#
#####

&Getopts( 'VvO:a:ou:zdS:s:MtTlWX:' ) || &usage;

# compat
$opt_a = $opt_O if ($opt_O && !$opt_a);

# check version
if( $opt_V ) {
        print "expire_mail: mail expiry agent\n";
        print "expire_mail: $_expire_mail_rcsid\n";
        &usage;
}

# use default mailbox if non supplied
if( $#ARGV < $[ ) {
        $ARGV[0] = "$default_mailbox";
}

# decode status option
if( $opt_S ) {
        if( $opt_S eq "old" ) {
                $opt_S = "O";
        } elsif( $opt_S eq "read" ) {
                $opt_S = "R";
        } else {
                print STDERR "expire_mail: status may only be one of `old', 
`unread'\n";
                &usage;
        }
}

# decode X-status option
if( $opt_X ) {
        if( $opt_X eq "deleted" ) {
                $opt_X = "D";
        } else {
                print STDERR "expire_mail: X-status may only be `deleted'\n";
                &usage;
        }
}

# check we are actually doing some processing
if( !$opt_a && !$opt_u && !$opt_S && !$opt_s && !$opt_X) {
        print STDERR "expire_mail: must specify at least one of -O, -u, -S, -s or 
-X\n";
        &usage;
}

# warning mode imples debug mode
if( $opt_W ) { $opt_d = 1; }

# debug mode implies verbose mode
if( $opt_d ) { $opt_v = 1; }

# foreach mailbox...
while( $mailbox = shift ) {

        if( $opt_v && !$opt_W ) { print STDOUT "Checking mailbox $mailbox\n"; }

        # does mailbox exist
        if( ! -f $mailbox ) { next; }

        # stat the mailbox
        @sb = &Stat($mailbox);

        # can it be deleted now?
        if( !$opt_o && $opt_a ) {
                # check the modification date
                $age = &days_old(@sb[$ST_MTIME]);
                if( $age > $opt_a ) {
                        if( $opt_v ) { print STDOUT "Expiring mailbox 
$mailbox\n"; }
                        if( !$opt_d ) {
                                if( $opt_z ) {
open( MBOX, ">$mailbox" ) || print STDERR "expire_mail: failed to truncate $mailbox\n";
                                        close( MBOX );
                                } else {
                                        unlink( $mailbox ) ||
                                        print STDERR "expire_mail: failed to remove 
$mailbox\n";
                                }
                        }
                        next;
                }
        }

        # open the mailbox
        if( !open( MBOX, "+<$mailbox" ) ) {
                print STDERR "expire_mail: unable to open $mailbox\n";
                next;
        }

        # lock the mailbox
        if( !flock( MBOX, $LOCK_EX ) ) {
                print STDERR "expire_mail: unable to lock $mailbox\n";
                close( MBOX );
                next;
        }

        # open the temporary file
        $tmpname = "$mailbox.exp$$";
        if( !open( TMPF, "+>$tmpname" ) ) {
                print STDERR "expire_mail: unable to create temporary file for 
$mailbox\n";
                close( MBOX );
                next;
        }
        unlink( $tmpname );

        # init counters
        $count = 0;
        $exp = 0;

        # read each message in turn
        while( $msg = &read_message ) {

                $count++;

                #---skip IMAP/PINE FOLDER INTERNAL DATA messages
                if($msg_from =~ /MAILER_DAEMON/ &&
                   $msg_subject =~ /DON\'T DELETE THIS MESSAGE -- FOLDER 
INTERNAL DATA/)
                {
                        if( $opt_v && !$opt_W ) {
                                #print STDOUT "\tMsg #$count: INTERNAL DATA\r";
                                print STDOUT "\tMsg #$count: INTERNAL DATA \n";
                        }
                        &write_message( $msg );
                        next;
                }

                # looking for specific from users
                if( $opt_u ) {
                        if( ! ($msg_from =~ /$opt_u/) ) {
                                if( $opt_v && !$opt_W ) {
                                        #print STDOUT "\tMsg #$count: from   
\r";
                                        print STDOUT "\tMsg #$count: from   \n";
                                }
                        &write_message( $msg );
                        next;
                        }
                }

                # check message status
                if( $opt_S ) {
                        if( !($msg_status =~ /$opt_S/) ) {
                                if( $opt_v && !$opt_W ) {
                                        #print STDOUT "\tMsg #$count: status   
\r";
                                        print STDOUT "\tMsg #$count: status   
\n";
                                }
                                &write_message( $msg );
                                next;
                        }
                }

                # check message X-status
                if( $opt_X ) {
                        if( !($msg_xstatus =~ /$opt_X/) ) {
                                if( $opt_v && !$opt_W ) {
                                        #print STDOUT "\tMsg #$count: status   
\r";
                                        print STDOUT "\tMsg #$count: xstatus   
\n";
                                }
                                &write_message( $msg );
                                next;
                        }
                }

                # check message subject
                if( $opt_s ) {
                        if( ! ($msg_subject =~ /$opt_s/) ) {
                                if( $opt_v && !$opt_W ) {
                                        #print STDOUT "\tMsg #$count: subject   
\r";
                                        print STDOUT "\tMsg #$count: subject   
\n";
                                }
                        &write_message( $msg );
                        next;
                        }
                }

                # only other thing to check is message age
                if( $opt_a ) {
                        if( $msg_age <= $opt_a ) {
                                if( $opt_v && !$opt_W ) {
                                        #print STDOUT "\tMsg #$count: newer   
\r";
                                        print STDOUT "\tMsg #$count: newer   
\n";
                                }
                                &write_message( $msg );
                                next;
                        }
                }

                # log the expiry
                if( $opt_v && !$opt_W ) {
                        #print STDOUT "\tMsg #$count: expired   \r";
                        print STDOUT "\tMsg #$count: expired   \n";
                }

                # copy message across if in debug
                if( $opt_d ) {
                        &write_message( $msg );
                        if($opt_W) {
                                # record the mail message from and subject line
                                $pad = ' ' x (25 - length($msg_from) );
                                $npad = ' ' x ( 4 - length($count) );
                                $subjects[$exp] = "$npad$count $msg_from$pad 
$msg_date\n     $msg_subject\n";
                        }
                } else {
                        # record the mail message from and subject line
                        $pad = ' ' x (25 - length($msg_from) );
                        $npad = ' ' x ( 4 - length($count) );
                        $subjects[$exp] = "$npad$count $msg_from$pad $msg_date\n     
$msg_subject\n";
                }

                # increment the expired message count
                $exp++;
        }

        if( !$opt_d ) {

                # if sending mail to the owner of the mailbox, append message 
on the end
                if( $opt_M && $exp > 0 ) {
                        chop( $ct = &ctime(time) );
                        $to = &basename( $mailbox );
                        chop( $fromdate = `date +\"%a %b %d %X %Y\"` );
                        print TMPF "From $postmaster  $fromdate\n";
                        print TMPF "Date: $ct\n";
                        print TMPF "From: $postmaster (Mail Expiry Agent)\n";
                        print TMPF "Reply-To: $postmaster\n";
                        print TMPF "To: $to\n";
                        print TMPF "Subject: Expired Mail Summary\n\n";
                        print TMPF "$notice\n\n";
                        # fitted to $subjects layout
                        print TMPF " Msg From & Subject            Dated\n\n";
                        foreach $msg ( @subjects ) {
                                print TMPF "$msg\n";
                        }

                        if( !$opt_T ) {
                                # set the modification time for the mailbox to 
be now
                                @sb[$ST_MTIME] = time;
                        }
                }

                # copy data back into mailbox to preserve permissions, creation 
time
                # and user and group id

                # zero length the mailbox
                truncate( MBOX, 0 );
                # *** START Critical
                # any data to copy?
                if( $exp <= $count ) {
                        # restart both files
                        seek(MBOX, 0, 0);
                        seek(TMPF, 0, 0);
                        # copy file into mailbox, better with sysread/syswrite?
                        while( <TMPF> ) {
                                print MBOX $_;
                        }
                } elsif( !$opt_z ) {
                        unlink( $mailbox );
                }
                # *** END Critical

        } else {
                # if sending warning to the owner of the mailbox, append warning
                if( $opt_W && $exp > 0 ) {
                        chop( $ct = &ctime(time) );
                        $to = &basename( $mailbox );
                        chop( $fromdate = `date +\"%a %b %d %X %Y\"` );
printf("fromdate = $fromdate\n");
                        print TMPF "From $postmaster  $fromdate\n";
                        print TMPF "Date: $ct\n";
                        print TMPF "From: $postmaster (Mail Expiry Agent)\n";
                        print TMPF "Reply-To: $postmaster\n";
                        print TMPF "To: $to\n";
                        print TMPF "Subject: Please delete old mail from system 
mailbox\n\n";
                        print TMPF "$warning\n\n";
                        # fitted to $subjects layout
                        print TMPF " Msg From & Subject            Dated\n\n";
                        foreach $msg ( @subjects ) {
                                print TMPF "$msg\n";
                        }

                        if( !$opt_T ) {
                                # set the modification time for the mailbox to 
be now
                                @sb[$ST_MTIME] = time;
                        }

                        # copy data back into mailbox to preserve permissions, 
creation time
                        # and user and group id
        
                        # zero length the mailbox
                        truncate( MBOX, 0 );
                        # *** START Critical
                        # any data to copy?
                        if( $exp <= $count ) {
                                # restart both files
                                seek(MBOX, 0, 0);
                                seek(TMPF, 0, 0);
                                # copy file into mailbox, better with 
sysread/syswrite?
                                while( <TMPF> ) {
                                        print MBOX $_;
                                }
                        } elsif( !$opt_z ) {
                                unlink( $mailbox );
                        }
                        # *** END Critical
                }
        }

        # unlock mailbox
        flock( MBOX, $LOCK_UN );

        # close files
        close( MBOX );
        close( TMPF );

        # reset access and modification dates
        # if we have sent mail, then the modification time is the time of the 
mail
        if( !$opt_t ) {
                utime( @sb[$ST_ATIME], @sb[$ST_MTIME], $mailbox );
        }

        # show counters
        if( $opt_v || ( $opt_l && $exp ) ) {
                print "$mailbox contained $count messages, expired $exp 
messages\n";
        }
}

Reply via email to