[Nmh-workers] 1.5 release and better repl/MIME handling

Ken Hornstein Fri, 30 Mar 2012 20:51:29 -0700

Okay, so I am thinking that perhaps we are close to ready to start the
1.5 release cycle; I don't have anything else on my plate that I'd like
to see in there.  So if anyone else has any code they would like to see in
nmh for 1.5, now is the time they should speak up!


And just so there's no confusion ... what I really mean is, "if YOU
have any code"; in other words, if you have code you have written
or would like to write, then you should definitely speak up.  If
you are talking about code you would like someone ELSE to write,
then I would kindly suggest you NOT speak up at this time; I think
it would be more appropriate for those things to wait until after
1.5 is branched.  Bug reports are of course welcome even without code.

Now of course the one thing that people have repeatedly asked about
is better MIME handling when dealing with replies.  Believe me, I feel
your pain.  But I think I have a reasonable solution and I'd like some
feedback on it.

I've been living with "par" as my formatproc for a while now, and for
simple cases it's been great.  But it has some unfortunate side effects.
It works "okay" on more modern messages designed to be reflowed (although
I still get q-p equal signs sprinkled throughout the message).  But for
already-repl'able messages, especially ones with source code in them,
it really chews them up.  So it's good for some things, not so good
for others.

But that got me to thinking - maybe I could have a more intelligent
formatproc that could handle MIME?  Since the formatproc only has
access to the message body (don't ask; that is NOT easily fixable),
you missed out on some crucial headers that described the MIME
content.  To make a long story short, I extended the mhl filter
syntax so you could give arguments to the filter program, and the
arguments are mh-format strings that have access to the message
headers.

Based on THAT support, I've cooked up a semi-intelligent MIME filter,
written in perl, and I've appended it below.  The comments at the
top explain how to use it.  It should do the "right" thing with
text/plain content and put markers in for non-text content.  It
should also handle nested multipart messages.  Feedback is welcome;
you'll need a pretty recent nmh (March 20th or newer) to make a go
with it.  You'll also need the MailTools perl package as well.  It's
probably a bit fragile; I haven't tested it all that much yet, but
it worked for some of the more complicated messages that I have.

If people think it's a good idea I wouldn't mind shipping this with
nmh, but I'm not sure where is should go; right now nmh doesn't have
a dependency on perl.  We could always put it in the "docs" directory.
Anyway, please let me know what you think!

--Ken

#!/usr/bin/perl
#
# replyfilter - A reply filter for nmh
#
# The idea behind this program is that it will act as a format filter
# for nmh.  It will try to extract out all text/plain parts and format
# them if necessary using a filter program.
#
# To use this program, configure nmh in the following way (nmh 1.5 or later):
#
# - Put the path to this program in your .mh_profile under formatproc:
# 
#   formatproc: replyfilter
#
# - Create a mhl reply filter that consists of the following line:
#
#   
body:nocomponent,format,nowrap,formatarg="%(trim{content-type})%(putstr)",formatarg="%(trim{content-transfer-encoding})%(putstr)",formatarg=">"
#
#   To decode this a bit:
#
#   body        - Output the "body" component
#   nocomponent - Don't output a component prefix (normally here we use a
#                 component prefix of ">" as a quote character, but we're
#                 going to have replyfilter do that).
#   nowrap      - Don't wrap lines if they exceed the column width
#   formatarg   - Arguments to fmtproc.  The first argument is the value of
#                 the Content-type header; the second is the value of the
#                 Content-Transfer-Encoding header.  The last "formatarg"
#                 is used as your quoting prefix.  Replace it with whatever
#                 you want.
#

use Mail::Field;
use MIME::Head;
use MIME::QuotedPrint;
use MIME::Base64;

#
# The program we use to format quoted-printable or other "long" text
#

$filterprogram = 'par';

die "Usage: $0 Content-type content-transfer-encoding quote-prefix\n"
                                if $#ARGV != 2;

if ($ARGV[0] ne "") {
        $content_type = Mail::Field->new('Content-Type', $ARGV[0]);
}

$encoding = $ARGV[1] eq "" ? '7bit' : lc($ARGV[1]);
$quoteprefix = $ARGV[2];

#
# The simplest case: if we have a single type of text/plain, send it
# to our format subroutine.
#

if ($ARGV[0] eq "" || $content_type->type eq 'text/plain') {
        process_text(\*STDIN, $encoding);
        exit 0;
}

#
# Alright, here's what we need to do.
#
# Find any text/plain parts and decode them.  Decode them via base64 or
# quoted-printable, and feed them to our formatting filter when appropriate.
# Put markers in the output for other content types.
#

($type) = (split('/', $content_type->type));

if ($type eq 'multipart') {

        #
        # For multipart messages we have to do a little extra.
        # Eat the MIME prologue (everything up until the first boundary)
        #

        $boundary = $content_type->boundary;

        if ($boundary eq '') {
                print "No boundary in Content-Type header!\n";
                eat_part(\*STDIN);
                exit 1;
        }

        while (<STDIN>) {
                last if match_boundary($_, $boundary);
        }

        if (eof(STDIN)) {
                print "Unable to find boundary in message\n";
                exit 1;
        }
} else {
        undef $boundary;
}

process_part(\*STDIN, $content_type->type, $encoding, $boundary);

if ($boundary) {
        #
        # Eat the MIME eplilog
        #
        eat_part(\*STDIN);
}

exit 0;

#
# Handled encoded text.  I think we can assume if the encoding is q-p
# or base64 to feed it into a formatting filter.
#

sub process_text (*$;$)
{
        my ($input, $encoding, $boundary) = @_;
        my $text, $filterpid, $prefixpid, $finread, $finwrite;
        my $foutread, $foutwrite, $decoder, $ret;

        #
        # In the simple case, just spit out the text prefixed by the
        # quote character
        #

        if ($encoding ne 'base64' && $encoding ne 'quoted-printable') {
                while (<$input>) {
                        $ret = match_boundary($_, $boundary);
                        if (defined $ret) {
                                return $ret;
                        }
                        print $quoteprefix, $_;
                }
                return 'EOF';
        } elsif ($encoding eq 'base64') {
                $decoder = \&decode_base64;
        } elsif ($encoding eq 'quoted-printable') {
                $decoder = \&decode_qp;
        } else {
                warn "Unknown encoding: $encoding\n";
                return 'EOF';
        }

        #
        # Okay, assume that the encoding will make it so that we need
        # to filter it.  Open a pipe to our filter program.
        #
        # We fork a copy of ourselves to read the output from the filter
        # program and prefix the quote character.
        #

        pipe($finread, $finwrite) || die "pipe() failed: $!\n";
        pipe($foutread, $foutwrite) || die "pipe() (second) failed: $!\n";

        if ($filterpid = fork) {
                #
                # Close the pipes in the parent that we're not using
                #

                close($finread);
                close($foutwrite);
        } elsif (defined $filterpid) {
                #
                # Close our ununsed filehandles
                #

                close($finwrite);
                close($foutread);

                #
                # Dup() down the filehandles to standard input and output
                #

                open(STDIN, "<&", $finread) ||
                                        die "dup(filterin) failed: $!\n";
                open(STDOUT, ">&", $foutwrite) ||
                                        die "dup(filterout) failed: $!\n";

                #
                # Close our copies.
                #

                close($finread);
                close($foutwrite);

                #
                # Exec our filter
                #

                exec $filterprogram ||
                                die "Unable to exec $filterprogram: $!\n";
        } else {
                die "Fork for $filterprogram failed: $!\n";
        }

        #
        # Fork our output handler.
        #

        if ($prefixpid = fork) {
                #
                # We don't need these anymore
                #
                close($foutread);

        } elsif (defined $prefixpid) {
                #
                # Read from foutwrite, and output (with prefix) to stdout
                #

                close($finwrite);

                while (<$foutread>) {
                        print STDOUT $quoteprefix, $_;
                }

                exit 0;
        }

        #
        # Decode our input, and send it to our filter program
        #

        while (<$input>) {
                if ($ret = match_boundary($_, $boundary)) {
                        last;
                }
                print $finwrite (&$decoder($_));
        }

        if (! defined $ret) {
                $ret = 'EOF';
        }

        close($finwrite);
        waitpid $filterpid, 0;
        warn "Filter process exited with ", $? >> 8, "\n" if $?;
        waitpid $prefixpid, 0;
        warn "Pipe reader process exited with ", $? >> 8, "\n" if $?;

        return $ret;
}

#
# Decide what to do, based on what kind of content it is.
#

sub process_part (*$$$;$)
{
        my ($input, $content_type, $encoding, $boundary, $name) = @_;
        my ($type, $subtype) = (split('/', $content_type, -1), '');

        if ($type eq 'text') {
                #
                # If this is a text part, right now we only deal with
                # plain parts.  We should be able to handle other types
                # of text parts in the future, hopefully.
                #
                if ($subtype eq 'plain') {
                        return process_text($input, $encoding, $boundary);
                } else {
                        print ">>> $content_type content\n";
                        return eat_part($input, $boundary);
                }
        } elsif ($type eq 'multipart') {
                return process_multipart($input, $subtype, $boundary);
        } else {
                #
                # Other types we're not sure what to do with right now
                # Just put a marker in there
                #

                print ">>> $content_type attachment";
                if (defined $name) {
                        print ", name=$name";
                }
                print "\n";

                return eat_part($input, $boundary);
        }
}

#
# Process a multipart message.
#
# When called, we should be right after the beginning of the first
# boundary marker.  So we should be pointed at header lines which describe
# the content of this part
#

sub process_multipart ($$$)
{
        my ($input, $subtype, $boundary) = @_;
        my $altout;

        while (1) {
                my $encoding, $type, $end, $name;

                #
                # Use the Mail::Header package to read in any headers
                # corresponding to this part
                #

                my $head = Mail::Header->new($input, (MailFrom => 'IGNORE'));

                #
                # Extract out any Content-Type, Content-Transfer-Encoding, and
                # Content-Disposition headers
                #

                my $ctype = Mail::Field->extract('Content-Type', $head);
                my $cte = Mail::Field->extract('Content-Transfer-Encoding',
                                               $head);
                my $cdispo = Mail::Field->extract('Content-Disposition', $head);

                $type = defined $ctype ? $ctype->type : 'text/plain';
                $encoding = defined $cte ? $cte->param('_') : '7bit';
                $name = defined $cdispo ? $cdispo->param('filename') : undef;

                #
                # Special handling for multipart/alternative; pick
                # the "first" one we can handle (which is usually
                # text/plain) and silently eat the rest, but output a
                # warning if we can't handle anything.
                #

                if ($altout) {
                        $end = eat_part($input, $boundary)

                } else {
                        my $subboundary = $boundary;
                        my $maintype = (split('/', $type))[0];

                        if ($maintype eq 'multipart') {
                                $subboundary = $ctype->boundary;
                                #
                                # Go until we find our beginning of this
                                # part
                                #
                                my $subend = eat_part($input, $subboundary);

                                if ($subend ne 'EOP') {
                                        print ">>> WARNING: malformed ",
                                                "nested multipart\n";
                                        return $subend;
                                }
                        }

                        $end = process_part($input, $type, $encoding,
                                            $subboundary, $name);

                        if ($subtype eq 'alternative' && ! defined $altout &&
                            $type eq 'text/plain') {
                                $altout = 1;
                        }

                        #
                        # Since we changed the semantics of $boundary
                        # above for nested multiparts, if we are
                        # handling a nested multipart then find the end
                        # of our current part
                        #

                        if ($maintype eq 'multipart') {
                                $end = eat_part($input, $boundary);
                        }

                }

                if ($end eq 'EOM' || $end eq 'EOF') {
                        if ($subtype eq 'alternative' && !defined $altout) {
                                print ">>>multipart/alternative: no suitable ",
                                        "parts\n";
                        }
                        return $end;
                }
        }
}

#
# "Eat" a MIME part; consume content until we hit the boundary or EOF
#

sub eat_part ($$)
{
        my ($input, $boundary) = @_;
        my $ret;

        #
        # If we weren't given a boundary just eat input until EOF
        #

        if (! defined $boundary) {
                while (<$input>) { }
                return 'EOF';
        }

        #
        # Otherwise, consume data until we hit our boundary
        #

        while (<$input>) {
                if ($ret = match_boundary($_, $boundary)) {
                        return $ret;
                }
        }

        return 'EOF';
}

#
# Match a line against the boundary string
#

sub match_boundary($$)
{
        my ($_, $boundary) = @_;

        if (substr($_, 0, 2) eq '--') {
                s/[ \t\r\n]+\Z//;
                if ($_ eq "--$boundary") {
                        return 'EOP';
                } elsif ($_ eq "--$boundary--") {
                        return 'EOM';
                }
        }

        return undef;
}

_______________________________________________
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers

[Nmh-workers] 1.5 release and better repl/MIME handling

Reply via email to