Eduard Bloch wrote:
> My program mail-expire uses this module to split mbox files into
> individual messages. Sometimes, however, the end of file is reported too
> early, and data is _lost_ as a result. I have not investigated the
> issue yet; the test data is at
> http://people.debian.org/~blade/debian-user-german.Apr_2006.bz2
> and the current version of the script is attached, with debugging output
> enabled. If you look at it, you will see that it stops splitting the
> contents at <[EMAIL PROTECTED]> and returns the rest as one big message.

It looks like the problem is in how the MIME boundary parameter is
parsed. The header looks like this:

Content-Type: multipart/signed; boundary=Sig_vBdOhvW1OXTFVp5Uz7Tcu_+;
 protocol="application/pgp-signature"; micalg=PGP-SHA1

Note that the boundary string is not quoted. The library parses it with
this code:

    # Are nonquoted parameter values allowed to have spaces? I assume not.
    if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
        $content_type_header =~ /boundary *= *\b(\S+)\b/i)

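This matches "Sig_vBdOhvW1OXTFVp5Uz7Tcu_" out of the string, leaving off
the "+" at the end. The greedy \S+ first grabs "Sig_vBdOhvW1OXTFVp5Uz7Tcu_+;",
but the trailing \b only matches between a word character and a non-word
character, so the regex backs off to the last word character, "_". This
doesn't conform to RFC 2046, which allows a boundary to contain: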

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                      "+" / "_" / "," / "-" / "." /
                      "/" / ":" / "=" / "?"

(And yes, even unquoted spaces are legal, AFAICS.)
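
To make the grammar above concrete, here is a minimal Perl sketch of a
checker for the RFC 2046 "boundary" production: 1 to 70 characters drawn
from bchars, with the last one not a space. The name is_valid_boundary is
hypothetical; it is not part of Grep.pm or Perl.pm, and it only checks
the grammar as quoted, not any quoting rules.

```perl
use strict;
use warnings;

# Hypothetical helper; not part of Grep.pm or Perl.pm.
# bcharsnospace per RFC 2046: DIGIT / ALPHA / ' ( ) + _ , - . / : = ?
my $bcharsnospace = qr{[0-9A-Za-z'()+_,\-./:=?]};

# boundary := 0*69<bchars> bcharsnospace, with bchars := bcharsnospace / " "
# i.e. 1 to 70 characters, the last of which is not a space.
sub is_valid_boundary {
    my ($s) = @_;
    return $s =~ /\A(?:$bcharsnospace| ){0,69}$bcharsnospace\z/ ? 1 : 0;
}

for my $candidate ('Sig_vBdOhvW1OXTFVp5Uz7Tcu_+', 'has space',
                   'trailing ', 'semi;colon') {
    print "'$candidate': ",
          (is_valid_boundary($candidate) ? 'valid' : 'invalid'), "\n";
}
```

The boundary from this report passes, while a trailing space or a ";"
makes a string invalid, which is why ";" is a safe place for an unquoted
value to end.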

This should work better: it passes the test suite and successfully
parses the mailbox from this bug report.

Index: Grep.pm
===================================================================
--- Grep.pm     (revision 12420)
+++ Grep.pm     (working copy)
@@ -177,9 +177,8 @@
     my $content_type_header = $1;
     $content_type_header =~ s/$endline//g;
 
-    # Are nonquoted parameter values allowed to have spaces? I assume not.
     if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
-        $content_type_header =~ /boundary *= *\b(\S+)\b/i)
+        $content_type_header =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i)
     {
       return $1
     }
Index: Perl.pm
===================================================================
--- Perl.pm     (revision 12420)
+++ Perl.pm     (working copy)
@@ -248,9 +248,8 @@
     my $content_type_header = $1;
     $content_type_header =~ s/$endline//g;
 
-    # Are nonquoted parameter values allowed to have spaces? I assume not.
     if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
-        $content_type_header =~ /boundary *= *\b(\S+)\b/i)
+        $content_type_header =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i)
     {
       return $1
     }
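
For checking the change outside the module, the following minimal Perl
sketch runs both the old and the new unquoted-boundary patterns from the
patch against the header value from this report, already folded onto one
line as the s/$endline//g step does. The extract_old and extract_new names
are hypothetical; only the two regexes are taken from the patch.

```perl
use strict;
use warnings;

# Header value from the report, folded onto one line as the module does
# before matching.
my $header = 'multipart/signed; boundary=Sig_vBdOhvW1OXTFVp5Uz7Tcu_+; '
           . 'protocol="application/pgp-signature"; micalg=PGP-SHA1';

# Old pattern: the trailing \b forces the greedy \S+ to back off to the
# last word character, so the final "+" is dropped.
sub extract_old {
    my ($h) = @_;
    return $1 if $h =~ /boundary *= *"([^"]*)"/i
              || $h =~ /boundary *= *\b(\S+)\b/i;
    return;
}

# New pattern from the patch: any run of bchars ending in a bcharsnospace,
# so it stops before ";" and never ends on a space.
sub extract_new {
    my ($h) = @_;
    return $1 if $h =~ /boundary *= *"([^"]*)"/i
              || $h =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i;
    return;
}

print "old: ", extract_old($header), "\n";  # Sig_vBdOhvW1OXTFVp5Uz7Tcu_
print "new: ", extract_new($header), "\n";  # Sig_vBdOhvW1OXTFVp5Uz7Tcu_+
```

Because the final character class excludes the space, a value such as
"boundary=abc ; x=1" still yields "abc" rather than "abc ".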

-- 
see shy jo
