[Email::MIME] Message structure destroyed by unnecessary content type parameter
Hello,

I am using Email::MIME to analyze messages. This works fine, but I found that the mail structure analysis fails (partly) when a mail program adds additional parameters to the content type header field of a multipart message, like so:

  Content-Type: multipart/related
   ;boundary="E12E79A3A5A642B5BDEDBECB78526EFD"
   ;type="text/html"

Here, to my knowledge the type parameter makes no sense, so I would say the composing program was not right to add it. Nevertheless, it was a surprise to see the type parameter causes Email::MIME to analyse the message differently. Without the type parameter, all related subparts are recognized. With the type parameter, all the subparts are ignored.

Please see the attached script for a demonstration. It just visualizes the detected message structure in a simple way. Here are the results:

- without type parameter in the content type:

  multipart/mixed ;boundary="172DB807A50B41808B5EAA882A381470"
  multipart/alternative ;boundary="B9B88CD9AA87425DB7C36965BA5C5717"
  text/plain; charset="Windows-1252"
  multipart/related ;boundary="E12E79A3A5A642B5BDEDBECB78526EFD"
  text/html; charset="Windows-1252"
  text/plain
  text/plain

- with type parameter in the content type:

  multipart/mixed ;boundary="172DB807A50B41808B5EAA882A381470"
  multipart/alternative ;boundary="B9B88CD9AA87425DB7C36965BA5C5717"
  text/plain; charset="Windows-1252"
  multipart/related ;boundary="E12E79A3A5A642B5BDEDBECB78526EFD" ; type="text/html"
  text/plain

Is it possible to fix this so that regardless of additional content type parameters (except "boundary") the message structure is recognized completely?

Thank you in advance

Jochen Stenzel

P.S.: The demo script analyzes an included demo message without the type parameter if called without arguments, and a message *with* a type parameter if called with any argument.

email-mime-ct-issue.pl
Description: Binary data
Re: [Email::MIME] [Resent with registered address] Message structure destroyed by unnecessary content type parameter
Hello,

I debugged my demo program and now I assume the problem is located in the line concatenation of Email::MIME. The additional parameter is located on a separate line:

> Content-Type: multipart/related
>  ;boundary="E12E79A3A5A642B5BDEDBECB78526EFD"
>  ;type="text/html"

The module concatenates the lines before parsing them, but preserves some whitespace. The attributes part then reads *like* this:

  boundary="E12E79A3A5A642B5BDEDBECB78526EFD" ; type="text/html"

Please note the whitespace before the semi-colon. This whitespace causes trouble when Email::MIME::_extract_ct_attribute_value() has found an attribute value and checks whether everything has been read:

  /^;/ and last;

Because the remaining string reads qq( ; type="text/html") at this point, the method continues aggregating the value and appends the additional whitespace to it. When the value is then used as a boundary string, it does not match because of the additional whitespace.

So, as a quick patch I suggest something like

diff --git a/MIME/ContentType.pm b/MIME/ContentType.pm
index a2afb45..89f6615 100644
--- a/MIME/ContentType.pm
+++ b/MIME/ContentType.pm
@@ -71,7 +71,7 @@ sub _extract_ct_attribute_value { # EXPECTS AND MODIFIES $_
             my $sub = $1; $sub =~ s/^["']//; $sub =~ s/["']$//;
             $value .= $sub;
         };
-        /^;/ and last;
+        /^\s*;/ and last;
         /^([$tspecials])/ and do {
             carp "Unquoted $1 not allowed in Content-Type!";
             return;

but I am not sure if this is the correct place. Possibly the concatenation could be modified (but then we would still have problems with valid whitespace around semi-colons within a line), or possibly the function should return once it has read a quoted value, as this value should be complete.
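The effect of the two terminator checks can be reproduced in isolation. The following is only a minimal sketch: read_value() is a hypothetical, simplified stand-in written for this illustration, not the code of Email::MIME::ContentType, and it assumes the attribute string looks like the one quoted above.

```perl
use strict;
use warnings;

# Hypothetical, simplified reader (NOT the module's code): collects one
# attribute value from the front of $rest and stops when $stop matches.
sub read_value {
    my ($rest, $stop) = @_;
    my $value = '';
    while (length $rest) {
        # A quoted string: drop the quotes and append its content.
        if ($rest =~ s/^"([^"]*)"//) { $value .= $1 }
        # Has the end of this attribute been reached?
        last if $rest =~ $stop;
        # Otherwise append everything up to the next ';' or '"'.
        if ($rest =~ s/^([^;"]+)//) { $value .= $1 }
        else                         { last }
    }
    return $value;
}

my $attrs = '"E12E79A3A5A642B5BDEDBECB78526EFD" ; type="text/html"';

my $strict   = read_value($attrs, qr/^;/);      # check as in the current code
my $tolerant = read_value($attrs, qr/^\s*;/);   # check as in the suggested patch

print "strict:   [$strict]\n";
print "tolerant: [$tolerant]\n";
```

With the strict check the bracketed value ends in a space, because the whitespace before the semi-colon is appended before the terminator is seen; with the tolerant check the value is the bare boundary string.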
Another approach would be to patch Email::MIME itself where it uses the boundary value, and to delete trailing whitespace:

diff --git a/MIME.pm b/MIME.pm
index 26a3c1f..f2d2428 100644
--- a/MIME.pm
+++ b/MIME.pm
@@ -341,6 +341,10 @@ sub parts_multipart {
     my $self = shift;
     my $boundary = $self->{ct}->{attributes}->{boundary};
 
+    # boundary values happen to have trailing whitespace sometimes, remove it
+    # - jstenzel, 2009-12-16
+    $boundary =~ s/\s+$//;
+
     # Take a message, join all its lines together. Now try to Email::MIME->new
     # it with 1.861 or earlier. Death! It tries to recurse endlessly on the
     # body, because every time it splits on boundary it gets itself. Obviously

but this is fairly late and will affect boundary values only.

With both patches, the demo script now analyzes the demo message correctly. Could the maintainers please have a look to see if one of these patches is sufficient?

Thanks and regards

Jochen
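To illustrate why a single trailing space in the boundary value hides all subparts, here is a small sketch. count_delimiters() and the sample body are hypothetical and written only for this illustration; the way Email::MIME actually splits the body may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not Email::MIME's code): counts the lines of $body
# that consist of "--", the boundary string, and an optional closing "--".
sub count_delimiters {
    my ($body, $boundary) = @_;
    my $count = () = $body =~ /^--\Q$boundary\E(?:--)?[ \t]*\r?$/mg;
    return $count;
}

# Made-up body of the multipart/related part with two subparts.
my $body = join "\n",
    '--E12E79A3A5A642B5BDEDBECB78526EFD',
    'Content-Type: text/html; charset="Windows-1252"',
    '',
    '<p>first part</p>',
    '--E12E79A3A5A642B5BDEDBECB78526EFD',
    'Content-Type: text/plain',
    '',
    'second part',
    '--E12E79A3A5A642B5BDEDBECB78526EFD--',
    '';

# Boundary value with the trailing space picked up during header parsing.
my $boundary = 'E12E79A3A5A642B5BDEDBECB78526EFD ';
my $before   = count_delimiters($body, $boundary);

$boundary =~ s/\s+$//;    # the trimming step from the patch above
my $after = count_delimiters($body, $boundary);

print "before trimming: $before, after trimming: $after\n";
```

In this sketch no delimiter line is found before trimming and all three are found afterwards, which matches the observation that the subparts disappear only when the extra parameter follows the boundary.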
[Email::MIME] [Resent with registered address] Message structure destroyed by unnecessary content type parameter
Hello,

I am sorry I sent the original message with a non-registered mail address.

Regards

Jochen
Email::ARF problems
Hi.

I'm trying to use Email::ARF on incoming abuse reports. In particular the ones I'm processing come from AOL.

I can parse them and create an Email::ARF::Report object, but when trying to get the body of the original message, I get something that looks quoted-printable encoded - lines end in =, and it's generally not as I'd expect. This makes it difficult to parse out my ids identifying the specific message being complained about.

Should I be able to see the decoded body?

--
Michael Stevens
Dianomi Ltd
18 Buckingham Gate
London SW1E 6LB
Tel: 020 7802 5530
Fax: 020 7630 7356
www.dianomi.com

The information in this message and any attachment is intended for the addressee and is confidential and may be subject to legal privilege. Dianomi Ltd, Registered Office: One America Square, Crosswall, London. EC3N 2SG. Registered in England and Wales with Company Registration Number 4513809. VAT registration number: 809754988
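If the body really is quoted-printable, one possible workaround is to decode it by hand with decode_qp() from the core module MIME::QuotedPrint. This is only a sketch: the sample string is made up, and whether Email::ARF offers a decoded accessor of its own is not established in this thread.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical sample of an encoded body: "=\n" is a soft line break and
# "=3D" encodes a literal "=".
my $raw = "message-id=3D12345; campaign=\n=3Dspring\n";

my $decoded = decode_qp($raw);

print $decoded;
```

After decoding, the soft line break is gone and the ids can be matched on a single line.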