On 06/08/2016 08:12 PM, Eric Sunshine wrote:
On Wed, Jun 8, 2016 at 1:58 PM, Junio C Hamano <gits...@pobox.com> wrote:
Samuel GROOT <samuel.gr...@grenoble-inp.org> writes:
+sub parse_email {
+     my %mail = ();
+     my $fh = shift;
+     my $last_header;

+     # Unfold and parse multiline header fields
+     while (<$fh>) {
+             last if /^\s*$/;

You stop at the end of fields before the message body starts.

+             s/\r\n|\n|\r//;

The pattern is not anchored at the right end of the string;
intended?  Is it worth worrying about a lone '\r'?

Thanks, I think you covered pretty much everything I was going to say.
I'd just add that if the matching is going to be kept loose like this
(rather than anchoring it), then s/[\r\n]+//g might be easier to read,
but it's a minor point.

Indeed s/[\r\n]+//g is way better, it works even if there's a CR in the middle of the line.

+             if (/^([^\s:]+):[\s]+(.*)$/) {
+                     $last_header = lc($1);
+                     @{$mail{$last_header}} = ()
+                             unless defined $mail{$last_header};
+                     push @{$mail{$last_header}}, $2;

+             } elsif (/^\s+\S/ and defined $last_header) {
+                     s/^\s+/ /;
+                     push @{$mail{$last_header}}, $_;

Even though the comment said "unfold", you do not really do the
unfolding here and the caller can (if it wants to) figure out where
one logical header was folded in the original into multiple physical
lines, because you are returning an arrayref.

Also, the comment about folding lines should be moved down the part of
the code which is actually (supposed to be) doing the folding rather
than having the comment at the top of the loop.

Will do in next re-roll.

+     # Separate body from header
+     $mail{"body"} = [(<$fh>)];
+
+     return \%mail;

The name of the local thing is not observable from the caller, but
because this is "parse-email-header" and returns "header fields"
without reading the "mail", perhaps call it %header instead?

If there is (for some reason) a mail header named 'body', then this
assignment of the body portion of the message will overwrite it.
Perhaps this function should instead return multiple values: the
header hash, and the message body.

I will drop the body part in re-roll.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to