Bug#61342: your mail

Don Armstrong Wed, 16 Mar 2005 17:08:06 -0800

On Wed, 16 Mar 2005, Colin Watson wrote:
> > Index: Debbugs/MIME.pm
> > ===================================================================
> > RCS file: /cvs/debbugs/source/Debbugs/MIME.pm,v
> > retrieving revision 1.1
> > diff -u -r1.1 MIME.pm
> > --- Debbugs/MIME.pm 3 Aug 2003 09:46:30 -0000       1.1
> > +++ Debbugs/MIME.pm 11 Jan 2005 16:43:49 -0000
> > @@ -2,19 +2,23 @@
> >  
> >  use strict;
> >  
> > -use File::Path;
> > -use MIME::Parser;
> > -use Exporter ();
> > -use vars qw($VERSION @ISA @EXPORT_OK);
> > +use base qw(Exporter);
> > +use vars qw($VERSION @EXPORT_OK @EXPORT);
> >  
> >  BEGIN {
> >      $VERSION = 1.00;
> >  
> > -    @ISA = qw(Exporter);
> > -    @EXPORT_OK = qw(parse);
> > +    @EXPORT_OK = qw(parse de_rfc1522 getmailbody);
> > +    @EXPORT = qw(getmailbody);
> >  }
> >  
> > -sub getmailbody ($);
> 
> Hm, not sure I like using @EXPORT rather than the deliberate forward
> declaration; plus, getmailbody is not really meant to be exported,
> because it's an internal function (despite the fact that
> scripts/process.in and scripts/service.in currently use another version
> of it, since Debbugs::MIME represents an incomplete modularisation
> effort). The rest of the Exporter cleanup looks OK, though.


Right. I think I missed that one. Yeah, just strip out the @EXPORT
part then. [It really shouldn't every be used anyway, since exporting
things by default is broken, IMO.]
 
> > +Turn RFC-1522 names into the UTF-8 equivalent. [#61342 et al]
> 
> Bug reference should probably live in a comment rather than being
> exported to the documentation etc.

Yeah, that's just a personal thing. I usually stick more detailed
documentation into POD than most people do.

> > +
> > +=cut
> > +
> > +# Set up the default rfc1522 decoder, which turns all charsets that
> > +# are supported into the appropriate UTF-8 charset.
> > +MIME::WordDecoder->default(new MIME::WordDecoder(['*' => sub {my 
> > ($data,$charset) = @_;
> > +                                                         $charset =~ 
> > s/^(UTF)\-(\d+)/$1$2/i;
> > +                                                         return $data 
> > unless utf8_supported_charset($charset);
> > +                                                         return 
> > to_utf8({-string  => $data,
> > +                                                                         
> > -charset => $charset,
> > +                                                                        });
> > +                                                    },
> > +                                            ],));
> 
> This should be in a BEGIN block, I think. I've also reformatted to 80
> columns.

It doesn't matter in this case. I didn't make it a BEGIN block,
because nothing should be calling it until after all BEGIN blocks have
been run. [And I tend not to stick things in BEGIN blocks unless they
actually have to be there.]

> I realise it's a database format change, but I'd really prefer to
> have the metadata files be pure UTF-8, so that we don't have to
> process them for display every time, and to make things like
> searching easier. We can always write a migration script.

I think that's the optimal solution too. However, this patch at least
will work now, and we can move to pure UTF-8 later.

> > +           # XXX This is broken. The BTS shouldn't be writing
> > +           # HTML into the log... but it does.
> > +           $this .= de_rfc1522($_);
>
> Not sure that qualifies for an XXX comment here. The right place to
> note the design flaw is really in Debbugs/Log.pm, which I've done.
> Debbugs::Log also provides a better basis for designing the
> migration to a new, less broken-by-design record type.
> 
> Again, I think the html record should really be in UTF-8, and a
> migration script written.

Yeah, no argument here. That XXX comment was basically a warning for
anyone looking at that segment of code, because that string really
isn't just a rfc1522 header. [And I didn't want to make changes in
unrelated parts of the code if I could avoid it.]

> And again. Basically, I think mail text should be preserved as-is in
> the log (since it means you can send out notifications by mail and
> preserve the MIME structure as it was received), but everything else
> counts as metadata and should be in a canonical form.

Definetly.

> I'd suggest instead simply:
> 
> -     $header{$v} = $_;
> +     $header{$v} = de_rfc1522($_);
> 
> ... and likewise in scripts/service.in. That way, all metadata
> trivially gets decoded with very little effort, and is stored in
> canonical form as suggested above.

Then we'll also have to do the migration. This bit of code won't hurt
if the header has already been run through de_rfc1522, though.
 
> > @@ -430,6 +434,7 @@
> >  X-$gProject-PR-Package: $data->{package}
> >  X-$gProject-PR-Keywords: $data->{keywords}
> >  Reply-To: [EMAIL PROTECTED]
> > +Content-Type: text/plain; charset="utf-8"
> >  
> >  This is an automatic notification regarding your $gBug report
> >  #$ref: $data->{subject},
> 
> This gets more complicated, and is one of the reasons I'd been
> avoiding this work. The right way to do this, and fix a few other
> bugs along the way, is to attach the text of the original report
> using MIME rather than simply appending it; that way, it can have
> its original content-type, including character set. So, I'll avoid
> this part for the time being. We won't be making anything
> significantly worse by doing this in two stages anyway, since it's
> already broken.
> 
> Do we need to re-RFC1522-encode headers in outgoing messages?

Yes... which is one of the reasons why I didn't play with the RFC1522
encoding in the log.


Don Armstrong

-- 
I never until now realized that the primary job of any emoticon is to
say "excuse me, that didn't make any sense." ;-P
 -- Cory Doctorow

http://www.donarmstrong.com              http://rzlab.ucr.edu


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]

Bug#61342: your mail

Reply via email to