On Wed, 16 Mar 2005, Colin Watson wrote: > > Index: Debbugs/MIME.pm > > =================================================================== > > RCS file: /cvs/debbugs/source/Debbugs/MIME.pm,v > > retrieving revision 1.1 > > diff -u -r1.1 MIME.pm > > --- Debbugs/MIME.pm 3 Aug 2003 09:46:30 -0000 1.1 > > +++ Debbugs/MIME.pm 11 Jan 2005 16:43:49 -0000 > > @@ -2,19 +2,23 @@ > > > > use strict; > > > > -use File::Path; > > -use MIME::Parser; > > -use Exporter (); > > -use vars qw($VERSION @ISA @EXPORT_OK); > > +use base qw(Exporter); > > +use vars qw($VERSION @EXPORT_OK @EXPORT); > > > > BEGIN { > > $VERSION = 1.00; > > > > - @ISA = qw(Exporter); > > - @EXPORT_OK = qw(parse); > > + @EXPORT_OK = qw(parse de_rfc1522 getmailbody); > > + @EXPORT = qw(getmailbody); > > } > > > > -sub getmailbody ($); > > Hm, not sure I like using @EXPORT rather than the deliberate forward > declaration; plus, getmailbody is not really meant to be exported, > because it's an internal function (despite the fact that > scripts/process.in and scripts/service.in currently use another version > of it, since Debbugs::MIME represents an incomplete modularisation > effort). The rest of the Exporter cleanup looks OK, though.
Right. I think I missed that one. Yeah, just strip out the @EXPORT part then. [It really shouldn't every be used anyway, since exporting things by default is broken, IMO.] > > +Turn RFC-1522 names into the UTF-8 equivalent. [#61342 et al] > > Bug reference should probably live in a comment rather than being > exported to the documentation etc. Yeah, that's just a personal thing. I usually stick more detailed documentation into POD than most people do. > > + > > +=cut > > + > > +# Set up the default rfc1522 decoder, which turns all charsets that > > +# are supported into the appropriate UTF-8 charset. > > +MIME::WordDecoder->default(new MIME::WordDecoder(['*' => sub {my > > ($data,$charset) = @_; > > + $charset =~ > > s/^(UTF)\-(\d+)/$1$2/i; > > + return $data > > unless utf8_supported_charset($charset); > > + return > > to_utf8({-string => $data, > > + > > -charset => $charset, > > + }); > > + }, > > + ],)); > > This should be in a BEGIN block, I think. I've also reformatted to 80 > columns. It doesn't matter in this case. I didn't make it a BEGIN block, because nothing should be calling it until after all BEGIN blocks have been run. [And I tend not to stick things in BEGIN blocks unless they actually have to be there.] > I realise it's a database format change, but I'd really prefer to > have the metadata files be pure UTF-8, so that we don't have to > process them for display every time, and to make things like > searching easier. We can always write a migration script. I think that's the optimal solution too. However, this patch at least will work now, and we can move to pure UTF-8 later. > > + # XXX This is broken. The BTS shouldn't be writing > > + # HTML into the log... but it does. > > + $this .= de_rfc1522($_); > > Not sure that qualifies for an XXX comment here. The right place to > note the design flaw is really in Debbugs/Log.pm, which I've done. > Debbugs::Log also provides a better basis for designing the > migration to a new, less broken-by-design record type. > > Again, I think the html record should really be in UTF-8, and a > migration script written. Yeah, no argument here. That XXX comment was basically a warning for anyone looking at that segment of code, because that string really isn't just a rfc1522 header. [And I didn't want to make changes in unrelated parts of the code if I could avoid it.] > And again. Basically, I think mail text should be preserved as-is in > the log (since it means you can send out notifications by mail and > preserve the MIME structure as it was received), but everything else > counts as metadata and should be in a canonical form. Definetly. > I'd suggest instead simply: > > - $header{$v} = $_; > + $header{$v} = de_rfc1522($_); > > ... and likewise in scripts/service.in. That way, all metadata > trivially gets decoded with very little effort, and is stored in > canonical form as suggested above. Then we'll also have to do the migration. This bit of code won't hurt if the header has already been run through de_rfc1522, though. > > @@ -430,6 +434,7 @@ > > X-$gProject-PR-Package: $data->{package} > > X-$gProject-PR-Keywords: $data->{keywords} > > Reply-To: [EMAIL PROTECTED] > > +Content-Type: text/plain; charset="utf-8" > > > > This is an automatic notification regarding your $gBug report > > #$ref: $data->{subject}, > > This gets more complicated, and is one of the reasons I'd been > avoiding this work. The right way to do this, and fix a few other > bugs along the way, is to attach the text of the original report > using MIME rather than simply appending it; that way, it can have > its original content-type, including character set. So, I'll avoid > this part for the time being. We won't be making anything > significantly worse by doing this in two stages anyway, since it's > already broken. > > Do we need to re-RFC1522-encode headers in outgoing messages? Yes... which is one of the reasons why I didn't play with the RFC1522 encoding in the log. Don Armstrong -- I never until now realized that the primary job of any emoticon is to say "excuse me, that didn't make any sense." ;-P -- Cory Doctorow http://www.donarmstrong.com http://rzlab.ucr.edu -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]