On Thu, Mar 27, 2008 at 08:45:29AM -0700, Duncan Coutts wrote:
> Wed Mar 26 20:17:40 PDT 2008  Duncan Coutts <[EMAIL PROTECTED]>
>   * Make UTF-8 decoding errors in .cabal files non-fatal
>   Previously we checked for invalid UTF-8 in the first phase of the
>   parser, which splits the file up into nested sections and fields.
>   This meant the whole parser fell over if there was invalid UTF-8
>   anywhere in the file. Sadly there are already packages on hackage
>   with invalid UTF-8, so we would fail when parsing the hackage index.
>   The solution is to move the check into the parsing of the individual
>   fields and to make it a warning, not an error. We most typically get
>   invalid UTF-8 in free text fields like author name, copyright,
>   description etc, so this should work out ok usually.
>   We now get pretty decent error messages, like:
>   Warning: hsx.cabal:5: Invalid UTF-8 text in the 'author' field.
>   The warning type is now structured so that hackage will be able to
>   distinguish general non-fatal warnings from UTF-8 decoding problems,
>   which really should be fatal errors for package uploads.

These invalid UTF-8 strings are usually valid Latin-1 in people's names,
which the web interface needs to show. So would it be possible to give
the warning, but either to treat the bytes that make up an encoding
error as Latin-1 Chars, or to reparse a string (or file) with UTF-8
errors as a Latin-1 string? In almost all cases, the problematic
sequence is a single non-ASCII byte surrounded by ASCII bytes.
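
To make the first option concrete, here is a minimal sketch of a lenient decoder. It is not Cabal's code: the names decodeLenient and decodeOne are made up for illustration, and it works on a plain [Word8] to stay within base. Well-formed UTF-8 is decoded normally; any byte that does not start a well-formed sequence is kept as the Latin-1 Char with the same code point, and the returned flag says whether that happened, so the caller can still emit the warning.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper, not part of Cabal.
-- Decode bytes as UTF-8. A byte that does not start a well-formed
-- sequence is kept as the Latin-1 character with the same code point.
-- The Bool is True if at least one such fallback happened.
decodeLenient :: [Word8] -> (String, Bool)
decodeLenient [] = ("", False)
decodeLenient bs@(b : rest) =
  case decodeOne bs of
    Just (c, rest') ->
      let (s, warned) = decodeLenient rest' in (c : s, warned)
    Nothing ->
      -- Fall back for this one byte only, then resume UTF-8 decoding.
      let (s, _) = decodeLenient rest in (chr (fromIntegral b) : s, True)

-- Hypothetical helper: decode one well-formed UTF-8 sequence from the
-- front of the input, rejecting stray continuation bytes, truncated
-- sequences, overlong forms, surrogates and values above U+10FFFF.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne [] = Nothing
decodeOne (b : rest)
  | b < 0x80               = Just (chr (fromIntegral b), rest)
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise              = Nothing
  where
    -- n continuation bytes, payload bits of the lead byte, and the
    -- smallest code point that legitimately needs this length.
    multi n lead minCp =
      let (conts, rest') = splitAt n rest
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     lead conts
          wellFormed = length conts == n
                    && all isCont conts
                    && cp >= minCp
                    && cp <= 0x10FFFF
                    && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if wellFormed then Just (chr cp, rest') else Nothing
    isCont c = c .&. 0xC0 == 0x80
```

With this, the Latin-1 bytes 4A F6 72 67 come out as "Jörg" with the flag set, while a correctly encoded name decodes as before with the flag clear. The per-byte fallback suits the common case described above, a single non-ASCII byte surrounded by ASCII. Reparsing the whole field as Latin-1 would be the other option, at the cost of mangling any valid UTF-8 elsewhere in the same field.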
