Re: patch applied (cabal): Make UTF-8 decoding errors in .cabal files non-fatal

Ross Paterson Thu, 27 Mar 2008 10:20:35 -0700

On Thu, Mar 27, 2008 at 04:39:17PM +0000, Duncan Coutts wrote:
> Can't we just reject them with the error message and ask people to fix the
> latin-1 sequences and re-upload using proper UTF-8?


The problem is that there are packages there now with .cabal files
assuming Latin-1.  Stopping more of them from getting in is fine, but
we need to display the ones that are there correctly.

Hmm, after considering a few schemes it's probably simplest to introduce
strict enforcement on upload and retroactively patch the existing Latin-1
packages to UTF-8.  Naughty, but a one-off.

> You suggested previously that we should add a warning for the cases where an
> isolated latin-1 char in someone's name turns out to be valid UTF-8 (but
> encoding for an unexpected char). I think that's a good idea. Obviously that'd
> want to be a non-fatal warning. Hmm, I now can't find the note where you made
> that suggestion. Can you give more details on how that check would work 
> exactly?

The common case is ASCII char, non-ASCII char, ASCII char.  That's not a
valid UTF-8 sequence, but fromUTF is erroneously accepting it.  fromUTF
needs to be tightened up to keep these errors out.
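
To make the common case concrete, here is a rough sketch of the structural
check I have in mind.  It is not the fromUTF code in Cabal; the names
continuationCount, isContinuation and wellFormed are made up for
illustration, and it only checks that each lead byte is followed by the
right number of continuation bytes (no check for non-minimal encodings or
surrogates).

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | Number of continuation bytes announced by a lead byte, or Nothing if
-- the byte cannot start a sequence (a stray continuation byte, or
-- 0xF8..0xFF).  Hypothetical helper, not part of Cabal.
continuationCount :: Word8 -> Maybe Int
continuationCount b
  | b < 0x80           = Just 0   -- plain ASCII
  | b .&. 0xE0 == 0xC0 = Just 1   -- 110xxxxx
  | b .&. 0xF0 == 0xE0 = Just 2   -- 1110xxxx
  | b .&. 0xF8 == 0xF0 = Just 3   -- 11110xxx
  | otherwise          = Nothing

-- | Continuation bytes have the form 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- | Structural check only: every lead byte must be followed by exactly
-- the continuation bytes it announces.
wellFormed :: [Word8] -> Bool
wellFormed []     = True
wellFormed (b:bs) = case continuationCount b of
  Nothing -> False
  Just n  ->
    let (conts, rest) = splitAt n bs
    in  length conts == n && all isContinuation conts && wellFormed rest

-- "José " in Latin-1: 0xE9 sits between ASCII bytes.
latin1Name :: [Word8]
latin1Name = [0x4A, 0x6F, 0x73, 0xE9, 0x20]

-- "José" in UTF-8: é is the two-byte sequence 0xC3 0xA9.
utf8Name :: [Word8]
utf8Name = [0x4A, 0x6F, 0x73, 0xC3, 0xA9]
```

Here wellFormed latin1Name is False, because 0xE9 looks like the lead byte
of a 3-byte sequence but is followed by an ASCII byte, while
wellFormed utf8Name is True.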

Incidentally, a UTF-8 decoder is also supposed to reject non-minimal
encodings, e.g. a 3-byte encoding for a Char that can be encoded in
2 bytes.  That's to force canonical encodings for security.
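
A sketch of the minimality check, again under assumptions: decodeOne and
build are illustrative names, not Cabal's API, and the function decodes
exactly one sequence rather than a whole string.  Each sequence length has
a smallest code point it may encode; anything below that is non-minimal
and is rejected.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode exactly one UTF-8 sequence, rejecting non-minimal encodings,
-- surrogates and values above U+10FFFF.  Hypothetical helper.
decodeOne :: [Word8] -> Maybe Char
decodeOne bytes = case bytes of
  [a]          | a < 0x80           -> Just (chr (fromIntegral a))
  [a, b]       | a .&. 0xE0 == 0xC0 -> build 0x80    (a .&. 0x1F) [b]
  [a, b, c]    | a .&. 0xF0 == 0xE0 -> build 0x800   (a .&. 0x0F) [b, c]
  [a, b, c, d] | a .&. 0xF8 == 0xF0 -> build 0x10000 (a .&. 0x07) [b, c, d]
  _                                 -> Nothing

-- | Combine the payload bits of the lead byte and continuation bytes,
-- then check the result against the smallest code point allowed for a
-- sequence of this length.
build :: Int -> Word8 -> [Word8] -> Maybe Char
build lowest lead conts
  | not (all isCont conts)       = Nothing
  | cp < lowest                  = Nothing  -- non-minimal encoding
  | cp > 0x10FFFF                = Nothing  -- outside the Unicode range
  | cp >= 0xD800 && cp <= 0xDFFF = Nothing  -- surrogate
  | otherwise                    = Just (chr cp)
  where
    cp :: Int
    cp = foldl step (fromIntegral lead) conts
    step acc x = (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)
    isCont x = x .&. 0xC0 == 0x80
```

For example, U+00E9 is 0xC3 0xA9 in two bytes, so decodeOne [0xC3, 0xA9]
gives Just '\xE9', while the 3-byte form 0xE0 0x83 0xA9 of the same Char
gives Nothing.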
