On Thu, May 27, 2010 at 18:04:36 +0100, Eric Kow wrote:
> readFileName :: FileNameFormat -> B.ByteString -> FileName
> readFileName OldFormat = fp2fn . decodeWhite . unpackPSFromUTF8
> readFileName NewFormat = fp2fn . decodeWhite . BC.unpack
> 
> formatFileName :: FileNameFormat -> FileName -> Doc
> formatFileName OldFormat = packedString . packStringToUTF8 . encodeWhite . fn2fp
> formatFileName NewFormat = text                            . encodeWhite . fn2fp

> So in the OldFormat we seem to assume that Darcs.Patch.FileName uses Unicode
> filenames encoded in UTF-8.  Does this mean that in the NewFormat, we just
> treat filenames as just sequences of bytes? 

Argh, sorry I'm being super super sloppy in my thinking here.

Darcs.Patch.FileName is just a newtype around String so as far as Darcs
is concerned, FileNames are Unicode Strings.  The question is just what
happens when you encode/decode.

In the old format, whenever we have a bytestring that corresponds to a
filename, we assume it was UTF-8 encoded (which seems wrong, and may
be the motivation behind the switch).  In the new format, we *seem* to
assume it's a superset of ISO 8859-1 (i.e. each octet = one Unicode code
point, unless Data.ByteString.Char8 really is checking whether the code
point corresponds to holes in the code page, which I doubt).
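
To make the difference concrete, here is a minimal sketch of the two
interpretations.  It does not use the real Darcs helpers: decodeAsUtf8,
decodeAsLatin1, encodeTruncating and eAcuteUtf8 are hypothetical names,
and Data.Text.Encoding is an assumed stand-in for whatever
unpackPSFromUTF8 does internally.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical stand-in for the old-format reader: treat the bytes as
-- UTF-8 and report a decoding failure instead of guessing.
decodeAsUtf8 :: B.ByteString -> Either String String
decodeAsUtf8 bs = case TE.decodeUtf8' bs of
  Left err -> Left (show err)
  Right t  -> Right (T.unpack t)

-- Hypothetical stand-in for the new-format reader: BC.unpack maps each
-- byte to the Char with the same code point (0..255), with no checks.
decodeAsLatin1 :: B.ByteString -> String
decodeAsLatin1 = BC.unpack

-- The encoding direction: BC.pack keeps only the low 8 bits of each
-- Char, so code points above 255 are silently truncated.
encodeTruncating :: String -> B.ByteString
encodeTruncating = BC.pack

-- U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes C3 A9.
eAcuteUtf8 :: B.ByteString
eAcuteUtf8 = B.pack [0xC3, 0xA9]
```

Loaded in GHCi, decodeAsUtf8 eAcuteUtf8 gives a one-character String,
while decodeAsLatin1 eAcuteUtf8 gives two characters (U+00C3 followed by
U+00A9).  The same bytes on disk therefore become different FileNames
depending on which reader is used.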

But in any case, Unicode inside.
It's the conversion to/from that worries me...

Anyway, I'm going to shut up before I add any misinformation to the mix.
Hopefully Reinier is paying attention and will yell at any stupid things
I've said,

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
