At 11:29 16/02/04 +0000, Ross Paterson wrote:
On Mon, Feb 16, 2004 at 10:20:30AM -0000, Simon Marlow wrote:
...
> It shouldn't be too hard to fix this, at least for Latin-1 (full
> Unicode would be somewhat harder).  I'll add it to the TODO list.

While Haskell's source charset is specified as Unicode, Haskell source
files don't specify the byte encoding they use, so any source file using
non-ASCII characters isn't portable.  Entrenching Latin-1 would make the
move to Unicode more difficult.

Ah, yes. I was going to suggest that, when generating XHTML, it should be easy enough to emit &#xxxx; numeric character references, but that doesn't help when the input encoding is unknown. Maybe the XML conventions for designating the encoding (UTF-8, UTF-16 big-endian, UTF-16 little-endian) might be applicable?
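
To make the two ideas concrete, here is a rough sketch, not taken from any existing tool. The names escapeNonAscii, Encoding and sniffBom are made up for illustration. The first function assumes the input has already been decoded into Unicode code points, which is exactly the part that is unknown today. The second only looks for a leading byte order mark and covers just the three encodings named above, so a file without a mark still needs some default.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical helper: replace every non-ASCII character with an XML
-- numeric character reference (&#xHHHH;). Assumes the String already
-- holds decoded Unicode code points. It does not escape '&', '<' or '>'.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap esc
  where
    esc c
      | ord c < 128 = [c]
      | otherwise   = "&#x" ++ showHex (ord c) ";"

-- The three encodings mentioned in the text (an illustrative type).
data Encoding = UTF8 | UTF16BE | UTF16LE
  deriving (Eq, Show)

-- Hypothetical helper: guess the encoding from a leading byte order
-- mark. Nothing means no mark was found, so the caller must fall back
-- on a default or on some other declaration.
sniffBom :: [Word8] -> Maybe Encoding
sniffBom (0xEF : 0xBB : 0xBF : _) = Just UTF8
sniffBom (0xFE : 0xFF : _)        = Just UTF16BE
sniffBom (0xFF : 0xFE : _)        = Just UTF16LE
sniffBom _                        = Nothing
```

Once the characters are known, the escaping step makes the output independent of whatever encoding the generated XHTML is served in.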


Also:
> Defaulting to Latin-1 may be sensible, though?

It may seem so to western europeans, but others may differ.
A case could be made for UTF-8.

I tend to agree. Further, defaulting to Latin-1 seems a strange choice when much of the rest of the world (well, the networking world) is moving towards more universal character set encodings: URIs and XML, for example. UTF-8 has been the IETF "preferred option" since early 1998:


[[
    Protocols MUST be able to use the UTF-8 charset, which consists of
    the ISO 10646 coded character set combined with the UTF-8
    character encoding scheme, as defined in [10646] Annex R
    (published in Amendment 2), for all text.

    Protocols MAY specify, in addition, how to use other charsets or
    other character encoding schemes for ISO 10646, such as UTF-16,
    but lack of an ability to use UTF-8 is a violation of this policy;
    such a violation would need a variance procedure ([BCP9] section
    9) with clear and solid justification in the protocol
    specification document before being entered into or advanced upon
    the standards track.

    For existing protocols or protocols that move data from existing
    datastores, support of other charsets, or even using a default
    other than UTF-8, may be a requirement. This is acceptable, but
    UTF-8 support MUST be possible.

    When using other charsets than UTF-8, these MUST be registered in
    the IANA charset registry, if necessary by registering them when
    the protocol is published.
]]
-- http://www.ietf.org/rfc/rfc2277.txt

#g


------------ Graham Klyne For email: http://www.ninebynine.org/#Contact

