Quoting Marco Antoniotti ([EMAIL PROTECTED]):
> following this thread, it seems to me that what would be extremely 
> valuable would be a reasoned comparison of how different CL 
> implementations (including the commercial ones) support Unicode w.r.t. 
> the ANSI standard.

Yes.

Some possible questions (and partial answers for cmucl, acl, and sbcl):

  * Which range of characters does an implementation support?
    CMUCL: 8 bit
    Allegro: 8 bit / 16 bit (different images shipped)
    SBCL: 8 bit / 24 bit (compilation option)
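
    One portable way to collect this answer is to ask each image for
    CHAR-CODE-LIMIT, which is part of the ANSI standard.  A minimal
    sketch (the helper name char-code-bits is made up for this
    illustration):

```lisp
;; Portable sketch: relies only on CHAR-CODE-LIMIT from the ANSI standard.
;; CHAR-CODE-BITS is a hypothetical helper name, not part of any library.
(defun char-code-bits ()
  "Return the number of bits needed to hold the largest character code."
  (integer-length (1- char-code-limit)))

;; Print a one-line report for the running implementation.
(format t "~A ~A: char-code-limit = ~D (~D bits)~%"
        (lisp-implementation-type)
        (lisp-implementation-version)
        char-code-limit
        (char-code-bits))
```

    Running this in each implementation (and in each shipped image or
    build configuration) would give directly comparable numbers.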


  * Unicode support:
    CMUCL: no
    Allegro: Certainly support for 16-bit unicode (I think that's
      Unicode 1.0).  Not sure about Unicode 2.0.  To represent more
      characters, some implementations use UTF-16 internally (Java does
      that these days).  I think Allegro allows use of surrogates in its
      strings, but I'm not sure how much its string-related functions
      actually understand about them.
    SBCL: yes
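
    For reference, this is the arithmetic behind surrogates: a code
    point above #xFFFF is split into two 16-bit code units.  The sketch
    below is plain integer arithmetic, not any implementation's API;
    the function name utf-16-code-units is made up, and the input is
    not validated.

```lisp
;; Sketch of the UTF-16 encoding rule; UTF-16-CODE-UNITS is a hypothetical
;; name.  The code point is assumed to be a valid Unicode scalar value.
(defun utf-16-code-units (code-point)
  "Return the list of UTF-16 code units that encode CODE-POINT."
  (if (< code-point #x10000)
      ;; Characters in the Basic Multilingual Plane take one code unit.
      (list code-point)
      ;; Everything else is split into a high and a low surrogate,
      ;; each carrying 10 bits of the offset from #x10000.
      (let ((offset (- code-point #x10000)))
        (list (+ #xD800 (ash offset -10))
              (+ #xDC00 (logand offset #x3FF))))))
```

    A string function that counts code units rather than characters
    would report length 2 for a single character outside the BMP, which
    is the kind of behaviour worth checking in a 16-bit image.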


  * External formats:
    CMUCL: n/a, the character code is written as an octet
    Allegro: ISO-8859-1 to -9, ISO-8859-14 and -15, koi8-r, CP 1250-58, UTF-8,
      big5, gb2312, euc, CP 874, 932, 936, 949, 950, jis, shiftjis
      Support for composed external formats: CRLF line endings
    SBCL: fill me in


  * Routines to map octet sequences to strings and back.
    CMUCL: no
    Allegro: string-to-octets, octets-to-string, string-to-native,
    SBCL: string-to-octets, octets-to-string
     (albeit with fewer helpful keyword arguments)

    flexi-streams provide an emulation for this

    Compare and discuss different APIs here.
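
    As a starting point for such a comparison, here is a round trip
    through the flexi-streams versions of these functions.  This is only
    a sketch: it assumes Quicklisp is installed to load the library, and
    the wrapper name utf-8-round-trip is made up.

```lisp
;; Assumption: Quicklisp is available in the running image.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :flexi-streams :silent t))

;; UTF-8-ROUND-TRIP is a hypothetical wrapper for illustration.
(defun utf-8-round-trip (string)
  "Encode STRING as UTF-8 octets, then decode the octets again.
Returns the octet vector and the decoded string as two values."
  (let ((octets (flexi-streams:string-to-octets string
                                                :external-format :utf-8)))
    (values octets
            (flexi-streams:octets-to-string octets
                                            :external-format :utf-8))))
```

    The native functions in Allegro and SBCL take an :external-format
    argument as well, so the same test could be repeated against them
    to see where the keyword arguments differ.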


  * Support for bivalent streams (particularly helpful on unicode-aware
    Lisps, because characters don't trivially map to octets any more):
    CMUCL: don't know
    Allegro: yes (simple streams)
    SBCL: yes (fd streams; simple streams bivalent but not quite finished)

    flexi-streams provide an emulation for this
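
    To make "bivalent" concrete: the same stream accepts both READ-BYTE
    and READ-CHAR.  A sketch using flexi-streams over an in-memory octet
    vector (assumes Quicklisp; the function name read-byte-then-char is
    made up):

```lisp
;; Assumption: Quicklisp is available in the running image.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :flexi-streams :silent t))

;; READ-BYTE-THEN-CHAR is a hypothetical name for illustration.
(defun read-byte-then-char (octets)
  "Read one raw octet and then one UTF-8 character from the same stream."
  (let ((in (flexi-streams:make-flexi-stream
             (flexi-streams:make-in-memory-input-stream octets)
             :external-format :utf-8)))
    ;; Binary read first, character read second, on one stream object.
    (values (read-byte in) (read-char in))))
```

    With a non-bivalent stream, the second call would require reopening
    the source or decoding the octets by hand.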


  * Support for (setf external-format) (like bivalent streams, this is
    something that could, e.g. help XML parsers that don't know the
    external format before having read the XML declaration)

    CMUCL: no (?)
    Allegro: yes
    SBCL: yes (?)

    flexi-streams provide this
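
    The XML case could look roughly like this with flexi-streams: read
    the declaration line in a safe 8-bit encoding, then switch.  This
    is a sketch under assumptions: Quicklisp is available, the function
    name read-header-then-body is made up, and the new encoding is
    hard-coded to UTF-8 instead of being parsed from the declaration.

```lisp
;; Assumption: Quicklisp is available in the running image.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :flexi-streams :silent t))

;; READ-HEADER-THEN-BODY is a hypothetical name for illustration.
(defun read-header-then-body (octets)
  "Read one line as Latin-1, switch the stream to UTF-8, read one character.
Returns the header line and the first body character as two values."
  (let ((in (flexi-streams:make-flexi-stream
             (flexi-streams:make-in-memory-input-stream octets)
             :external-format :latin-1)))
    (let ((header (read-line in)))
      ;; A real parser would extract the encoding name from HEADER here.
      (setf (flexi-streams:flexi-stream-external-format in) :utf-8)
      (values header (read-char in)))))
```

    On implementations with a native (setf stream-external-format) the
    same pattern should apply to ordinary file streams, though any
    read-ahead buffering would need to be checked.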


I'm sure there are other interesting features and issues.  Perhaps the
sb-unicode people have something to say about this?

> Now: don't look at me for actually doing this.  I have no time.  I just 
> think it is a good idea.

Me too, but perhaps the above helps.


d.