Hi,

Following this thread, it seems to me that a reasoned comparison of how different CL implementations (including the commercial ones) support Unicode w.r.t. the ANSI standard would be extremely valuable.
Now: don't look at me for actually doing this. I have no time. I just think it is a good idea.

Cheers
--
Marco

On Jan 17, 2006, at 1:12 PM, David Lichteblau wrote:

> Quoting Peter K.Lee ([EMAIL PROTECTED]):
>>> * What does the "char-sets" column mean?  It says "UTF-8 w/o
>>>   Unicode" for cxml; I can't make sense of that.
>> Me neither. :)  But that is how it is reported in the cxml page.
>
> I take that to mean that the CXML documentation is not elaborate enough
> on this.  Do you have a suggestion where in the documentation to write
> more about it?  What kind of information would you have liked to see?
>
>> Other parsers make cursory notes about the character sets they support
>> as well.  I'd be happy to update the column to make it more sane if
>> someone can shed some light on what it really means...
>
> Well, partly I was asking what the column was meant to be about.
>
> UTF-8 is not a character set, it's an encoding.
>
>   * The "character set" XML parsers use is, by definition, Unicode.
>     Every XML parser must deal with Unicode.
>
>   * A different question is which "encodings" a parser supports.  Now,
>     every parser is required by the spec to support both UTF-8 and
>     UTF-16.  If it doesn't, that's a topic for a bugs section, not so
>     much for a feature comparison.  In a feature comparison, it would
>     be interesting to know which *other* encodings a parser supports.
>
>     For example, CXML seems to support iso-8859-n and koi8-r (hmm,
>     whatever that is :-)) in addition to UTF-8 and UTF-16.
>
>     (Ideally, an XML parser in Lisp [on a Unicode-aware implementation]
>     would support all external formats supported by the host Lisp, but
>     that can be a portability issue.)
>
>   * Yet another question is which encodings the serializer supports.
>
>     For example, CXML has built-in support for UTF-8 serialization
>     (even on non-Unicode-aware Lisps) and leaves all other encodings
>     to the host Lisp.  (Prepend your own XML declarations and use a
>     character stream sink with the external-format you need.)
>
>>> * Somehow I'd like a column "Makes an effort to conform to the
>>>   standards".  AFAIK only CL-XML and CXML qualify for a "yes" there.
>>
>> I'm not exactly sure how to quantify "making an effort to conform to
>> the standards".  It appears that XML syntax is a particular standard
>> that all the XML parsing libraries conform to, and the rest of the
>
> Well, there is indeed a standard for XML 1.0
>   http://www.w3.org/TR/REC-xml/
> and there is a very good test suite for that standard
>   http://www.w3.org/XML/Test/
>
>> "techniques" of parsing vary widely.  If the XML parser does not do
>> validation,
>
> No, there are validating and non-validating parsers.  The XML test
> suite has tests for both of them.  It's fine for a parser to state
> that it doesn't support validation; it is still a conforming
> non-validating parser.
>
>> or provide the W3C DOM API, does that mean it is not
>> making an effort to conform to the standards?
>
> An XML parser does not have to implement DOM by any means.  It is
> definitely an optional feature.  If it does claim to implement it, it
> should pass the DOM test suite, however.
>
> Same for XML namespaces.  That is also an optional, separate
> specification and covered by specially tagged tests in the XML
> conformance test suite.
>
>> -Peter
>
> Thanks,
> David
> _______________________________________________
> Gardeners mailing list
> [email protected]
> http://www.lispniks.com/mailman/listinfo/gardeners

--
Marco Antoniotti                   http://bioinformatics.nyu.edu/~marcoxa
NYU Courant Bioinformatics Group   tel. +1 - 212 - 998 3488
715 Broadway 10th FL               fax. +1 - 212 - 998 3484
New York, NY, 10003, U.S.A.
