> > 5 years ago. Plaintext remains a far more powerful concept and XML is
> > mostly a markup mechanism designed to overcome deficiencies in ASCII
> > that appears rather clumsy in a pure Unicode plaintext world.
>
> This characterization of XML is about as silly as it is possible to get.
> You might just as well claim that XML is designed to overcome world
> hunger. The one place where XML tries to overcome a specifically-ASCII
> limitation is where it *mandates Unicode support*.

I think you are misinterpreting the quotation. What is meant (judging
from parts not quoted here) is:

   XML character entities, language tags and the like (as opposed to
   document-type-specific elements such as "field 47 of a patent
   application") are ... rather clumsy ...

Indeed, there are many levels between character coding and document
structures, and imho even the way SGML/XML handles document structures
is debatable.

It therefore seems a good idea to standardise some of the in-between
things independently of SGML. STX is one approach. Another would be to
have a generic way of assigning characteristics to a certain chunk of
text, quite independent of whether these are language tags or something
else, making the semantics user-defined.

The basic form of this imho universal construction is

   esc begin sep arg0 sep arg1 [ sep arg2 [ sep ... ] ] end

where only 'esc' needs to be a Unicode-defined character. 'begin' and
'end' could be any user-chosen bracket pair, and 'sep' could be any
character that is by its position defined to be the argument separator
character. arg0 would usually refer to something defined in the user's
hypertext system, e.g. a language tag or an emphasis tag. There could
be any number of arguments, from zero upwards, but in most cases one
would have one argument: a piece of normal Unicode text (which may not
contain the 'sep' character).

Thus, assuming I define

   esc:        %
   begin, end: ( )
   sep:        |

I could have the following expansions:

   this is %(bold|not) true.
   ==> this is <bold>not</bold> true.

   SGML stands for %(SGML)
   ==> SGML stands for Standard Generalized Markup Language

   My %(ref|mlht|system) also uses this syntax
   ==> My <a href="http://mlht.ffii.org">system</a> also ...

or mixed-language text for perfect multilingual typesetting, in the
following manner:

   %(lang|ja|Nihonjin no tame no %(lang|zh|Zhongwen) kyoukasyo)

Imho the content side ('lang', 'ja' etc.) does not need to be defined
by Unicode. It is enough to have only the 'esc' symbol, i.e. the
'universal functional expression prefix', as a Unicode character and
leave the rest to users to fill with life, perhaps proposing a few
handy conventions such as the one above.

Mapping such conventions to XML is of course easy, and I would also
propose integrating them directly into STX (structured text). Indeed,
some books have been typeset in STX (e.g. the new Zope Book), which is
very close to plain text. Extending STX with the 'universal functional
expression' syntax proposed above would make it so powerful that
SGML/XML-based markup would very rarely be needed.

--
Hartmut Pilch                                   http://phm.ffii.org/
Protecting Innovation against Patent Inflation  http://swpat.ffii.org/
100,000 signatures against software patents     http://www.noepatents.org/
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
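
For readers who want to experiment with the proposed syntax, here is a
minimal sketch in Python of how such expressions could be expanded. It
is only an illustration under stated assumptions, not part of the
proposal itself: it follows the worked examples above (esc "%", brackets
"( )", separator "|", with arg0 directly after the opening bracket), and
the names expand, _parse_expr and HANDLERS are hypothetical. The handler
table stands in for "the user's hypertext system"; the URL pattern used
for 'ref' and the <span lang=...> output for 'lang' are my own guesses at
a plausible mapping.

```python
from typing import Callable, Dict, List, Tuple

# Delimiters taken from the worked examples; all four are user-chosen.
ESC, BEGIN, END, SEP = "%", "(", ")", "|"
OPEN = ESC + BEGIN

# A handler receives the arguments after arg0 and returns the expansion.
Handler = Callable[[List[str]], str]

# Hypothetical user-defined semantics (assumptions, see lead-in).
HANDLERS: Dict[str, Handler] = {
    "bold": lambda a: f"<bold>{a[0]}</bold>",
    "SGML": lambda a: "Standard Generalized Markup Language",
    "ref": lambda a: f'<a href="http://{a[0]}.ffii.org">{a[1]}</a>',
    "lang": lambda a: f'<span lang="{a[0]}">{a[1]}</span>',
}


def _parse_expr(text: str, pos: int, handlers: Dict[str, Handler]) -> Tuple[str, int]:
    """Parse one expression starting just after OPEN.

    Returns the expanded string and the index just past the closing END.
    Nested expressions are expanded first and spliced into the argument.
    """
    args = [""]
    while pos < len(text):
        if text.startswith(OPEN, pos):
            inner, pos = _parse_expr(text, pos + len(OPEN), handlers)
            args[-1] += inner
        elif text[pos] == SEP:
            args.append("")
            pos += 1
        elif text[pos] == END:
            name, rest = args[0], args[1:]
            if name not in handlers:
                raise KeyError(f"no handler defined for {name!r}")
            return handlers[name](rest), pos + 1
        else:
            args[-1] += text[pos]
            pos += 1
    raise ValueError("unterminated expression")


def expand(text: str, handlers: Dict[str, Handler] = HANDLERS) -> str:
    """Expand every OPEN ... END expression in text; copy the rest as is."""
    out = []
    pos = 0
    while pos < len(text):
        if text.startswith(OPEN, pos):
            piece, pos = _parse_expr(text, pos + len(OPEN), handlers)
            out.append(piece)
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand("this is %(bold|not) true."))
    print(expand("%(lang|ja|Nihonjin no tame no %(lang|zh|Zhongwen) kyoukasyo)"))
```

Because the semantics live entirely in the handler table, swapping in a
different table gives a different target format without touching the
parser, which is the point of keeping the construct user-defined.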