On Tuesday 07 February 2006 01:11, Andreas L Delmelle wrote:
> On Feb 6, 2006, at 08:17, Manuel Mall wrote:
> >> [ME:]
> >
> > <snip/>
> >
> >> A preserved carriage return can be treated the same way as a
> >> linefeed, under the very exceptional condition that it survives
> >> white-space handling:
> >> * white-space-treatment="ignore-if-*"
> >> * the CR does not follow/precede a linefeed
> >> * it is the first character in a sequence of whitespace, so
> >>   it survives white-space-collapse
> >
> > Shouldn't a CR always survive whitespace handling?
>
> Not always:
> If white-space-treatment="preserve" then any XML whitespace other
> than a linefeed is converted into a normal space. IMO, the editors
> put it this way because of the possibility of Windows-specific
> line-endings, where a linefeed is followed by a CR.
>
> > For a starters it is fairly difficult to get a CR out of a XML
> > parser.
>
> Difficult? It's simply a characters event, just like any other...
>

From the XML spec:

<quote>
S (white space) consists of one or more space (#x20) characters,
carriage returns, line feeds, or tabs.

White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+

Note: The presence of #xD in the above production is maintained purely
for backward compatibility with the First Edition. As explained in 2.11
End-of-Line Handling, all #xD characters literally present in an XML
document are either removed or replaced by #xA characters before any
other processing is done. The only way to get a #xD character to match
this production is to use a character reference in an entity value
literal.

...

2.11 End-of-Line Handling

XML parsed entities are often stored in computer files which, for
editing convenience, are organized into lines. These lines are
typically separated by some combination of the characters CARRIAGE
RETURN (#xD) and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor MUST behave as
if it normalized all line breaks in external parsed entities (including
the document entity) on input, before parsing, by translating both the
two-character sequence #xD #xA and any #xD that is not followed by #xA
to a single #xA character.
</quote>

To me this means unless you define an entity

<!ENTITY cr "&#xD;">

and then later reference it as &cr; you never get a CR out of an XML
parser (even on Windows).

>
> Cheers,
>
> Andreas

Regards

Manuel

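The end-of-line behaviour discussed above can be checked with a short script. This is only a minimal sketch, assuming Python's expat-based xml.etree.ElementTree as the parser; the helper name parse_text and the two sample documents are made up for illustration and are not taken from FOP. It feeds the parser literal CR LF and CR line breaks, and then a CR supplied through a character reference in an entity value.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_bytes: bytes) -> str:
    """Return the character data of the root element (hypothetical helper)."""
    return ET.fromstring(xml_bytes).text


# Literal line breaks in the document: a CR LF pair and a lone CR.
literal = b"<doc>a\r\nb\rc</doc>"

# A CR supplied through a character reference inside an entity value.
via_entity = b'<!DOCTYPE doc [<!ENTITY cr "&#xD;">]><doc>a&cr;b</doc>'

print(repr(parse_text(literal)))     # literal line breaks arrive as LF
print(repr(parse_text(via_entity)))  # the CR from the entity is kept
```

Other parsers should behave the same way if they follow section 2.11, but that is worth verifying against whichever parser is actually in use.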