On Feb 6, 2006, at 08:17, Manuel Mall wrote:

[ME:]
<snip/>
A preserved carriage return can be treated the same way as a
linefeed, under the very exceptional condition that it survives
white-space handling:
  * white-space-treatment="ignore-if-*"
  * the CR does not follow/precede a linefeed
  * it is the first character in a sequence of whitespace, so
    it survives white-space-collapse
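The three conditions above can be sketched as a small predicate. This is a simplified model under stated assumptions: the helper `cr_survives` is hypothetical, it looks only at the CR's immediate neighbours, and it does not distinguish between the individual ignore-if-* values. It is not FOP's layout code.

```python
# Hypothetical helper: a simplified model of the three conditions listed
# above, not FOP's actual white-space handling code.
XML_WHITESPACE = frozenset("\t\n\r ")


def cr_survives(text: str, i: int, treatment: str) -> bool:
    """Return True if the CR at text[i] survives white-space handling."""
    if text[i] != "\r":
        raise ValueError("text[i] must be a carriage return")
    # Condition 1: white-space-treatment must be one of the ignore-if-* values.
    if not treatment.startswith("ignore-if-"):
        return False
    prev_ch = text[i - 1] if i > 0 else ""
    next_ch = text[i + 1] if i + 1 < len(text) else ""
    # Condition 2: the CR must neither follow nor precede a linefeed.
    if "\n" in (prev_ch, next_ch):
        return False
    # Condition 3: the CR must start its whitespace run; per the reasoning
    # above, collapsing drops the later characters of a run, not the first.
    return prev_ch not in XML_WHITESPACE


print(cr_survives("a CR\r   more text", 4, "ignore-if-surrounding-linefeed"))  # True
print(cr_survives("a CR\r   more text", 4, "preserve"))                        # False
```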


Shouldn't a CR always survive whitespace handling?

Not always:
If white-space-treatment="preserve", then any XML whitespace character other than a linefeed is converted into a normal space. IMO, the editors put it this way because of the possibility of Windows-specific line endings, where a carriage return is followed by a linefeed.
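A minimal sketch of that "preserve" rule, assuming a hypothetical `apply_preserve` function that works on a plain string rather than on FOP's character flow objects. It shows how a Windows-style CR LF pair ends up as a space followed by the linefeed:

```python
# Hypothetical helper sketching the "preserve" rule described above: every
# XML whitespace character except linefeed becomes U+0020.
XML_WHITESPACE = frozenset("\t\n\r ")


def apply_preserve(text: str) -> str:
    return "".join(
        " " if ch in XML_WHITESPACE and ch != "\n" else ch
        for ch in text
    )


# The tab becomes a space; the CR of the CR LF pair becomes a space too.
print(repr(apply_preserve("one\ttwo\r\nthree")))  # 'one two \nthree'
```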

For starters, it is fairly difficult to get a CR out of an XML parser.

Difficult? It's simply a characters event, just like any other...

Only if the CR is hidden in a character reference (such as &#x0D;) can it survive.
Also, as Simon pointed out in another contribution, whitespace handling is designed to deal with whitespace introduced by pretty-printing and readable XML layout. A CR preserved by the XML parser certainly does not fall into that category.
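The claim that a CR normally does not come out of the parser matches the end-of-line handling that XML 1.0 (section 2.11) requires: a literal CR LF pair or a lone CR is reported to the application as a single LF, while a character reference such as `&#x0D;` is not normalized. A small demonstration with Python's standard `xml.etree.ElementTree` (backed by expat); the helper name `parsed_text` is hypothetical:

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_source: str) -> str:
    """Return the root element's character data as the parser reports it."""
    return ET.fromstring(xml_source).text


# Literal line breaks: Windows-style CR LF and classic-Mac-style lone CR.
print(repr(parsed_text("<block>one\r\ntwo\rthree</block>")))  # 'one\ntwo\nthree'
# A character reference is not subject to end-of-line normalization.
print(repr(parsed_text("<block>one&#x0D;two</block>")))       # 'one\rtwo'
```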

Oh yes it does... Remember that not all our users are Unix/Linux-based. For Windows users, you're likely to get the sequence '&#x0D;&#x0A;' as line terminator, while Mac users saving a source file with native line endings will simply get a '&#x0D;'. (UTF-8 encoding is recommended, but not enforced... An XML file can be in any encoding the parser supports on top of the mandatory UTF-8 and UTF-16.)

A carriage return can survive white-space handling, for instance in the following case (assuming Mac line endings):

<fo:block>
  First line, then a CR&#x0D;     some spaces, and more text
</fo:block>

The CR (which isn't necessarily a numeric character reference, but could be just the byte '0D') is not converted into a space, since white-space-treatment has its default value, "ignore-if-surrounding-linefeed".
It does not precede or follow a linefeed.
It is the first character in a sequence of whitespace, so whatever the value of white-space-collapse, it will survive...

I am also not aware that the XSL-FO spec mentions CR as falling under whitespace. IMO, for whitespace handling, CR is just a non-whitespace character.

Nope, it does fall into the category of XML whitespace. There are exactly four of those: &#x09; (tab), &#x0A; (linefeed), &#x0D; (carriage return) and &#x20; (space). If you don't believe me: it's indeed not in the XSL-FO Rec, but you might want to check the XML Recommendation...
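The four characters are exactly those of the `S` production (rule [3]) in the XML 1.0 Recommendation. A minimal sketch of that classification, using a hypothetical `is_xml_whitespace` helper; note that it is deliberately narrower than Python's built-in `str.isspace()`, which also accepts, for example, form feed and no-break space:

```python
# XML 1.0, production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
XML_WHITESPACE = frozenset("\u0009\u000A\u000D\u0020")


def is_xml_whitespace(ch: str) -> bool:
    """True only for tab, linefeed, carriage return and space."""
    return ch in XML_WHITESPACE


# Characters Python considers whitespace but XML does not.
print([c for c in "\t\n\r \x0c\xa0" if c.isspace() and not is_xml_whitespace(c)])
# ['\x0c', '\xa0']
```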

So, we only need to consider what FOP layout should do if it encounters a CR. I would say: keep it simple, throw it away and log a warning.
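The "throw it away and log a warning" approach could look roughly like this. It is a sketch under assumptions: `drop_carriage_returns` is a hypothetical hook that receives the text of one run after white-space handling, and the logger name is made up; FOP itself is written in Java and has its own logging, so none of these names come from FOP.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("fop.layout.whitespace")


def drop_carriage_returns(text: str) -> str:
    """Discard every CR that reaches layout, logging one warning per call."""
    count = text.count("\r")
    if count:
        logger.warning("Discarding %d carriage return(s) during layout", count)
    return text.replace("\r", "")


print(repr(drop_carriage_returns("First line, then a CR\r     some spaces")))
# 'First line, then a CR     some spaces'
```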

Now, what about a tab character under the same circumstances? Do we
use an elastic width of X spaces optimum, where X is purely
conventional?
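One way to read "an elastic width of X spaces optimum" is to give the tab the minimum, optimum and maximum of an ordinary space, each multiplied by X. The sketch assumes a hypothetical `ElasticWidth` type, picks X = 8 purely by convention, and uses made-up space widths in the usage line:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElasticWidth:
    """Hypothetical min/opt/max width triple, e.g. in points."""
    minimum: float
    optimum: float
    maximum: float


def tab_width(space: ElasticWidth, spaces_per_tab: int = 8) -> ElasticWidth:
    """Treat a tab as `spaces_per_tab` spaces, scaling the space's elasticity."""
    return ElasticWidth(
        space.minimum * spaces_per_tab,
        space.optimum * spaces_per_tab,
        space.maximum * spaces_per_tab,
    )


# Made-up space widths in points, for illustration only.
print(tab_width(ElasticWidth(2.0, 2.5, 3.75)))
# ElasticWidth(minimum=16.0, optimum=20.0, maximum=30.0)
```

Scaling all three values keeps the tab stretching in proportion to the word spaces around it; a rigid width would be the other obvious choice, but it could not absorb any justification slack.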


Similar considerations as for CR apply to TAB.

...

Cheers,

Andreas
