On Thu, Oct 24, 2013 at 8:17 AM, Sandro Magi <[email protected]> wrote:

> On 24/10/2013 10:48 AM, Jonathan S. Shapiro wrote:
>
>> What I think is happening with the editing applications is that they are
>> "cleaning" the input so that the editor can behave in a sane fashion. If NL
>> is insignificant for rendering purposes, it can be a little mind bending to
>> do cursor management properly.
>>
>
> But the inherent rendering behaviour is irrelevant, since the rendering is
> (and should be) completely defined by the linked CSS. Therefore there is no
> sane way to auto-format HTML without possibly introducing rendering changes.
>
> In which case, you might as well just use NL to format all opening/closing
> tags, except for the small number of default exceptions where formatting
> inherently matters, and we expect it to, like textarea and pre.


Except that there are contexts where white space really cannot be removed,
and in the absence of CSS you don't actually *know* that textarea and
pre have significant whitespace.

But it's more subtle than that. Consider:

  <em>emphasized </em> text   -- two spaces in there

  <ol>
     ...
     </li>   -- the CR here could be significant input, or maybe not.
  </ol>

For humans reading the raw input you want those carriage returns, but from
an XML perspective they are significant text. To make matters worse, the
HTML DTD is very permissive about where CDATA (text) is permitted in the
content model. What we sort of want here is something that says "allow
whitespace, but ignore it, in any element that doesn't take CDATA". That's
information you can extract out of the DTD, assuming you know what DTD is
in use. But the sloppiness in HTML combined with the tolerance of browsers
conspires to make authors incredibly sloppy about this.
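
As a rough sketch of that "allow whitespace, but ignore it" rule, something
like the following would do, assuming the input is already well-formed XML
and assuming you have a table of element-only content models in hand. The
ELEMENT_ONLY set and the function name here are made up for illustration; a
real version would derive the set from whatever DTD is in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of elements assumed to have element-only content.
# In practice this would be extracted from the DTD, not hard-coded.
ELEMENT_ONLY = {"ol", "ul", "table", "tr"}


def drop_ignorable_whitespace(root, element_only=ELEMENT_ONLY):
    """Remove whitespace-only text directly inside element-only elements."""
    for elem in root.iter():
        if elem.tag not in element_only:
            continue
        # Text before the first child element.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Text after each child element (still inside elem).
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    return root


doc = ET.fromstring("<ol>\n  <li>one</li>\n  <li>two  </li>\n</ol>")
drop_ignorable_whitespace(doc)
print(ET.tostring(doc, encoding="unicode"))
# prints: <ol><li>one</li><li>two  </li></ol>
```

Note that the whitespace inside the second LI is left alone, because LI is
not in the assumed element-only set. Only the newlines and indentation
between the list items go away.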

The OSDoc processing chain actually has a cleanup pass specifically to deal
with white space where it doesn't belong. It goes to some lengths to remove
white space from elements whose content models do not (or at least
*should* not) permit CDATA (like OL and LI), and to normalize some
whitespace rules on horizontal elements (like moving leading and trailing
whitespace outside things like <em>). But it's using heuristics for all of
that.
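
To give a flavor of the horizontal-element rule, here is a minimal sketch of
hoisting leading and trailing whitespace out of inline elements. This is not
the OSDoc code. The INLINE set and the function name are assumptions for
illustration, and the sketch only handles inline elements with no child
elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of "horizontal" (inline) elements.
INLINE = {"em", "strong", "b", "i"}


def hoist_edge_whitespace(root, inline=INLINE):
    """Move leading/trailing whitespace of leaf inline elements outside them."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for elem in list(root.iter()):
        # Skip non-inline elements, the root, and inline elements with children.
        if elem.tag not in inline or elem not in parent_of or len(elem) > 0:
            continue
        text = elem.text or ""
        core = text.strip()
        if not core:
            continue  # empty or whitespace-only: leave it alone
        lead = text[:len(text) - len(text.lstrip())]
        trail = text[len(text.rstrip()):]
        elem.text = core
        if lead:
            # Leading whitespace joins the text just before the element:
            # the parent's text, or the previous sibling's tail.
            siblings = list(parent_of[elem])
            i = siblings.index(elem)
            if i == 0:
                parent = parent_of[elem]
                parent.text = (parent.text or "") + lead
            else:
                prev = siblings[i - 1]
                prev.tail = (prev.tail or "") + lead
        if trail:
            # Trailing whitespace joins the text just after the element.
            elem.tail = trail + (elem.tail or "")
    return root


doc = ET.fromstring("<p><em>emphasized </em> text</p>")
hoist_edge_whitespace(doc)
print(ET.tostring(doc, encoding="unicode"))
# prints: <p><em>emphasized</em>  text</p>
```

The two spaces from the earlier example are now both outside the <em>.
Collapsing them to one is a separate normalization step, and whether that is
safe depends on the same content-model questions as before.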
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
