On Thu, Oct 24, 2013 at 7:30 AM, Sandro Magi <[email protected]> wrote:

> On 24/10/2013 10:10 AM, Jonathan S. Shapiro wrote:
>
>>
>> One consequence is that HTML is not inherently a line-oriented format.
>> You /can/ maintain a file in line-oriented form if you do so by hand, but
>> the HTML editors generally won't.
>>
>
> Seems silly, considering an LF doesn't take any more storage than a space
> char, and the output is then human readable, and by other tools (like diff
> as you mentioned).


There's nothing *wrong* with emitting NLs, but the tool has no way to know
when to insert them. Paragraph flow/fill is a consequence of rendering; it
isn't part of the data model. You're walking a DOM tree, possibly without
access to a DTD, so you don't necessarily know which elements have
significant white space and which don't.  And even if you do, the rendering
properties of the space can be altered by the CSS white-space property.

In XML, the standard specifically says that the lexer/parser *can't* remove
white space. I'm not sure what the rule was in SGML.
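
As a small illustration of the XML point (a sketch only, using Python's
standard xml.dom.minidom; the sample documents and the child_summary helper
are made up for this example), the same list written compactly and written
one element per line yields two different trees, because the parser hands
the inter-element white space to the application as text nodes:

```python
from xml.dom import minidom

# Two serializations that differ only in white space between elements.
compact = "<ul><li>one</li><li>two</li></ul>"
pretty = "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>"


def child_summary(source):
    """Return (nodeName, text-or-None) pairs for the root element's children."""
    root = minidom.parseString(source).documentElement
    return [
        (node.nodeName, node.data if node.nodeType == node.TEXT_NODE else None)
        for node in root.childNodes
    ]


compact_children = child_summary(compact)
pretty_children = child_summary(pretty)

print(compact_children)  # only the two <li> elements
print(pretty_children)   # the <li> elements plus white-space-only text nodes
```

Whether those extra text nodes matter is exactly what the parser can't
decide on its own; that takes a DTD or a stylesheet.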

What I think is happening with the editing applications is that they are
"cleaning" the input so that the editor can behave in a sane fashion. If NL
is insignificant for rendering purposes, it can be a little mind bending to
do cursor management properly. Did I click before or after the NL? Should
the current selection be able to include something I can't see is there?
For WYSIWYG purposes, does an NL render as a line break (which is what
<br/> is for) or not? If it doesn't, how does the user know where they are?
The smart move here is "show white space" for NL and TAB, but that
confuses a lot of users.

From the editor perspective, you also don't want CR to be an input
character. In most contexts, you want CR to mean "end all current elements
out to the most closely containing vertical element, start a new vertical
element of the same type, and insert any non-conditional elements that the
DTD requires for that vertical element". Which is actually pretty tricky,
since the DTD doesn't tell you what elements are vertical and what elements
are horizontal. That's a CSS property, and no two CSS specifications need
to agree about the answer (possibly with reason).
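
To make the CR behavior concrete, here is a rough sketch in Python. It is
not how any particular editor does it. The Elem type, the split_on_enter
function, the path/offset cursor encoding, and above all the hard-coded
VERTICAL set are assumptions invented for the example; a real editor would
have to get "vertical" from the stylesheet, and the sketch skips the step
of inserting elements the DTD requires.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Elem:
    tag: str
    children: List[Union["Elem", str]] = field(default_factory=list)


# Assumption: stands in for what the CSS would say about each element.
VERTICAL = {"p", "li", "div"}


def split_on_enter(root: Elem, path: List[int], offset: int,
                   is_vertical: Callable[[str], bool]) -> Elem:
    """Split the tree at a text cursor, roughly as an Enter keypress would.

    path holds child indices from root down to a text node; offset is the
    cursor position inside that text. The tree is modified in place.
    """
    # chain[k] is the element reached after following path[:k].
    chain = [root]
    for index in path[:-1]:
        chain.append(chain[-1].children[index])
    if not any(is_vertical(elem.tag) for elem in chain[1:]):
        raise ValueError("no enclosing vertical element below the root")

    depth = len(path) - 1
    node = chain[depth]
    idx = path[depth]
    text = node.children[idx]
    # Everything after the cursor moves into the new elements.
    tail = [text[offset:]] + node.children[idx + 1:]
    node.children = node.children[:idx] + [text[:offset]]

    while True:
        clone = Elem(node.tag, tail)  # same type as the element being ended
        parent = chain[depth - 1]
        pos = path[depth - 1]
        if is_vertical(node.tag):
            # Nearest vertical element reached: start its sibling here.
            parent.children.insert(pos + 1, clone)
            return root
        # Horizontal element: end it and keep walking outward.
        tail = [clone] + parent.children[pos + 1:]
        parent.children = parent.children[:pos + 1]
        node = parent
        depth -= 1


def to_markup(node: Union[Elem, str]) -> str:
    """Serialize the toy tree without adding any white space."""
    if isinstance(node, str):
        return node
    inner = "".join(to_markup(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


doc = Elem("body", [Elem("p", ["Hello ", Elem("b", ["bold text"]), " tail"])])
# Cursor sits inside "bold text", right after "bold".
split_on_enter(doc, [0, 1, 0], 4, lambda tag: tag in VERTICAL)
result = to_markup(doc)
print(result)
```

Change the VERTICAL set and the same keystroke produces a different tree,
which is the ambiguity described above.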

All of which is a very long-winded way of saying that WYSIWYG editing of
[X]HTML/XML is a very hard problem with a lot of context ambiguity. It
isn't an accident that useful free HTML editors still don't exist.

shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
