Billy Wong wrote:
On 1/25/06, Lachlan Hunt <[EMAIL PROTECTED]> wrote:
I'm not saying it won't break anything, but every single change we make
to the parsing could possibly break any number of the billions of pages
on the web in any number of browsers.

But using your method (swapping inline node and block node) would
break presently valid and correct webpages.

Such pages are invalid because inline-level elements are not allowed to contain block-level elements. HTML pages containing the following:

<span>
  <div>...</div>
</span>

could be considered well-formed (if you apply the concept of well-formedness to HTML, even though it is not formally defined for it), but they are certainly not valid according to any official DTD.

If breaking things is unavoidable, I prefer breaking things which are written 
incorrectly.

No-one is intending to break anything that is written correctly.

My idea is very extreme but simple and efficient:
    Parse the page regardless of what is between "</" and ">".  Treat what's
written inside the close tag as merely a visual clue.

Example: <span><div>X</span>Y</div>
+ span
  + div
    + #text: X
  + #text: Y
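
A minimal Python sketch of the rule proposed above, in which every close tag closes the innermost open element and its name is ignored. The function names, the regular expression, and the dict-based tree are assumptions made for illustration; attributes, comments, and void elements are not handled.

```python
import re

# Group 1: close tag, group 2: open tag, group 3: text.
TOKEN = re.compile(r"</([^>]*)>|<([^/>][^>]*)>|([^<]+)")

def parse_ignoring_close_names(markup):
    """Build a tree in which any close tag pops the innermost open element."""
    root = {"name": "#root", "children": []}
    stack = [root]
    for m in TOKEN.finditer(markup):
        if m.lastindex == 1:
            # Close tag: the name inside "</" and ">" is never inspected.
            if len(stack) > 1:
                stack.pop()
        elif m.lastindex == 2:
            node = {"name": m.group(2), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            text = {"name": "#text: " + m.group(3), "children": []}
            stack[-1]["children"].append(text)
    return root

def dump(node, depth=0):
    """Return the tree as indented '+ name' lines, skipping the root itself."""
    lines = []
    for child in node["children"]:
        lines.append("  " * depth + "+ " + child["name"])
        lines.extend(dump(child, depth + 1))
    return lines

print("\n".join(dump(parse_ignoring_close_names("<span><div>X</span>Y</div>"))))
```

Under these assumptions, `</span>` closes the div and `</div>` closes the span, which is how the Y ends up as a child of the span in the tree above.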

I'm kind of confused by what you're trying to do there. You seem to be implicitly closing the div immediately before the span's end tag. But in the markup, the Y doesn't seem to be a child of the span at all; it looks like it should be a child of the div. Yet in your DOM, it is a child of the span, not of the div.

The DOM looks equivalent to this markup:

  <span><div>X</div>Y</span>

which is insane.  It would make a little more sense if it were like this:

  + span
    + div
      + #text: X
  + #text: Y

In other words, it would be equivalent to this markup:

<span><div>X</div></span>Y

That is actually quite sane and is what OpenSP does with invalid HTML, regardless of which elements are used (presumably according to some SGML rules), but it would not be compatible with the current state of the web at all, and so is not a real option.
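
For comparison, here is a rough Python sketch of one recovery rule that yields the second tree: an end tag closes every open element up to and including the one it names, and an end tag that names no open element is ignored. This is an assumption about a rule that produces that result, not a description of how OpenSP is actually implemented; the function names and the dict-based tree are hypothetical.

```python
import re

# Group 1: close tag, group 2: open tag, group 3: text.
TOKEN = re.compile(r"</([^>]*)>|<([^/>][^>]*)>|([^<]+)")

def parse_matching_close_names(markup):
    """Build a tree in which an end tag closes up to the element it names."""
    root = {"name": "#root", "children": []}
    stack = [root]
    for m in TOKEN.finditer(markup):
        if m.lastindex == 1:
            name = m.group(1).strip()
            if name in [n["name"] for n in stack[1:]]:
                # Implicitly close inner elements, then the named one.
                while stack.pop()["name"] != name:
                    pass
            # Otherwise the end tag matches nothing open and is dropped.
        elif m.lastindex == 2:
            node = {"name": m.group(2), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            text = {"name": "#text: " + m.group(3), "children": []}
            stack[-1]["children"].append(text)
    return root

def dump(node, depth=0):
    """Return the tree as indented '+ name' lines, skipping the root itself."""
    lines = []
    for child in node["children"]:
        lines.append("  " * depth + "+ " + child["name"])
        lines.extend(dump(child, depth + 1))
    return lines

print("\n".join(dump(parse_matching_close_names("<span><div>X</span>Y</div>"))))
```

With this rule, `</span>` closes both the div and the span, the Y lands outside the span, and the trailing `</div>` is discarded.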

For correctly written webpages, this should pose no problems.  As for
incorrect webpages, they deserve it from the point at which they ask the
UA to use "standard mode".

In theory, that sounds nice, but you have to remember:

  "to a rough approximation, all the content on the Web is errorneous,
   invalid, or non-conformant." -- Hixie

So, to say "they deserve it" to 100% of the web (roughly speaking) isn't really an option, unfortunately. It's ok to say it to the most pathological of cases, those that depend on one particular browser's insane and undefined error recovery techniques yet already break in everything else, but not to the whole web.

--
Lachlan Hunt
http://lachy.id.au/
