On 23 May 2009, at 13:34, Julian Reschke wrote:
For this to make sense in real HTML implementations, the
definition should be in terms of the document layer rather than
the byte layer.
Disagreed. Many implementations never build a DOM. We're not only
talking about browsers here.
By "DOM" I generally mean any kind of tree structure of elements
and attributes, either as an explicit data structure (DOM, XOM,
ElementTree) or implicit (SAX). Would any RDFa implementation *not*
parse the input HTML into that kind of structure and operate over
the elements and attributes as distinct objects? (e.g. would they
just use regular expressions over the input byte stream? That seems
quite infeasible to me...)
Depends on the definition of "tree structure". I've been involved in
code that just uses a tokenizer and specialized stack, and
implementations like these will not do the re-arranging of elements
the HTML5 spec specifies for some kinds of broken input.
Still, specifying it relative to a DOM is not a problem, as you can
infer the elements and text nodes from the token stream until you
reach the point where HTML5 requires you to throw a fatal error
(i.e., when you can no longer parse per spec from the stream, because
you can't reorder the elements).
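
To make the tokenizer-plus-stack idea concrete, here is a rough sketch in
Python. It is not taken from any actual RDFa implementation. It uses the
standard library's html.parser as the tokenizer, and the names
StackWalker, StreamError and walk are made up for illustration. It reports
each element (with its attributes and its ancestor chain) and each text
node straight from the token stream, and gives up as soon as an end tag
does not match the top of the stack. That is a deliberate simplification:
it is stricter than HTML5, which also defines implied end tags and
recovery steps that this sketch does not attempt.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are never pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class StreamError(Exception):
    """Raised when the stream can no longer be handled without reordering."""


class StackWalker(HTMLParser):
    """Hypothetical walker: tokenizer events plus a stack, no tree built."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []    # names of the currently open elements
        self.events = []   # inferred element and text nodes, in order

    def handle_starttag(self, tag, attrs):
        # Report the element with its ancestors before pushing it.
        self.events.append(("element", tuple(self.stack), tag, dict(attrs)))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Simplifying assumption: any mismatch is treated as fatal.
        if not self.stack or self.stack[-1] != tag:
            raise StreamError(f"unexpected </{tag}> with open {self.stack}")
        self.stack.pop()

    def handle_data(self, data):
        # Whitespace-only text is skipped to keep the output short.
        if data.strip():
            self.events.append(("text", tuple(self.stack), data))


def walk(html):
    """Return the inferred element/text events for a string of HTML."""
    walker = StackWalker()
    walker.feed(html)
    walker.close()
    return walker.events


if __name__ == "__main__":
    for event in walk('<div about="#me"><span property="foaf:name">Bob</span></div>'):
        print(event)
```

Feeding it misnested input such as "<b><i>x</b></i>" raises StreamError at
the "</b>", which is the point where a full HTML5 parser would have to
start rearranging elements.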
--
Geoffrey Sneddon
<http://gsnedders.com/>