On 23 May 2009, at 13:34, Julian Reschke wrote:

For this to make sense in real HTML implementations, the definition should be in terms of the document layer rather than the byte layer.

Disagreed. Many implementations never build a DOM. We're not only talking about browsers here.
By "DOM" I generally mean any kind of tree structure of elements and attributes, either as an explicit data structure (DOM, XOM, ElementTree) or implicit (SAX). Would any RDFa implementation *not* parse the input HTML into that kind of structure and operate over the elements and attributes as distinct objects? (e.g. would they just use regular expressions over the input byte stream? That seems quite infeasible to me...)

Depends on the definition of "tree structure". I've been involved in code that just uses a tokenizer and specialized stack, and implementations like these will not do the re-arranging of elements the HTML5 spec specifies for some kinds of broken input.

Specifying it relative to a DOM is still not a problem, as you can infer the elements and text nodes from the token stream, until you reach the point where HTML 5 requires you to throw a fatal error (i.e., when you can no longer parse per spec from the stream alone, because you can't reorder the elements).
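
To make that concrete, here is a rough sketch of the tokenizer-plus-stack approach, using Python's standard html.parser as the tokenizer. The names StackWalker and walk are made up for illustration, and the "fatal error" test (an end tag that doesn't match the top of the stack) is a crude stand-in for the real conditions in the HTML 5 parsing algorithm, not an implementation of them. Void elements written without a trailing slash (e.g. a bare <br>) are not handled.

```python
from html.parser import HTMLParser


class StackWalker(HTMLParser):
    """Infer elements and text nodes from the token stream; no tree is built."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []   # names of the currently open elements
        self.events = []  # (kind, path, payload) tuples, in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.events.append(("element", "/".join(self.stack), dict(attrs)))

    def handle_endtag(self, tag):
        # Simplified stand-in for the spec's rules: anything that would need
        # elements to be re-arranged is treated as fatal.
        if not self.stack or self.stack[-1] != tag:
            raise ValueError(f"cannot continue from the stream alone: </{tag}>")
        self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", "/".join(self.stack), data))


def walk(html):
    """Return the inferred element and text events for an HTML string."""
    walker = StackWalker()
    walker.feed(html)
    walker.close()
    return walker.events
```

Feeding it well-nested markup yields each element with its attributes, plus the text nodes, which is all a consumer working in terms of elements and attributes needs; feeding it something like <b><i>x</b></i> raises at the first mis-nested end tag.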


--
Geoffrey Sneddon
<http://gsnedders.com/>

