> Just chiming in here to say that while we don't offer a createFragment() in
> this proposal, it's possible to parse fragments by passing the
> LIBXML_HTML_NOIMPLIED option. Alternatively, in the future I plan to offer
> innerHTML which you could use then in conjunction with
> createDocumentFragment().

My understanding is that this isn’t quite right, because fragment parsing involves more than whether or not the HTML and BODY elements are implied.

> Sets HTML_PARSE_NOIMPLIED flag, which turns off the automatic adding of
> implied html/body... elements.

The HTML5 spec defines fragment parsing as starting within a context node which exists within a broader document. For example, many people will parse a string of HTML that should form the contents of an LI element. They are grabbing that HTML from a database somewhere, or from user input. If that HTML contains `</li>` then the behavior diverges. A fragment parser with LI as its context node ignores that end tag as a parse error, because no LI element is in scope on its stack of open elements. If the same string is instead wrapped in `<li>` and `</li>` and parsed in full-document mode, the end tag closes the list item we started and everything after it lands outside of that item.

If the goal is to ensure that user input doesn’t break out and change the page, then it’s important to use fragment parsing and grab the inner contents of that LI context node. This can be a valuable tool for guarding against injection attacks or against accidentally breaking the page someone is building, because the fragment parser is aware of its environment.

It becomes even more important when parsing within RCDATA or RAWTEXT sections. For example, when parsing a web page’s title in order to analyze or manipulate it, the parser should treat everything as plaintext until it reaches the end of the input or encounters a closing TITLE tag. Doing this with `createFromString()` leaves it up to the caller to remember to prepend and then remove the environment, `createFromString( '<title>' . $page_title . '</title>' )`. The fragment parser would be similar in practice, but more explicit and harder to misunderstand in these circumstances.

This is complicated stuff.
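
To make the TITLE case concrete, here is a rough sketch of the prepend-and-remove workaround. It assumes the API as it later shipped in PHP 8.4, `Dom\HTMLDocument::createFromString()`, whose name may not match this proposal; `parse_title_text()` is a hypothetical helper of my own, not part of any API.

```php
<?php
// Sketch only: requires PHP 8.4+ (assumed), where Dom\HTMLDocument exists.
// parse_title_text() is a hypothetical helper name used for illustration.
function parse_title_text(string $page_title): string
{
    // Prepend the environment so the parser switches to RCDATA for TITLE.
    $doc = Dom\HTMLDocument::createFromString(
        '<!DOCTYPE html><title>' . $page_title . '</title>',
        LIBXML_NOERROR // silence parse-error warnings for this demo
    );

    // Remove the environment again: keep only the text inside TITLE.
    return $doc->getElementsByTagName('title')->item(0)->textContent;
}

// Markup inside TITLE is plain text, not elements.
var_dump(parse_title_text('Hello <b>world</b>'));
// string(18) "Hello <b>world</b>"

// A closing TITLE tag in the input ends the element early.
var_dump(parse_title_text('Oops</title><script>alert(1)</script>'));
// string(4) "Oops"
```

The second call is the case the caller has to anticipate: the wrapped input escapes the TITLE element, and nothing in the call itself signals that this happened.
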
I understand that the spec provides for a wide variety of use-cases and needs, and that it’s hard to pin down exactly what a spec-compliant parser is supposed to do in all situations (it depends), so I only want to share the perspective of people doing a lot of small HTML manipulation. There’s not much code out there using the fragment parser, but I can’t help but think that part of the reason is that it’s not exposed where it ought to be.

Have a great weekend!
Dennis Snell