On , Hallvord R. M. Steen <hallv...@opera.com> wrote:
> Hi,
>
> a question related to the evolving draft on http://www.w3.org/TR/clipboard-apis/ (which is actually slightly better styled on http://dev.w3.org/2006/webapi/clipops/clipops.html - I should figure out why ;-))
>
> We want to enable some sort of access to HTML code if the user pastes formatted text from an application that places HTML on the clipboard. However, the browser will have to implement some security restrictions (see the relevant section of the spec, though it's just a draft), and in some cases process the HTML to deal with embedded data when there are sub-parts on the clipboard. To handle both the security algorithms and any embedded data, the browser will probably need to parse the HTML.
>
> So when you call event.clipboardData.getData('text/html'), the browser will actually get HTML from the clipboard, parse it, do any work required by the security and data-embedding rules on the DOM, and then serialize the code (possibly after modifying the DOM) to pass it on to the script. Of course, the script will want to do its own processing, which will probably at some point require parsing the code again.
>
> So, to make things more efficient, would it be interesting to expose the DOM tree from the browser's internal parsing? For example, we could define event.clipboardData.getDocumentFragment(), which would return a parsed and, when applicable, sanitized view of any markup the implementation supports from the clipboard. Rich-text editors could then use normal DOM methods to dig through the contents and remove or add cruft as they wish on the returned document fragment, before doing an appendChild() and preventing the event's default action.
>
> Thoughts?
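For illustration, a paste handler written against the proposed getDocumentFragment() might look like the sketch below. The method exists only as a proposal in this thread, so the sketch feature-detects it and otherwise re-parses the string returned by getData('text/html'), which is exactly the second parse the proposal wants to avoid. The helper names (pastedFragment, installPasteHandler), the span[style] cleanup, and appending at the end of the editor rather than at the caret are all hypothetical simplifications.

```javascript
// Sketch only: getDocumentFragment() is the method PROPOSED in this thread and
// is not assumed to exist in any browser, so we feature-detect it.
function pastedFragment(clipboardData, doc = document) {
  if (typeof clipboardData.getDocumentFragment === 'function') {
    // Proposed API: the browser hands over its already parsed (and sanitized) DOM.
    return clipboardData.getDocumentFragment();
  }
  const html = clipboardData.getData('text/html');
  if (!html) return null; // no HTML flavour on the clipboard
  // Fallback: a second parse of markup the browser already parsed and then
  // serialized. <template> content is inert, so nothing runs or loads here.
  const tpl = doc.createElement('template');
  tpl.innerHTML = html;
  return tpl.content;
}

// Hypothetical rich-text editor wiring: tidy the fragment with ordinary DOM
// methods, append it, and cancel the browser's default paste.
function installPasteHandler(editor) {
  editor.addEventListener('paste', (event) => {
    const frag = pastedFragment(event.clipboardData);
    if (!frag) return; // let the browser handle plain-text pastes
    for (const span of frag.querySelectorAll('span[style]')) {
      span.removeAttribute('style'); // example of removing "cruft"
    }
    editor.appendChild(frag); // simplification: inserts at the end, not at the caret
    event.preventDefault();
  });
}
```

The fallback branch does no sanitizing of its own; under the draft, any security filtering would already have been applied by the browser before getData() returns.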
What getDocumentFragment() would provide is already covered by doing x=createElement; x.innerHTML=foo; and then traversing x.

Regarding simplifying the pasted HTML to remove anything that could be malicious: consider a rogue app that injects a script into the clipboard and expects the user to hit paste on his bank's site. There is little the user agent can do except provide quick and easy methods to sanitize it. There is already the toStaticHTML API that IE implements; I would suggest supporting and implementing it. Or even add a sister property to innerHTML, innerStaticHTML, which would omit scripts and event handlers when read and strip them out when set.
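A minimal sketch of the "set innerHTML, then traverse" approach, doing roughly what an innerStaticHTML setter might: drop script-like elements, on* event-handler attributes, and javascript: URLs. It parses into a <template> rather than a plain element, because an <img onerror> inside a detached div can still fire once the image starts loading, whereas template content is inert. The names sanitizePastedHTML, stripActiveContent, and isUnsafeAttribute and the element and attribute lists are illustrative assumptions; a blacklist like this is easy to get wrong and is no substitute for toStaticHTML or a maintained allowlist-based sanitizer.

```javascript
// Illustrative blacklist only (an assumption, not a complete sanitizer).
const UNSAFE_ELEMENTS = new Set(['SCRIPT', 'IFRAME', 'OBJECT', 'EMBED', 'STYLE']);
const URL_ATTRIBUTES = new Set(['href', 'src', 'action', 'formaction', 'xlink:href']);
const ELEMENT_NODE = 1;

// True for event handlers (onclick, onload, ...) and javascript: URLs.
function isUnsafeAttribute(name, value) {
  const lower = name.toLowerCase();
  if (lower.startsWith('on')) return true;
  if (URL_ATTRIBUTES.has(lower)) {
    // Drop whitespace/control characters, which browsers ignore inside a
    // scheme like "java\tscript:", before checking the prefix.
    const compact = String(value).replace(/[\u0000-\u0020]/g, '').toLowerCase();
    return compact.startsWith('javascript:');
  }
  return false;
}

// Recursively removes unsafe elements and attributes below `node`.
function stripActiveContent(node) {
  // Snapshot the children: removing from a live NodeList while iterating skips nodes.
  for (const child of Array.from(node.childNodes)) {
    if (child.nodeType !== ELEMENT_NODE) continue; // text, comments, etc.
    if (UNSAFE_ELEMENTS.has(child.nodeName.toUpperCase())) {
      node.removeChild(child);
      continue;
    }
    for (const attr of Array.from(child.attributes)) {
      if (isUnsafeAttribute(attr.name, attr.value)) child.removeAttribute(attr.name);
    }
    stripActiveContent(child);
  }
  return node;
}

// Hypothetical helper: parse into an inert <template>, then clean the fragment.
function sanitizePastedHTML(html, doc = document) {
  const tpl = doc.createElement('template');
  tpl.innerHTML = html;
  return stripActiveContent(tpl.content);
}
```

Because stripActiveContent only relies on childNodes, attributes, removeChild, and removeAttribute, it can also be run over a fragment obtained some other way, for example one returned by the proposed getDocumentFragment().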