Sam Ruby wrote:
Julian Reschke wrote:
(<http://atompub.org/2005/01/27/draft-ietf-atompub-format-05.html#rfc.section.3.1.1>)
The spec currently says:
"If the value of "type" is "HTML", the content of the Text construct MUST NOT contain child elements, and SHOULD be suitable for handling by software that knows HTML. The HTML markup must be escaped; for example, "<br>" as "&lt;br>". The HTML markup SHOULD be such that it could validly appear directly within an HTML <DIV> element. Receiving software which displays the content MAY use the markup to aid in displaying it."
Is there anything that we can say about what recipients should do if they are not prepared to tag-soup-parse HTML content (such as something based on XSLT1 in Mozilla, or running in a size-constrained environment; does MIDP come with an HTML parser)? Skip the entry? Do not display the content? Display the content including the escaped markup as plain text?
I would suggest stripping the tags. In Perl, something like this:
s/<.*?>//g
Thanks. Are we 100% confident that whatever results from that replacement can be safely embedded? For instance, what about <script> tags? Can they contain potentially dangerous code that would execute without being referenced from somewhere?
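To make the question concrete, here is a minimal sketch in Python (not Perl) of what tag stripping leaves behind. The function names `strip_tags` and `to_safe_html` are hypothetical, and the pattern uses DOTALL so that tags spanning line breaks are removed too, which the Perl one-liner above would not do without /s. It assumes the recipient embeds the result as plain text: the body of a <script> element survives stripping, but only as inert characters once the remainder is escaped again.

```python
import re
from html import escape

# Anything between "<" and the next ">" is treated as a tag.
# DOTALL lets "." match newlines, so multi-line tags are removed as well.
TAG = re.compile(r"<.*?>", re.DOTALL)


def strip_tags(text: str) -> str:
    """Remove everything that looks like a tag; keep the text in between."""
    return TAG.sub("", text)


def to_safe_html(text: str) -> str:
    """Strip tags, then escape what is left so it can only render as text."""
    return escape(strip_tags(text), quote=True)


# Content of an atom Text construct after the XML parser has unescaped it.
sample = 'Hi<br><script>alert("x")</script> there'

print(strip_tags(sample))    # Hialert("x") there
print(to_safe_html(sample))  # Hialert(&quot;x&quot;) there
```

The script source ends up in the output as text, so it is noise rather than a risk, provided the escaping step is not skipped. A recipient that wants cleaner output would have to drop the contents of script elements as well, which a simple pattern like this does not attempt.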
Shouldn't we at least give content producers the hint that producing XHTML content is preferred over HTML? (sorry if I'm opening a can of worms here)
Depending on the target environment, stripping the elements in XHTML may also be appropriate.
Sure, but for XHTML, the XML parser already contains the necessary machinery.
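As an illustration of that point, a sketch in Python using the standard library's ElementTree: once the XHTML content has been parsed as XML, the text can be collected by walking the tree, with no pattern matching on markup. The helper name `xhtml_text` and the sample fragment are made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xhtml_text(fragment: str) -> str:
    """Parse a well-formed XHTML fragment and return its text content only."""
    root = ET.fromstring(fragment)
    # itertext() yields the text and tail of every element in document order.
    return "".join(root.itertext())


sample = f'<div xmlns="{XHTML_NS}">Hi<br/> <b>there</b></div>'
print(xhtml_text(sample))  # Hi there
```

As with the HTML case, the text inside a script element would still be returned as plain characters, so the result needs escaping before it is embedded anywhere.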
Anyway, the spec offers two alternatives (HTML and XHTML) that cover similar use cases. I think it would be good if it made a recommendation, at least for those cases where the producer already has XHTML content.
Best regards, Julian
-- <green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760