Firefox 2.0 includes an "RSS feed sniffer": if a file looks like RSS, regardless of its declared content type (tests show this happening with text/plain and image/jpeg, but not text/html), it will be treated as RSS and either displayed (previewed?) internally or passed to a third-party RSS reader app, with all inline images loaded (potentially web bugs).
It also supports a rather wider range of standards than our current filter does, and the CSS filter is (as recently demonstrated) not a really secure whitelist filter (the HTML filter is).

The auto-detection algorithm goes something like this: search for "<rss", "<feed", or "<rdf:RDF". If the match starts at the beginning of the file, return true. If the match is preceded only by tags starting with "<?" or "<!" (XML declaration, doctype and the like), return true. Otherwise return false. All of this only considers the first 512 bytes of the file.

What are our options?

We can run the same detection algorithm Firefox does on _all_ content from fproxy, and force to disk anything which matches. This relies on the detection algorithm remaining the same in future versions, which is not reliable. I will implement this hack later today, but we shouldn't rely on it. (A rough Java sketch of the check is appended at the end of this mail.)

A more ambitious strategy:
- Whitelist browsers (and other clients) which show text/plain as plain text. All other browsers get an HTML-ised version of the plain text, with all potentially dangerous characters encoded (see the escaping sketch at the end of this mail).
- Parse all the image formats we currently pass through unchallenged.
- Update the CSS filter to a proper whitelisting parser.
PRO: This would work not only with Firefox 2.0 but also with Internet Exploder.

While we're at it, we could do worse than upgrading our filters to cope with modern web standards (in decreasing priority / increasing effort order):
- The HTML filter is HTML 4.01. We can upgrade it to XHTML 1.1, which FF 2.0 supports, relatively easily.
- FF 2.0 supports CSS 2.1 and bits of CSS 3.0; we support CSS 2.0. We should upgrade our filter at the same time as making it a whitelisting filter.
- Parse and filter RSS feeds.
- FF 2.0 supports SVG. It would be reasonably easy to write a filter for (pure) SVG.
- FF 2.0 and IE both support XSLT. XSLT is Turing-complete. The most practical way to deal with it is to run the transformation using JAXP on the node, filter the transformed data, and then return it to the browser (see the JAXP sketch at the end of this mail). This is equivalent to running it on the web server, which is common practice, and should be rather easy to do. The alternative, incorporating the entirety of our content filter into the XSLT stylesheet so that it runs after the transformation, is technically possible but impractical. XSLT would allow freesite authors to do all sorts of fun things without javascript, such as "exclude this category from the Recently Added list, then sort it by creation time".
- XPath exists primarily to serve XSLT. However, it may also be used in documents to point to fragments or inline them... it can produce text output; I'm not sure whether that output can be treated as tags.
- FF supports MathML; this will not be implemented unless a volunteer comes forward.
- A javascript filter is entirely feasible IMHO, even though javascript is Turing-complete; all we have to do is feed the generated data back to the node. In some instances this can be optimised out. But this is a low-priority item IMHO.
- XML in general (custom per-site types etc.) would require further research; we don't know how a particular XML document will be handled, in general.
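For reference, here is a rough Java sketch of the detection check described above. The class and method names are mine, not actual fproxy code, and this approximates the behaviour described rather than copying Firefox's sniffer byte for byte:

import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: does the first 512 bytes look like a feed? */
public class FeedSniffer {

    private static final String[] MARKERS = { "<rss", "<feed", "<rdf:RDF" };

    public static boolean looksLikeFeed(byte[] data) {
        int len = Math.min(data.length, 512);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        for (String marker : MARKERS) {
            int idx = head.indexOf(marker);
            if (idx < 0) continue;
            if (idx == 0) return true;
            // Accept the marker if everything before it is whitespace or
            // tags beginning with "<?" or "<!" (XML declaration, doctype...).
            if (onlyPrologBefore(head, idx)) return true;
        }
        return false;
    }

    private static boolean onlyPrologBefore(String head, int end) {
        int i = 0;
        while (i < end) {
            char c = head.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c != '<' || i + 1 >= end) return false;
            char next = head.charAt(i + 1);
            if (next != '?' && next != '!') return false;
            int close = head.indexOf('>', i);
            if (close < 0 || close >= end) return false;
            i = close + 1;
        }
        return true;
    }
}

Anything matching this would be forced to disk rather than rendered inline. Again, if Mozilla change the heuristic, this breaks silently.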
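And a minimal sketch of the "HTML-ised plain text" idea, again with made-up names: wrap the text in a trivial page and encode every character a browser could mistake for markup, so there is nothing left for a sniffer to find:

/** Hypothetical helper: serve plain text as inert HTML. */
public class PlainTextToHtml {

    public static String htmlise(String plainText) {
        StringBuilder sb = new StringBuilder("<html><body><pre>");
        for (int i = 0; i < plainText.length(); i++) {
            char c = plainText.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.append("</pre></body></html>").toString();
    }
}

The result would be served as text/html to the non-whitelisted browsers, so their content-type guessing never comes into play.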
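Finally, a sketch of the node-side XSLT transformation via JAXP. Illustrative only: in real code we would also have to lock down external entity resolution before trusting a freesite's stylesheet, and the output still goes through the normal HTML filter.

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch: apply the stylesheet on the node, not in the browser. */
public class NodeSideXslt {

    public static void transform(InputStream stylesheet, InputStream document,
                                 OutputStream transformedOutput) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(stylesheet));
        transformer.transform(new StreamSource(document), new StreamResult(transformedOutput));
        // transformedOutput would then be passed through the existing HTML
        // content filter before being returned to the browser.
    }
}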
