Firefox 2.0 includes an "RSS feed sniffer". This means that if a file
looks like RSS, regardless of its declared content type (tests show
this happening with text/plain and image/jpeg, but not text/html), it
will be treated as RSS and either displayed (previewed?) internally or
passed to a third-party RSS reader app, including all inline images
(potentially web bugs).

Firefox 2.0 also supports a rather wider range of standards than our
current filter does, and our CSS filter is (as was recently
demonstrated) not a genuinely secure whitelist filter, whereas the HTML
filter is.

Firefox's auto-detection algorithm goes something like this:

Search for "<rss", "<feed" or "<rdf:RDF". If the match starts at the
beginning of the file, return true. If it is preceded only by
constructs that start with "<?" or "<!" (XML declarations, doctypes and
the like), return true. Otherwise return false. (All of this considers
only the first 512 bytes of the file.)
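
As a rough illustration, here is a minimal Java sketch of that check.
The class and helper names are mine, and the real Firefox
implementation will differ in detail; this only mirrors the description
above.

import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch of the feed sniffing described above: look at the
 * first 512 bytes, find "<rss", "<feed" or "<rdf:RDF", and accept the
 * match if nothing other than "<?" / "<!" constructs precedes it.
 */
public class FeedSniffer {

    private static final String[] MARKERS = { "<rss", "<feed", "<rdf:RDF" };
    private static final int SNIFF_LIMIT = 512;

    public static boolean looksLikeFeed(byte[] data) {
        int len = Math.min(data.length, SNIFF_LIMIT);
        // Treat the sniffed prefix as ASCII-compatible text.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        for (String marker : MARKERS) {
            int idx = head.indexOf(marker);
            if (idx < 0) continue;
            if (idx == 0) return true; // marker at the very start
            if (onlyDeclarationsBefore(head, idx)) return true;
        }
        return false;
    }

    // True if everything before 'end' is whitespace or constructs that
    // start with "<?" or "<!" (XML declaration, doctype, comments).
    private static boolean onlyDeclarationsBefore(String head, int end) {
        int i = 0;
        while (i < end) {
            char c = head.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '<' && i + 1 < end
                    && (head.charAt(i + 1) == '?' || head.charAt(i + 1) == '!')) {
                int close = head.indexOf('>', i);
                if (close < 0 || close >= end) return false;
                i = close + 1;
                continue;
            }
            // Anything else means real content precedes the marker.
            return false;
        }
        return true;
    }
}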

What are our options? We can run the same detection algorithm Firefox
does on _all_ content from fproxy, and force to disk anything that
matches. This relies on the detection algorithm staying the same in
future Firefox releases, which we cannot count on. I will implement
this hack later today, but we shouldn't rely on it.
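
For illustration, this is roughly what "force to disk" means at the
HTTP level: if the sniffer above matches, serve the bytes as an opaque
download. This is a hypothetical servlet-style handler, not fproxy's
actual code path.

import java.io.IOException;
import java.io.OutputStream;

import javax.servlet.http.HttpServletResponse;

/**
 * Hypothetical illustration: if the content sniffs as a feed, force
 * the browser to save it rather than render it.
 */
public class ForceToDisk {
    public static void serve(HttpServletResponse resp, byte[] data,
            String filename) throws IOException {
        if (FeedSniffer.looksLikeFeed(data)) {
            // An octet-stream type plus an attachment disposition stops
            // the browser (and its feed sniffer) from interpreting the
            // content inline.
            resp.setContentType("application/octet-stream");
            resp.setHeader("Content-Disposition",
                    "attachment; filename=\"" + filename + "\"");
        }
        resp.setContentLength(data.length);
        OutputStream out = resp.getOutputStream();
        out.write(data);
        out.flush();
    }
}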

A more ambitious strategy:

- Whitelist browsers (and other clients) which show text/plain as plain
  text. All other browsers get an HTML-ised version of the plain text,
  with all potentially dangerous characters encoded (see the escaping
  sketch after this list).
- Parse all the image formats we currently pass through unchallenged.
- Update the CSS filter to a proper whitelisting parser.
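
For the "HTML-ised plain text" item above, a minimal sketch of the
kind of escaping meant; the class and method names are illustrative
only, not existing code.

/**
 * Serve plain text as HTML with all potentially dangerous characters
 * encoded, so a sniffing browser has nothing to misinterpret.
 */
public class PlainTextAsHtml {
    public static String render(String text) {
        StringBuilder sb = new StringBuilder("<html><body><pre>");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        sb.append("</pre></body></html>");
        return sb.toString();
    }
}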

PRO: Would work not only with Firefox 2.0 but also with Internet
Exploder.

While we're at it, we could do worse than upgrading our filters to cope
with modern web standards (in decreasing priority / increasing effort
order):
- The HTML filter is HTML 4.01. We can upgrade this to XHTML 1.1, which
  is supported by FF 2.0, relatively easily.
- FF 2.0 supports CSS 2.1, and bits of CSS 3.0; we support CSS 2.0. We
  should upgrade our filter at the same time as making it a whitelisting
  filter.
- Parse and filter RSS feeds.
- FF 2.0 supports SVG. It would be reasonably easy to write a filter for
  (pure) SVG.
- FF 2.0 and IE both support XSLT. XSLT is Turing-complete. The most
  practical way to deal with it is to run the transformation using JAXP
  on the node, filter the transformed data, and then return it to the
  browser (see the JAXP sketch after this list). This is equivalent to
  running the transformation on the web server, which is a common
  practice, and it should be rather easy to do. The alternative,
  incorporating the entirety of our content filter into the XSLT
  stylesheet so that it runs after the transformation, is technically
  possible but impractical.

  XSLT would allow freesite authors to do all sorts of fun things,
  without javascript, such as "exclude this category from the Recently
  Added list, then sort it by creation time".
- XPath exists primarily to serve XSLT. It may also be used in
  documents to point to fragments or to inline them, however... it can
  produce text output; I'm not sure whether that output can be treated
  as tags.
- FF supports MathML; this will not be implemented unless a volunteer
  comes forward.
- A JavaScript filter is entirely feasible IMHO, even though JavaScript
  is Turing-complete; all we have to do is feed the generated data back
  to the node. In some instances this can be optimised out. But this is
  a low-priority item.
- XML in general (custom per-site types etc.) would require further
  research; we don't know, in general, how the browser will handle a
  particular XML document.
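
For the XSLT item above, a minimal sketch of the JAXP approach:
transform on the node, then push the result through the normal content
filter before returning it to the browser. filterHtml() is a
placeholder here, not an existing API in our code.

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Sketch: run an untrusted stylesheet over untrusted XML on the node,
 * then filter the output exactly as if it had been fetched directly.
 */
public class XsltOnNode {
    public static String transformAndFilter(String xml, String stylesheet)
            throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Ask JAXP for secure processing (limits extension functions etc.).
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Transformer transformer = factory.newTransformer(
                new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        // The transformed output goes through the normal HTML filter.
        return filterHtml(out.toString());
    }

    private static String filterHtml(String html) {
        // Placeholder for the existing HTML content filter.
        return html;
    }
}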