On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson <spencerandrewjack...@gmail.com> wrote:
> On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
>> On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
>> > Hi guys, just wanted to touch base. Anyway, I'm working on resolving bug
>> > number 3571 ( https://bugs.freenetproject.org/view.php?id=3571 ). To
>> > summarize, the filter tends to reorder attributes semi-randomly when
>> > they get parsed. While the structure which holds the parsed attributes is
>> > a LinkedHashMap, meaning we should be able to stuff in values and pull
>> > them out in the same order, the put functions are called in the derived
>> > verifiers' overridden sanitizeHash methods. These methods extract an
>> > attribute, sanitize it, then place it in the Map. The problem is, they
>> > are extracted out of the original order, meaning they get pulled out of
>> > the Map in the wrong order. To fix this, I created a callback object
>> > which the derived classes pass to the base class. The base class may then
>> > parse all of the attributes in order, invoking the callback to
>> > sanitize. If an attribute's contents fail to be processed, an exception
>> > may be thrown, so that the attribute will not be included in the final
>> > tag.
>>
>> It is important that only attributes that are explicitly parsed and
>> understood are passed on, and that it doesn't take extra per-sanitiser work
>> to achieve this. Will this be the case?
>>
>
> Yeah, this should be the case. Attributes which don't have a callback
> stored simply aren't parsed. I am starting, however, to think this
> approach might be overkill. Here I have a different take:
> http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
> Instead of running a callback in the base class, I simply create the
> attributes, in order, with null content. Then, in the overridden methods
> on the child classes, I repopulate them with the correct data. This
> preserves the original order of the attributes, while minimizing the
> amount of new code that needs to be written. What do you think? Which
> solution do you think is preferable?
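For illustration, here is a minimal sketch of the ordering problem and of the null-placeholder approach. It assumes the parsed attributes arrive as an ordered list of name/value pairs with non-null values. The names (`AttributeOrderSketch`, `SANITIZERS`, `sanitize`) are hypothetical and are not fred's actual `sanitizeHash` API. The whitelist is deliberately visited in a different order from the source tag, as a derived sanitiser might do. The source order survives only because `java.util.LinkedHashMap` does not move a key when that key is re-inserted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

public class AttributeOrderSketch {
    // Hypothetical per-tag whitelist: attribute name -> sanitizer for its value.
    // Its order differs from the source tag below, mimicking a sanitiser that
    // handles attributes in its own fixed order.
    static final Map<String, UnaryOperator<String>> SANITIZERS = new LinkedHashMap<>();
    static {
        SANITIZERS.put("title", v -> v);
        SANITIZERS.put("href", v -> {
            // A sanitizer rejects a value by throwing; the attribute is then dropped.
            if (v.toLowerCase().startsWith("javascript:")) {
                throw new IllegalArgumentException("unsafe URI");
            }
            return v;
        });
        SANITIZERS.put("class", v -> v);
    }

    /** Sanitizes name/value pairs (values assumed non-null) keeping source order. */
    static Map<String, String> sanitize(List<String[]> parsed) {
        Map<String, String> source = new LinkedHashMap<>();
        for (String[] attr : parsed) source.put(attr[0], attr[1]);

        // Pass 1: reserve a null placeholder for each whitelisted attribute, in
        // source order. Attributes missing from the whitelist are never copied.
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : source.keySet()) {
            if (SANITIZERS.containsKey(name)) out.put(name, null);
        }

        // Pass 2: fill values in the whitelist's order. Re-putting a key that is
        // already present does not change its position in a LinkedHashMap.
        for (Map.Entry<String, UnaryOperator<String>> e : SANITIZERS.entrySet()) {
            String raw = source.get(e.getKey());
            if (raw == null) continue; // attribute absent from this tag
            try {
                out.put(e.getKey(), e.getValue().apply(raw));
            } catch (IllegalArgumentException rejected) {
                // Leave the placeholder null so it is removed below.
            }
        }

        // Pass 3: drop placeholders that were never filled (rejected values).
        out.values().removeIf(Objects::isNull);
        return out;
    }

    public static void main(String[] args) {
        List<String[]> tag = List.of(
                new String[] {"href", "index.html"},
                new String[] {"onclick", "evil()"},
                new String[] {"class", "nav"},
                new String[] {"title", "Home"});
        System.out.println(sanitize(tag)); // prints {href=index.html, class=nav, title=Home}
    }
}
```

Because the base class creates the placeholders, a derived sanitiser only fills in values, and anything it does not recognise never enters the map. In this sketch an unfilled placeholder is silently dropped before output, which is only one possible policy.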

Do attributes without content still get written? Is that always valid?
Not writing them isn't always valid; see e.g. bug 4125: the current code
happily removes required attributes from <meta> tags, thus breaking
valid pages.

Depending on how much cleaning of the HTML filtering system you want to
do... Has using something like JTidy ( http://jtidy.sourceforge.net/ )
been discussed? That way you wouldn't have to worry about what's valid
or invalid HTML, merely the security aspects of valid HTML that are
unique to Freenet.

Evan Daniel
_______________________________________________
Devl mailing list
Devl@freenetproject.org
http://osprey.vm.bytemark.co.uk/cgi-bin/mailman/listinfo/devl
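The question about content-less attributes can be made concrete with a small sketch. It assumes the sanitised attributes end up in a LinkedHashMap where null marks a placeholder that was never filled. The `REQUIRED` table and the method names are hypothetical. Only the meta/content entry reflects HTML 4.01, which declares `content` as #REQUIRED on META. The sketch only reports which required attributes are missing. Whether the filter should then keep a sanitised value, drop the element, or reject the page is a policy decision that it deliberately leaves open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RequiredAttributeSketch {
    // Hypothetical table of required attributes per element. A real filter
    // would need the full list for every element it accepts.
    static final Map<String, Set<String>> REQUIRED = Map.of("meta", Set.of("content"));

    /** Returns required attributes that sanitization left unfilled or removed. */
    static List<String> missingRequired(String element, Map<String, String> attrs) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED.getOrDefault(element, Set.of())) {
            if (attrs.get(name) == null) missing.add(name);
        }
        return missing;
    }

    /** Minimal escaping for a double-quoted attribute value ('&' first). */
    static String escape(String v) {
        return v.replace("&", "&amp;").replace("\"", "&quot;");
    }

    /**
     * Writes the start tag, omitting unfilled (null) placeholders. Because null
     * means "never filled", a genuinely value-less attribute would need a
     * distinct representation in this scheme.
     */
    static String write(String element, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(element);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (e.getValue() == null) continue; // unfilled placeholder
            sb.append(' ').append(e.getKey())
              .append("=\"").append(escape(e.getValue())).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("http-equiv", "Content-Type");
        attrs.put("content", null); // placeholder the sanitiser never filled
        System.out.println(write("meta", attrs));           // <meta http-equiv="Content-Type">
        System.out.println(missingRequired("meta", attrs)); // [content]
    }
}
```

Silently skipping the null placeholder produces exactly the invalid `<meta>` output described above. Running a check like `missingRequired` before writing at least makes that case visible.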