[freenet-dev] Attribute reordering in HTML filter

Evan Daniel Sat, 8 May 2010 00:09:07 -0400

On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson
<spencerandrewjackson at gmail.com> wrote:
> On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
>> On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
>> > Hi guys, just wanted to touch base. Anyway, I'm working on resolving bug
>> > number 3571 ( https://bugs.freenetproject.org/view.php?id=3571 ). To
>> > summarize, the filter tends to reorder attributes semi-randomly when
>> > they get parsed. While the structure which holds the parsed attributes is
>> > a LinkedHashMap, meaning we should be able to stuff in values and pull
>> > them out in the same order, the put functions are called in the derived
>> > verifiers' overridden sanitizeHash methods. These methods extract an
>> > attribute, sanitize it, then place it in the Map. The problem is, they
>> > are extracted out of the original order, meaning they get pulled out of
>> > the Map in the wrong order. To fix this, I created a callback object
>> > which the derived classes pass to the base class. The base class may then
>> > parse all of the attributes in order, invoking the callback to
>> > sanitize. If an attribute's contents fail to be processed, an exception
>> > may be thrown, so that the attribute will not be included in the final
>> > tag.
>>
>> It is important that only attributes that are explicitly parsed and 
>> understood are passed on, and that it doesn't take extra per-sanitiser work 
>> to achieve this. Will this be the case?
>>
>
> Yeah, this should be the case. Attributes which don't have a callback
> stored simply aren't parsed. I am starting, however, to think this
> approach might be overkill. Here I have a different take:
> http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
> Instead of running a callback in the base class, I simply create the
> attributes, in order, with null content. Then, in the overridden methods
> on the child classes I repopulate them with the correct data. This
> preserves the original order of the attributes, while minimizing the
> amount of new code that needs to be written. What do you think? Which
> solution do you think is preferable?
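
For reference, the placeholder idea in the quoted message can be sketched roughly as below. This is only an illustration, not the code in the linked branch: the class and method names (AttributeOrderSketch, reserveSlots, fill) are made up here. It relies on the documented LinkedHashMap behaviour that re-inserting an existing key does not change its position in the iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not the actual Freenet filter classes.
final class AttributeOrderSketch {

    // Step 1 (base class in the proposal): reserve a slot for every
    // attribute name, in source order, with a null placeholder value.
    static Map<String, String> reserveSlots(List<String> namesInSourceOrder) {
        Map<String, String> slots = new LinkedHashMap<>();
        for (String name : namesInSourceOrder) {
            slots.put(name, null);
        }
        return slots;
    }

    // Step 2 (subclass sanitizer): fill in a sanitized value. Only names
    // that already have a slot are accepted, so the order never changes.
    static void fill(Map<String, String> slots, String name, String sanitizedValue) {
        if (slots.containsKey(name)) {
            slots.put(name, sanitizedValue);
        }
    }

    // Fills the slots out of source order and returns the resulting key order.
    static List<String> demo() {
        Map<String, String> slots = reserveSlots(Arrays.asList("href", "title", "class"));
        fill(slots, "class", "nav");
        fill(slots, "href", "/index.html");
        // "title" is never filled and keeps its null placeholder.
        return new ArrayList<>(slots.keySet());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [href, title, class]
    }
}
```

Note that "title" still occupies a slot with a null value afterwards, so whatever writes the tag has to decide what to do with unfilled entries.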


Do attributes without content still get written? Is that always
valid? Not writing them isn't always valid either; see e.g. bug 4125: the
current code happily removes required attributes from <meta> tags, thus
breaking valid pages.
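
To make the concern concrete, here is a rough sketch of a writer that simply skips unfilled placeholders. The names (TagWriterSketch, render) are invented for illustration and this is not the existing filter code; the <meta> example is likewise hypothetical rather than taken from bug 4125, and value escaping is left out to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual Freenet tag writer.
final class TagWriterSketch {

    // Renders an opening tag, omitting attributes whose value is still null.
    // Values are assumed to be already sanitized; no escaping is done here.
    static String render(String element, Map<String, String> attrs) {
        StringBuilder out = new StringBuilder("<").append(element);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (e.getValue() == null) {
                continue; // unfilled placeholder: silently dropped
            }
            out.append(' ').append(e.getKey())
               .append("=\"").append(e.getValue()).append('"');
        }
        return out.append('>').toString();
    }

    // A <meta> tag whose "content" attribute was never filled in.
    static String demo() {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("http-equiv", "Content-Type");
        attrs.put("content", null);
        return render("meta", attrs);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <meta http-equiv="Content-Type">
    }
}
```

The output keeps the attribute order but loses "content", which HTML 4.01 marks as required on <meta>. So skipping null entries fixes the ordering problem without addressing the dropped-attribute one; the alternative would be to drop the whole tag when a required attribute ends up empty.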

Depending on how much cleanup of the HTML filtering system you want to
do... Has using something like JTidy ( http://jtidy.sourceforge.net/ )
been discussed? That way you wouldn't have to worry about what's
valid or invalid HTML, merely the security aspects of valid HTML that
are unique to Freenet.

Evan Daniel
