[freenet-dev] Attribute reordering in HTML filter

Spencer Jackson Sun, 9 May 2010 07:06:43 -0500

On Sat, May 8, 2010 at 10:35 PM, Evan Daniel <evanbd at gmail.com> wrote:


> On Sat, May 8, 2010 at 11:38 AM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> > On Saturday 08 May 2010 05:09:07 Evan Daniel wrote:
> >> On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson
> >> <spencerandrewjackson at gmail.com> wrote:
> >> > On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
> >> >> On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
> >> >> > Hi guys, just wanted to touch base. Anyway, I'm working on resolving
> >> >> > bug number 3571 ( https://bugs.freenetproject.org/view.php?id=3571 ).
> >> >> > To summarize, the filter tends to reorder attributes semi-randomly
> >> >> > when they get parsed. While the structure which holds the parsed
> >> >> > attributes is a LinkedHashMap, meaning we should be able to stuff in
> >> >> > values and pull them out in the same order, the put functions are
> >> >> > called in the derived verifier's overridden sanitizeHash methods.
> >> >> > These methods extract an attribute, sanitize it, then place it in
> >> >> > the Map. The problem is, they are extracted out of the original
> >> >> > order, meaning they get pulled out of the Map in the wrong order. To
> >> >> > fix this, I created a callback object which the derived classes pass
> >> >> > to the base class. The base class may then parse all of the
> >> >> > attributes in order, invoking the callback to sanitize. If an
> >> >> > attribute's contents fail to be processed, an exception may be
> >> >> > thrown, so that the attribute will not be included in the final tag.
> >> >>
> >> >> It is important that only attributes that are explicitly parsed and
> >> >> understood are passed on, and that it doesn't take extra per-sanitiser
> >> >> work to achieve this. Will this be the case?
> >> >>
> >> >
> >> > Yeah, this should be the case. Attributes which don't have a callback
> >> > stored simply aren't parsed. I am starting, however, to think this
> >> > approach might be overkill. Here I have a different take:
> >> > http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
> >> > Instead of running a callback in the base class, I simply create the
> >> > attributes, in order, with null content. Then, in the overridden
> >> > methods on the child classes I repopulate them with the correct data.
> >> > This preserves the original order of the attributes, while minimizing
> >> > the amount of new code that needs to be written. What do you think?
> >> > Which solution do you think is preferable?
> >>
> >> Do attributes without content still get written?  Is that always
> >> valid?  Not writing them isn't always valid; see eg bug 4125: current
> >> code happily removes required attributes from <meta> tags, thus
> >> breaking valid pages.
> >>
> >> Depending how much cleaning of the HTML filtering system you want to
> >> do...  Has using something like JTidy ( http://jtidy.sourceforge.net/
> >> ) been discussed?  That way you wouldn't have to worry about what's
> >> valid or invalid HTML, merely the security aspects of valid HTML that
> >> are unique to Freenet.
> >
> > IMHO sajack's solution is acceptable. You will just have to use null to
> > indicate no attribute and "" to indicate an attribute with no value? Or
> > is there a difference between attributes with an empty value and
> > attributes with no value?
> >
>
> >It sounds fine to me, provided it doesn't take validating html and
> >make it stop validating.  Or at least does so no more than the current
> >code.
> >
> >I'm asking what will happen when the attribute has null content
> >because the filter couldn't find anything to fill it with; does that
> >get written as <tag attribute=""> or <tag> or something else?
> >Whichever it is, do we know that the result will be valid html?
> >
> >The current filter turns eg
> ><meta http-equiv="Content-type" content="application/xhtml+xml;charset=UTF-8" />
> >into
> ><meta />
> >
> >The first is valid xhtml, the second is not.  Run the w3c validator
> >against my flog, both filtered and unfiltered, for details.  So, how
> >will the new filter handle cases like this, where filter code hasn't
> >been completely written for all relevant aspects?
>
> >Evan Daniel
>
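
The placeholder approach quoted above can be sketched roughly as follows. This is only an illustration, not the code in the HTMLAttributeReorder branch: the class name AttributeOrderSketch and the methods seedPlaceholders and write are made up for the example, and it assumes that null means "drop the attribute" while "" means "keep it with an empty value". It relies on the documented LinkedHashMap behaviour that re-inserting an existing key does not change its position.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the Freenet source.
final class AttributeOrderSketch {

    /** Seeds the output map with every parsed attribute name, in source order, mapped to null. */
    static Map<String, String> seedPlaceholders(Map<String, String> parsed) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : parsed.keySet()) {
            out.put(name, null); // placeholder: position is fixed here
        }
        return out;
    }

    /**
     * Writes the tag. Attributes still mapped to null were never accepted by a
     * sanitiser and are skipped; "" is written as name="".
     * Assumption: values are already sanitised, so no escaping is done here.
     */
    static String write(String tag, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(tag);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (e.getValue() == null) {
                continue;
            }
            sb.append(' ').append(e.getKey()).append("=\"").append(e.getValue()).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("href", "page.html");
        parsed.put("onclick", "evil()");
        parsed.put("title", "");
        parsed.put("class", "nav");

        Map<String, String> out = seedPlaceholders(parsed);
        // The sanitiser visits attributes in its own order and never touches onclick.
        out.put("class", parsed.get("class"));
        out.put("title", parsed.get("title"));
        out.put("href", parsed.get("href"));

        System.out.println(write("a", out)); // <a href="page.html" title="" class="nav">
    }
}
```

Under these assumptions an attribute nobody repopulates simply disappears from the output, which is the case the questions about <meta /> are concerned with.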

Ah, okay, then I am aware of this. I think that removing these attributeless
tags has to be done on a tag-by-tag basis. There are plenty of tags which
are valid with no attributes.
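
A tag-by-tag rule of that kind might look something like the sketch below. It is a guess at the shape, not existing filter code: EmptyTagPolicySketch, DROP_IF_NO_ATTRIBUTES and shouldEmit are hypothetical names, and the set contains only meta because that is the one case raised in this thread.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and the contents of the set are assumptions.
final class EmptyTagPolicySketch {

    /** Tags that should be dropped entirely once every attribute has been filtered out. */
    private static final Set<String> DROP_IF_NO_ATTRIBUTES = Set.of("meta");

    /**
     * Returns true if the tag should still be written.
     * An attribute counts as present only if its sanitised value is non-null.
     */
    static boolean shouldEmit(String tagName, Map<String, String> sanitizedAttrs) {
        boolean hasAttribute = false;
        for (String value : sanitizedAttrs.values()) {
            if (value != null) {
                hasAttribute = true;
                break;
            }
        }
        return hasAttribute
                || !DROP_IF_NO_ATTRIBUTES.contains(tagName.toLowerCase(Locale.ROOT));
    }
}
```

With this rule a <p> with no surviving attributes is still emitted, while a <meta> that has lost all of its attributes is dropped rather than written as <meta />.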