Re: [freenet-dev] Attribute reordering in HTML filter

Matthew Toseland Mon, 10 May 2010 16:27:06 -0700

On Sunday 09 May 2010 02:35:57 Spencer Jackson wrote:
> On Sat, May 8, 2010 at 10:38 AM, Matthew Toseland <
> t...@amphibian.dyndns.org> wrote:
> 
> > On Saturday 08 May 2010 05:09:07 Evan Daniel wrote:
> > > On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson
> > > <spencerandrewjack...@gmail.com> wrote:
> > > > On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
> > > >> On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
> > > >> > Hi guys, just wanted to touch base. Anyway, I'm working on resolving
> > > >> > bug number 3571 ( https://bugs.freenetproject.org/view.php?id=3571 ).
> > > >> > To summarize, the filter tends to reorder attributes semi-randomly
> > > >> > when they get parsed. While the structure which holds the parsed
> > > >> > attributes is a LinkedHashMap, meaning we should be able to stuff in
> > > >> > values and pull them out in the same order, the put functions are
> > > >> > called in the derived verifiers' overridden sanitizeHash methods.
> > > >> > These methods extract an attribute, sanitize it, then place it in the
> > > >> > Map. The problem is, they are extracted out of the original order,
> > > >> > meaning they get pulled out of the Map in the wrong order. To fix
> > > >> > this, I created a callback object which the derived classes pass to
> > > >> > the base class. The base class may then parse all of the attributes in
> > > >> > order, invoking the callback to sanitize. If an attribute's contents
> > > >> > fail to be processed, an exception may be thrown, so that the
> > > >> > attribute will not be included in the final tag.
> > > >>
> > > >> It is important that only attributes that are explicitly parsed and
> > > >> understood are passed on, and that it doesn't take extra per-sanitiser
> > > >> work to achieve this. Will this be the case?
> > > >>
> > > >
> > > > Yeah, this should be the case.  Attributes which don't have a callback
> > > > stored simply aren't parsed. I am starting, however, to think this
> > > > approach might be overkill.  Here I have a different take:
> > > > >
> > > > http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
> > > > Instead of running a callback in the base class, I simply create the
> > > > attributes, in order, with null content. Then, in the overridden methods
> > > > on the child classes I repopulate them with the correct data. This
> > > > preserves the original order of the attributes, while minimizing the
> > > > amount of new code that needs to be written. What do you think? Which
> > > > solution do you think is preferable?
> > >
> > > Do attributes without content still get written?  Is that always
> > > valid?  Not writing them isn't always valid; see eg bug 4125: current
> > > code happily removes required attributes from <meta> tags, thus
> > > breaking valid pages.
> >
> 
> Odd. I'm looking at the code for MetaTagVerifier, and I can't see any code
> branch in which the 'content' attribute, if defined, fails to be added to
> the LinkedHashMap unless nothing else is added either... I'm not on
> my home computer, so I'll have to test this tomorrow. Does it happen to all
> <meta> tags? Oh. Do you mean, if there are no attributes, the tag will still
> exist, but be empty? I could alter MetaTagVerifier to return null if this is
> the case, and remove the tag from the final output. Would that fix this?
> 
> 
> > >
> > > Depending how much cleaning of the HTML filtering system you want to
> > > do...  Has using something like JTidy ( http://jtidy.sourceforge.net/
> > > ) been discussed?  That way you wouldn't have to worry about what's
> > > valid or invalid HTML, merely the security aspects of valid HTML that
> > > are unique to Freenet.
> >
> That might be nice... but wouldn't we have the same problem in that it would
> be hard to diff the output of the filter against the input for debugging
> purposes? What do other people think about this? It would make life much
> easier...


IMHO this is out of scope for GSoC, will lead to large diffs, will be a lot of 
work and pull in a lot of third party code. Bad idea at the moment. 

But the more fundamental issue is that we MUST have a WHITELIST ONLY filter: 
Nothing is passed through without somebody going through and writing a filter 
or declaring that that attribute is harmless. This is directly opposed to what 
you said above.
> 
> > IMHO sajack's solution is acceptable, you will have to just use null to
> > indicate no attribute and "" to indicate an attribute with no value? Or is
> > there a difference between attributes with an empty value and attributes
> > with no value?
> >
> Apparently, HTML supports attribute minimization, but XHTML does not. In
> other words, 'compact' is valid HTML, but not valid XHTML, which needs
> 'compact="compact"'. ( http://www.w3.org/TR/xhtml1/#h-4.5 ) For boolean
> values, according to (
> http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.3.4.2 ) attributes
> should either exist, without an '=', or be equal to the attribute's name if
> true, and nonexistent if false. XHTML will require the attribute be equal to
> its name, if true. So yes, there is a difference.
> Okay. How's this. Step one, for all attributes in the tag, create the same
> attributes in the same order in the sanitized tag, all equal to null. Parse
> the tag, replacing the null values, if new values exist. Now that we're
> done, we iterate through all the attributes in the parsed map. If the
> attribute is null, discard it. If the attribute is simply empty, check for
> whether the HTML parse context says we're parsing XHTML. If no, pass through
> the minimized attribute. If yes, discard it.
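
The three steps proposed above can be sketched roughly as follows. This is only an illustration, not the actual fred code: the class name, the WHITELIST contents, the placeholder sanitize() check and the render() helper are all assumptions made for the example, and attribute values are not escaped. It relies on the documented LinkedHashMap behaviour that re-inserting an existing key does not change its position, so the sanitiser may visit attributes in any order without disturbing the original one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AttributeOrderSketch {
    // Hypothetical whitelist, deliberately listed in a different order
    // from the input to mimic a sanitiser that visits attributes in its own order.
    static final List<String> WHITELIST = Arrays.asList("title", "compact", "href");

    // Placeholder check only (an assumption, not a real filter):
    // returns null to reject a value, otherwise the value unchanged.
    static String sanitize(String name, String value) {
        return value.toLowerCase().contains("script") ? null : value;
    }

    // parsed: attributes in source order; "" means a minimized attribute.
    static String render(Map<String, String> parsed, boolean xhtml) {
        // Step 1: seed every attribute, in source order, with null.
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : parsed.keySet()) {
            out.put(name, null);
        }
        // Step 2: the sanitiser fills in values in its own order.
        // Attributes not on the whitelist are never touched and stay null.
        for (String name : WHITELIST) {
            if (parsed.containsKey(name)) {
                out.put(name, sanitize(name, parsed.get(name)));
            }
        }
        // Step 3: write out in source order, dropping null entries.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : out.entrySet()) {
            String value = e.getValue();
            if (value == null) {
                continue; // unknown or rejected attribute
            }
            if (value.isEmpty()) {
                if (!xhtml) {
                    sb.append(' ').append(e.getKey()); // minimized form, HTML only
                }
                continue; // discarded when parsing XHTML
            }
            sb.append(' ').append(e.getKey()).append("=\"").append(value).append('"');
        }
        return sb.toString();
    }
}
```

With input attributes href, onclick, compact (minimized) and title, in that order, the sketch keeps href, compact and title in source order for HTML, drops the minimized compact for XHTML, and never emits onclick because nothing on the whitelist handles it.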


