[freenet-dev] Attribute reordering in HTML filter

Spencer Jackson Mon, 10 May 2010 22:07:12 -0500

On Tue, 2010-05-11 at 00:29 +0100, Matthew Toseland wrote:
> On Sunday 09 May 2010 04:56:48 Evan Daniel wrote:
> > On Sat, May 8, 2010 at 9:35 PM, Spencer Jackson
> > <spencerandrewjackson at gmail.com> wrote:
> > > tOn Sat, May 8, 2010 at 10:38 AM, Matthew Toseland
> > > <toad at amphibian.dyndns.org> wrote:
> > >>
> > >> On Saturday 08 May 2010 05:09:07 Evan Daniel wrote:
> > >> > On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson
> > >> > <spencerandrewjackson at gmail.com> wrote:
> > >> > > On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
> > >> > >> On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
> > >> > >> > Hi guys, just wanted to touch base. Anyway, I'm working on
> > >> > >> > resolving bug number 3571
> > >> > >> > (https://bugs.freenetproject.org/view.php?id=3571). To summarize,
> > >> > >> > the filter tends to reorder attributes semi-randomly when they get
> > >> > >> > parsed. While the structure which holds the parsed attributes is a
> > >> > >> > LinkedHashMap, meaning we should be able to stuff in values and pull
> > >> > >> > them out in the same order, the put functions are called in the
> > >> > >> > derived verifiers' overridden sanitizeHash methods. These methods
> > >> > >> > extract an attribute, sanitize it, then place it in the Map. The
> > >> > >> > problem is, they are extracted out of the original order, meaning
> > >> > >> > they get pulled out of the Map in the wrong order. To fix this, I
> > >> > >> > created a callback object which the derived classes pass to the base
> > >> > >> > class. The base class may then parse all of the attributes in order,
> > >> > >> > invoking the callback to sanitize. If an attribute's contents fail
> > >> > >> > to be processed, an exception may be thrown, so that the attribute
> > >> > >> > will not be included in the final tag.
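A minimal Java sketch of the callback idea described above. The names (AttributeSanitizer, RejectedAttributeException, OrderedAttributeFilter) are hypothetical and are not fred's real classes. The base class walks the attributes in source order, looks up a per-attribute callback, and writes accepted values into a LinkedHashMap, so output order follows input order. An attribute with no registered callback is skipped, and a callback can throw to drop a single value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback: returns the sanitized value, or throws to reject it.
interface AttributeSanitizer {
    String sanitize(String value) throws RejectedAttributeException;
}

// Hypothetical exception used by a callback to reject one attribute value.
class RejectedAttributeException extends Exception {
    RejectedAttributeException(String message) {
        super(message);
    }
}

// Hypothetical base-class logic: the loop lives here, derived verifiers
// only register callbacks for the attributes they understand.
class OrderedAttributeFilter {
    private final Map<String, AttributeSanitizer> sanitizers = new LinkedHashMap<>();

    void register(String attributeName, AttributeSanitizer sanitizer) {
        sanitizers.put(attributeName, sanitizer);
    }

    // attrs: (name, value) pairs in the order they appeared in the source tag.
    Map<String, String> sanitize(List<Map.Entry<String, String>> attrs) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> attr : attrs) {
            AttributeSanitizer sanitizer = sanitizers.get(attr.getKey());
            if (sanitizer == null) {
                continue; // no callback registered: attribute is not passed on
            }
            try {
                out.put(attr.getKey(), sanitizer.sanitize(attr.getValue()));
            } catch (RejectedAttributeException e) {
                // value failed sanitization: leave it out of the final tag
            }
        }
        return out; // iteration order == source order of the kept attributes
    }
}
```

Because the base class owns the loop, the order in which a derived verifier registers its callbacks no longer matters, and unknown attributes are dropped without any extra per-verifier code.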
> > >> > >>
> > >> > >> It is important that only attributes that are explicitly parsed and
> > >> > >> understood are passed on, and that it doesn't take extra
> > >> > >> per-sanitiser work to achieve this. Will this be the case?
> > >> > >>
> > >> > >
> > >> > > Yeah, this should be the case. Attributes which don't have a
> > >> > > callback stored simply aren't parsed. I am starting, however, to
> > >> > > think this approach might be overkill. Here I have a different take:
> > >> > >
> > >> > > http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
> > >> > > Instead of running a callback in the base class, I simply create the
> > >> > > attributes, in order, with null content. Then, in the overridden
> > >> > > methods on the child classes, I repopulate them with the correct
> > >> > > data. This preserves the original order of the attributes, while
> > >> > > minimizing the amount of new code that needs to be written. What do
> > >> > > you think? Which solution do you think is preferable?
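The placeholder variant relies on a documented property of java.util.LinkedHashMap: re-inserting a key that is already present does not change its position in iteration order. A rough sketch, with a hypothetical PlaceholderOrdering helper standing in for the base-class step and a final cleanup step; the child verifiers would do the middle step by calling put() in whatever order they like.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of fred.
class PlaceholderOrdering {
    // Step 1 (base class): reserve one slot per attribute, in source order.
    static Map<String, String> reserveSlots(List<String> namesInSourceOrder) {
        Map<String, String> slots = new LinkedHashMap<>();
        for (String name : namesInSourceOrder) {
            slots.put(name, null); // LinkedHashMap permits null values
        }
        return slots;
    }

    // Step 2 happens in the child verifiers: slots.put(name, sanitizedValue).
    // Re-inserting an existing key keeps its original position.

    // Step 3: drop every slot no child filled in, so attributes that were
    // not understood are never written out (not even as empty attributes).
    static Map<String, String> finish(Map<String, String> slots) {
        slots.values().removeIf(value -> value == null);
        return slots;
    }
}
```

The cleanup step is what keeps the whitelist property: without it, unfilled slots would be emitted as attributes with no value.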
> > >> >
> > >> > Do attributes without content still get written? Is that always
> > >> > valid? Not writing them isn't always valid; see e.g. bug 4125: current
> > >> > code happily removes required attributes from <meta> tags, thus
> > >> > breaking valid pages.
> > >
> > >
> > > Odd. I'm looking at the code for MetaTagVerifier, and I can't see any
> > > code branches in which, if the 'content' attribute is defined, it fails
> > > to be added to the LinkedHashMap unless nothing else is added either...
> > > I'm not on my home computer, so I'll have to test this tomorrow. Does it
> > > happen to all <meta> tags? Oh. Do you mean, if there are no attributes,
> > > the tag will still exist, but be empty? I could alter MetaTagVerifier to
> > > return null if this is the case, and remove the tag from the final
> > > output. Would that fix this?
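A hedged sketch of the "return null to drop the tag" idea, using a hypothetical MetaTagSketch class rather than the real MetaTagVerifier. It assumes attribute names have already been lowercased, whitelists only name, http-equiv and content, and follows the rule stated in this thread that a <meta> without content makes no sense, so the caller omits the tag entirely instead of emitting <meta />.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not fred's MetaTagVerifier.
class MetaTagSketch {
    // Assumed whitelist; names are expected to be lowercased already.
    private static final Set<String> ALLOWED = Set.of("name", "http-equiv", "content");

    // Returns the kept attributes in source order, or null meaning
    // "remove the whole tag from the output".
    static Map<String, String> sanitizeMeta(Map<String, String> in) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            if (ALLOWED.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        // No content (this also covers "no attributes left"): drop the tag.
        return out.containsKey("content") ? out : null;
    }

    // Caller side: a null result means nothing is written at all.
    static String render(Map<String, String> attrs) {
        Map<String, String> clean = sanitizeMeta(attrs);
        if (clean == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder("<meta");
        for (Map.Entry<String, String> e : clean.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append(" />").toString();
    }

    // Minimal escaping for a double-quoted attribute value.
    private static String escape(String v) {
        return v.replace("&", "&amp;").replace("\"", "&quot;");
    }
}
```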
> > 
> > As mentioned in the other reply, the content filter alters my flog from
> > <meta http-equiv="Content-type" 
> > content="application/xhtml+xml;charset=UTF-8" />
> > to
> > <meta />
> > 
> > I haven't done a detailed analysis of why.
> 
> That is very strange. It shouldn't; it detects the MIME type from this.


Well, two things cause that. Firstly, the filter appears to be looking
for "text/html" and "text/html" only. Perhaps XHTML
(application/xhtml+xml) should be added to this? Secondly, when there
are no attributes, nothing happens to the tag. There is currently no way
to say "This is impossible. Remove this tag: it makes no sense to the
browser without content." The latter is fixed in my local git repo.
Let me push it, and I'll send you a pull request.
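For the first cause, a hypothetical helper (not fred's actual code) that accepts both text/html and application/xhtml+xml when reading the content attribute of a Content-Type <meta>, such as the one from the flog above. The set of accepted types and the simple ';'-based parsing are assumptions for illustration; quoted charset values are not handled.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of fred.
class ContentTypeMeta {
    // Assumed set of types the HTML filter would treat as HTML.
    private static final Set<String> HTML_TYPES =
        Set.of("text/html", "application/xhtml+xml");

    // Returns the bare, lowercased MIME type if it is accepted, else null.
    static String acceptedMimeType(String content) {
        if (content == null) {
            return null;
        }
        int semi = content.indexOf(';');
        String type = (semi >= 0 ? content.substring(0, semi) : content)
            .trim().toLowerCase(Locale.ROOT);
        return HTML_TYPES.contains(type) ? type : null;
    }

    // Returns the charset parameter (unquoted form only), or null if absent.
    static String charset(String content) {
        for (String part : content.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).trim();
            }
        }
        return null;
    }
}
```

With this check, content="application/xhtml+xml;charset=UTF-8" is recognized instead of being treated as an unknown type.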

Spencer
