On Sat, May 8, 2010 at 10:35 PM, Evan Daniel <evanbd at gmail.com> wrote:
> On Sat, May 8, 2010 at 11:38 AM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> > On Saturday 08 May 2010 05:09:07 Evan Daniel wrote:
> >> On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson
> >> <spencerandrewjackson at gmail.com> wrote:
> >> > On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
> >> >> On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
> >> >> > Hi guys, just wanted to touch base. Anyway, I'm working on
> >> >> > resolving bug number 3571
> >> >> > ( https://bugs.freenetproject.org/view.php?id=3571 ). To
> >> >> > summarize, the filter tends to reorder attributes
> >> >> > semi-randomly when they get parsed. While the structure which
> >> >> > holds the parsed attributes is a LinkedHashMap, meaning we
> >> >> > should be able to stuff in values and pull them out in the
> >> >> > same order, the put functions are called in the derived
> >> >> > verifier's overridden sanitizeHash methods. These methods
> >> >> > extract an attribute, sanitize it, then place it in the Map.
> >> >> > The problem is, they are extracted out of the original order,
> >> >> > meaning they get pulled out of the Map in the wrong order. To
> >> >> > fix this, I created a callback object which the derived
> >> >> > classes pass to the baseclass. The baseclass may then parse
> >> >> > all of the attributes in order, invoking the callback to
> >> >> > sanitize. If an attribute's contents fail to be processed, an
> >> >> > exception may be thrown, so that the attribute will not be
> >> >> > included in the final tag.
> >> >>
> >> >> It is important that only attributes that are explicitly parsed
> >> >> and understood are passed on, and that it doesn't take extra
> >> >> per-sanitiser work to achieve this. Will this be the case?
> >> >>
> >> >
> >> > Yeah, this should be the case. Attributes which don't have a
> >> > callback stored simply aren't parsed. I am starting, however, to
> >> > think this approach might be overkill.
> >> > Here I have a different take:
> >> > http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
> >> > Instead of running a callback in the base class, I simply create
> >> > the attributes, in order, with null content. Then, in the
> >> > overloaded methods on the child classes I repopulate them with
> >> > the correct data. This preserves the original order of the
> >> > attributes, while minimizing the amount of new code that needs
> >> > to be written. What do you think? Which solution do you think is
> >> > preferable?
> >>
> >> Do attributes without content still get written? Is that always
> >> valid? Not writing them isn't always valid; see eg bug 4125: current
> >> code happily removes required attributes from <meta> tags, thus
> >> breaking valid pages.
> >>
> >> Depending how much cleaning of the HTML filtering system you want to
> >> do... Has using something like JTidy ( http://jtidy.sourceforge.net/
> >> ) been discussed? That way you wouldn't have to worry about what's
> >> valid or invalid HTML, merely the security aspects of valid HTML that
> >> are unique to Freenet.
> >
> > IMHO sajack's solution is acceptable, you will have to just use null
> > to indicate no attribute and "" to indicate an attribute with no
> > value? Or is there a difference between attributes with an empty
> > value and attributes with no value?
> >
>
> It sounds fine to me, provided it doesn't take validating html and
> make it stop validating. Or at least does so no more than the current
> code.
>
> I'm asking what will happen when the attribute has null content
> because the filter couldn't find anything to fill it with; does that
> get written as <tag attribute=""> or <tag> or something else?
> Whichever it is, do we know that the result will be valid html?
>
> The current filter turns eg
> <meta http-equiv="Content-type" content="application/xhtml+xml;charset=UTF-8" />
> into
> <meta />
>
> The first is valid xhtml, the second is not.
> Run the w3c validator against my flog, both filtered and unfiltered,
> for details. So, how will the new filter handle cases like this,
> where filter code hasn't been completely written for all relevant
> aspects?
>
> Evan Daniel

Ah, okay, then I am aware of this. I think that removing these
attributeless tags has to be done on a tag-by-tag basis. There are
plenty of tags which are valid with no attributes.
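
To make the null-placeholder idea from this thread concrete, here is a
minimal sketch. It is not the code in the HTMLAttributeReorder branch:
the class and method names (AttributeOrderSketch, reserveSlots, render)
are hypothetical, and it assumes null means "no attribute" while ""
means "attribute with an empty value", as suggested above. It relies
only on the documented LinkedHashMap behaviour that re-inserting an
existing key does not change its position.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Freenet's filter.
final class AttributeOrderSketch {

    // Reserve one slot per attribute, in source order, with null content.
    static Map<String, String> reserveSlots(Map<String, String> original) {
        Map<String, String> slots = new LinkedHashMap<>();
        for (String name : original.keySet()) {
            slots.put(name, null);
        }
        return slots;
    }

    // Write the tag. A null value means "no attribute" and is skipped;
    // "" is written as an attribute with an empty value.
    // Note: values are not escaped here; a real filter would have to.
    static String render(String tagName, Map<String, String> slots) {
        StringBuilder out = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> e : slots.entrySet()) {
            if (e.getValue() == null) {
                continue; // never filled in by a sanitiser: drop it
            }
            out.append(' ').append(e.getKey())
               .append("=\"").append(e.getValue()).append('"');
        }
        return out.append(" />").toString();
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("http-equiv", "Content-type");
        original.put("content", "application/xhtml+xml;charset=UTF-8");
        original.put("onclick", "evil()"); // no sanitiser handles this

        Map<String, String> slots = reserveSlots(original);

        // Sanitisers may fill the slots in any order they like;
        // put() on an existing key keeps the original position.
        slots.put("content", original.get("content"));
        slots.put("http-equiv", original.get("http-equiv"));

        // http-equiv is still written before content; onclick is dropped.
        System.out.println(render("meta", slots));
    }
}
```

Whether a tag left with no attributes at all should still be written
is a separate question that this sketch does not answer; that is the
tag-by-tag decision mentioned above.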
