On Sunday 09 May 2010 02:35:57 Spencer Jackson wrote:
> On Sat, May 8, 2010 at 10:38 AM, Matthew Toseland <t...@amphibian.dyndns.org> wrote:
> > On Saturday 08 May 2010 05:09:07 Evan Daniel wrote:
> > > On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson
> > > <spencerandrewjack...@gmail.com> wrote:
> > > > On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
> > > > > On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
> > > > > > Hi guys, just wanted to touch base. Anyway, I'm working on resolving
> > > > > > bug number 3571 ( https://bugs.freenetproject.org/view.php?id=3571 ).
> > > > > > To summarize, the filter tends to reorder attributes semi-randomly
> > > > > > when they get parsed. While the structure which holds the parsed
> > > > > > attributes is a LinkedHashMap, meaning we should be able to stuff in
> > > > > > values and pull them out in the same order, the put functions are
> > > > > > called in the derived verifiers' overridden sanitizeHash methods.
> > > > > > These methods extract an attribute, sanitize it, then place it in
> > > > > > the Map. The problem is, they are extracted out of the original
> > > > > > order, meaning they get pulled out of the Map in the wrong order. To
> > > > > > fix this, I created a callback object which the derived classes pass
> > > > > > to the base class. The base class may then parse all of the
> > > > > > attributes in order, invoking the callback to sanitize. If an
> > > > > > attribute's contents fail to be processed, an exception may be
> > > > > > thrown, so that the attribute will not be included in the final tag.
> > > > >
> > > > > It is important that only attributes that are explicitly parsed and
> > > > > understood are passed on, and that it doesn't take extra
> > > > > per-sanitiser work to achieve this. Will this be the case?
> > > >
> > > > Yeah, this should be the case. Attributes which don't have a callback
> > > > stored simply aren't parsed. I am starting, however, to think this
> > > > approach might be overkill. Here I have a different take:
> > > > http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
> > > > Instead of running a callback in the base class, I simply create the
> > > > attributes, in order, with null content. Then, in the overridden
> > > > methods on the child classes, I repopulate them with the correct data.
> > > > This preserves the original order of the attributes while minimizing
> > > > the amount of new code that needs to be written. What do you think?
> > > > Which solution do you think is preferable?
> > >
> > > Do attributes without content still get written? Is that always valid?
> > > Not writing them isn't always valid; see e.g. bug 4125: current code
> > > happily removes required attributes from <meta> tags, thus breaking
> > > valid pages.
>
> Odd. I'm looking at the code for MetaTagVerifier, and I can't see any code
> branch in which, if the 'content' attribute is defined, it fails to be
> added to the LinkedHashMap unless nothing else is added either... I'm not
> on my home computer, so I'll have to test this tomorrow. Does it happen to
> all <meta> tags? Oh. Do you mean that if there are no attributes, the tag
> will still exist, but be empty? I could alter MetaTagVerifier to return
> null if this is the case, and remove the tag from the final output. Would
> that fix this?
>
> > > Depending on how much cleaning of the HTML filtering system you want to
> > > do... Has using something like JTidy ( http://jtidy.sourceforge.net/ )
> > > been discussed? That way you wouldn't have to worry about what's valid
> > > or invalid HTML, merely the security aspects of valid HTML that are
> > > unique to Freenet.
>
> That might be nice... but wouldn't we have the same problem, in that it
> would be hard to diff the output of the filter against the input for
> debugging purposes? What do other people think about this? It would make
> life much easier...
IMHO this is out of scope for GSoC, will lead to large diffs, will be a lot of work and will pull in a lot of third-party code. Bad idea at the moment. But the more fundamental issue is that we MUST have a WHITELIST-ONLY filter: nothing is passed through without somebody going through and writing a filter for it or declaring that the attribute is harmless. This is directly opposed to what you said above.

> > IMHO sajack's solution is acceptable; you will just have to use null to
> > indicate no attribute and "" to indicate an attribute with no value? Or
> > is there a difference between attributes with an empty value and
> > attributes with no value?
>
> Apparently, HTML supports attribute minimization, but XHTML does not. In
> other words, 'compact' is valid HTML, but not valid XHTML, which needs
> 'compact="compact"' ( http://www.w3.org/TR/xhtml1/#h-4.5 ). For boolean
> values, according to
> http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.3.4.2 , attributes
> should either exist without an '=' or be equal to the attribute's name if
> true, and be absent if false. XHTML requires the attribute to be equal to
> its name if true. So yes, there is a difference.
>
> Okay, how's this? Step one: for all attributes in the tag, create the same
> attributes in the same order in the sanitized tag, all equal to null. Then
> parse the tag, replacing the null values where new values exist. Once
> that's done, iterate through all the attributes in the parsed map. If an
> attribute is null, discard it. If an attribute is simply empty, check
> whether the HTML parse context says we're parsing XHTML. If not, pass
> through the minimized attribute; if so, discard it.
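The three-step proposal above can be sketched in Java roughly as follows. It assumes, hypothetically, that the parser reports a minimized attribute such as `checked` with the value "", following Matthew's suggested convention of null for "no attribute" and "" for "attribute with no value"; `placeholders`, `sanitizeInput`, `escape` and `renderAttributes` are illustrative names, not existing fred methods. The detail the approach relies on is that `LinkedHashMap.put` on a key already present does not change that key's position, so filling in the null placeholders preserves the original order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class PlaceholderSanitizerSketch {

    /** Step one: same attribute names, same order, every value null. */
    static Map<String, String> placeholders(Map<String, String> parsed) {
        Map<String, String> sanitized = new LinkedHashMap<>();
        for (String name : parsed.keySet()) {
            sanitized.put(name, null);
        }
        return sanitized;
    }

    /**
     * Step two, for a hypothetical <input> verifier: fill in only whitelisted
     * attributes. Re-putting an existing key keeps its original position.
     */
    static void sanitizeInput(Map<String, String> parsed, Map<String, String> sanitized) {
        for (String name : new String[] {"type", "name", "checked"}) {
            if (parsed.containsKey(name)) {
                sanitized.put(name, parsed.get(name));
            }
        }
    }

    /** Minimal attribute-value escaping; assumes values arrive unescaped. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;");
    }

    /**
     * Step three: null means no verifier approved the attribute, so drop it.
     * "" means a minimized attribute: emit it bare in HTML, drop it in XHTML.
     */
    static String renderAttributes(Map<String, String> sanitized, boolean isXHTML) {
        StringJoiner attrs = new StringJoiner(" ");
        for (Map.Entry<String, String> e : sanitized.entrySet()) {
            String value = e.getValue();
            if (value == null) {
                continue;
            }
            if (value.isEmpty()) {
                if (!isXHTML) {
                    attrs.add(e.getKey());
                }
                continue;
            }
            attrs.add(e.getKey() + "=\"" + escape(value) + "\"");
        }
        return attrs.toString();
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("type", "checkbox");
        parsed.put("onclick", "evil()"); // not whitelisted: stays null
        parsed.put("checked", "");       // minimized boolean (assumed parser output)
        parsed.put("name", "x");

        Map<String, String> sanitized = placeholders(parsed);
        sanitizeInput(parsed, sanitized);

        System.out.println(renderAttributes(sanitized, false)); // type="checkbox" checked name="x"
        System.out.println(renderAttributes(sanitized, true));  // type="checkbox" name="x"
    }
}
```

Following the proposal literally, minimized attributes are dropped in XHTML mode; per the W3C text cited above, emitting `checked="checked"` instead would be the XHTML-valid way to keep a true boolean. Note also that with this convention a minimized attribute and an explicit empty value (e.g. `alt=""`) look the same.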
_______________________________________________
Devl mailing list
Devl@freenetproject.org
http://osprey.vm.bytemark.co.uk/cgi-bin/mailman/listinfo/devl