On Sat, May 8, 2010 at 11:38 AM, Matthew Toseland
<t...@amphibian.dyndns.org> wrote:
> On Saturday 08 May 2010 05:09:07 Evan Daniel wrote:
>> On Fri, May 7, 2010 at 11:43 PM, Spencer Jackson
>> <spencerandrewjack...@gmail.com> wrote:
>> > On Fri, 2010-05-07 at 12:40 +0100, Matthew Toseland wrote:
>> >> On Thursday 06 May 2010 20:40:03 Spencer Jackson wrote:
>> >> > Hi guys, just wanted to touch base. Anyway, I'm working on resolving bug
>> >> > number 3571( https://bugs.freenetproject.org/view.php?id=3571 ). To
>> >> > summarize, the filter tends to reorder attributes at semirandom when
>> >> > they get parsed. While the structure which holds the parsed attribute is
>> >> > a LinkedHashMap, meaning we should be able to stuff in values and pull
>> >> > them out in the same order, the put functions are called in the derived
>> >> > verifier's overridden sanitizeHash methods. These methods extract an
>> >> > attribute, sanitize it, then place it in the Map. The problem is, they
>> >> > are extracted out of the original order, meaning they get pulled out of
>> >> > the Map in the wrong order. To fix this, I created a callback object
>> >> > which the derived classes pass to the baseclass. The baseclass may then
>> >> > parse all of the attributes in order, invoking the callback to
>> >> > sanitize. If an attribute's contents fail to be processed, an exception
>> >> > may be thrown, so that the attribute will not be included in the final
>> >> > tag.
>> >>
>> >> It is important that only attributes that are explicitly parsed and 
>> >> understood are passed on, and that it doesn't take extra per-sanitiser 
>> >> work to achieve this. Will this be the case?
>> >>
>> >
>> > Yeah, this should be the case.  Attributes which don't have a callback
>> > stored simply aren't parsed. I am starting, however, to think this
>> > approach might be overkill.  Here I have a different take:
>> > http://github.com/spencerjackson/fred-staging/tree/HTMLAttributeReorder
>> > Instead of running a callback in the base class, I simply create the
>> > attributes, in order, with null content. Then, in the overloaded methods
>> > on the child classes I repopulate them with the correct data. This
>> > preserves the original order of the attributes, while minimizing the
>> > amount of new code that needs to be written. What do you think? Which
>> > solution do you think is preferable?
>>
>> Do attributes without content still get written?  Is that always
>> valid?  Not writing them isn't always valid; see eg bug 4125: current
>> code happily removes required attributes from <meta> tags, thus
>> breaking valid pages.
>>
>> Depending how much cleaning of the HTML filtering system you want to
>> do...  Has using something like JTidy ( http://jtidy.sourceforge.net/
>> ) been discussed?  That way you wouldn't have to worry about what's
>> valid or invalid HTML, merely the security aspects of valid HTML that
>> are unique to Freenet.
>
> IMHO sajack's solution is acceptable, you will have to just use null to 
> indicate no attribute and "" to indicate an attribute with no value? Or is 
> there a difference between attributes with an empty value and attributes with 
> no value?
>

It sounds fine to me, provided it doesn't take validating html and
make it stop validating.  Or at least does so no more than the current
code.

I'm asking what will happen when the attribute has null content
because the filter couldn't find anything to fill it with; does that
get written as <tag attribute=""> or <tag> or something else?
Whichever it is, do we know that the result will be valid html?
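
To make the question concrete, here is a minimal sketch of the
"reserve the slots in order, fill them in later" idea as I understand
it. This is not the actual filter code: the class and method names
(AttributeOrderSketch, reserveSlots, render) are made up for
illustration, and the rule "null means omit, empty string means write
attribute=\"\"" is only one possible convention, assumed here so the
behaviour can be seen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the real Freenet filter classes.
public class AttributeOrderSketch {

    // Reserve one slot per attribute, in source order, with null content.
    public static Map<String, String> reserveSlots(String... names) {
        Map<String, String> slots = new LinkedHashMap<>();
        for (String name : names) {
            slots.put(name, null);
        }
        return slots;
    }

    // Assumed convention: null -> attribute omitted, "" -> written as name="".
    // Values are not escaped here; a real writer would have to do that.
    public static String render(String tag, Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("<").append(tag);
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (e.getValue() == null) {
                continue; // nothing was filled in, so drop the attribute
            }
            sb.append(' ').append(e.getKey())
              .append("=\"").append(e.getValue()).append('"');
        }
        return sb.append(" />").toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs =
                reserveSlots("http-equiv", "content", "onload");
        // Sanitisers may fill slots in any order; put() on an existing
        // key in a LinkedHashMap keeps that key's original position.
        attrs.put("content", "text/html;charset=UTF-8");
        attrs.put("http-equiv", "Content-type");
        // "onload" is never filled in, so it stays null and is omitted.
        System.out.println(render("meta", attrs));
    }
}
```

Under those assumptions the attributes come out in their original
order even though they were filled in out of order, and the unfilled
one is dropped. Whether dropping it leaves a valid tag is exactly the
open question.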

The current filter turns eg
<meta http-equiv="Content-type" content="application/xhtml+xml;charset=UTF-8" />
into
<meta />

The first is valid xhtml, the second is not.  Run the w3c validator
against my flog, both filtered and unfiltered, for details.  So, how
will the new filter handle cases like this, where filter code hasn't
been completely written for all relevant aspects?

Evan Daniel
