[cp-patches] Re: RFC: htmlAttributeSet and SmallHTMLAttributeSet

Audrius Meskauskas Fri, 03 Nov 2006 11:03:05 -0800

The HTML parser should emit attributes as HTML.Attribute objects and not as strings.

This is true for the final, user-accessible parser interface. htmlAttributeSet works as part of the internal implementation, where the attribute strings are already extracted from the text but not yet converted into the matching attribute constants. It is a highly specialized class that additionally handles case insensitivity (following the W3C HTML specification, both HTML tag and attribute names are case insensitive). Replacing it directly with SimpleAttributeSet will break the case insensitivity, and you will need to rework the code elsewhere to restore it.
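To illustrate the case-insensitivity point, here is a minimal sketch. It is not the htmlAttributeSet source: CaseInsensitiveAttributes and CaseDemo are hypothetical names, and folding String keys to lower case is only one assumed way to get the behaviour. It shows that a plain SimpleAttributeSet treats "HREF" and "href" as different keys, while a key-folding subclass does not.

```java
import java.util.Locale;
import javax.swing.text.SimpleAttributeSet;

// Hypothetical illustration only; not the GNU Classpath htmlAttributeSet.
// String keys are folded to lower case so that lookups ignore case.
class CaseInsensitiveAttributes extends SimpleAttributeSet {
    private static Object normalize(Object key) {
        return (key instanceof String)
            ? ((String) key).toLowerCase(Locale.ROOT)
            : key;
    }

    @Override
    public void addAttribute(Object name, Object value) {
        super.addAttribute(normalize(name), value);
    }

    @Override
    public Object getAttribute(Object name) {
        return super.getAttribute(normalize(name));
    }

    @Override
    public boolean isDefined(Object name) {
        return super.isDefined(normalize(name));
    }
}

class CaseDemo {
    // A plain SimpleAttributeSet compares String keys exactly,
    // so a lookup with different case misses the stored value.
    static Object plainLookup() {
        SimpleAttributeSet plain = new SimpleAttributeSet();
        plain.addAttribute("HREF", "index.html");
        return plain.getAttribute("href");
    }

    // The folding subclass finds the value regardless of case.
    static Object foldedLookup() {
        CaseInsensitiveAttributes folded = new CaseInsensitiveAttributes();
        folded.addAttribute("HREF", "index.html");
        return folded.getAttribute("href");
    }
}
```

Any code that swaps in SimpleAttributeSet directly would have to do this kind of normalization itself at every call site.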

Also, the HTML may contain non-standard attributes that have no corresponding attribute constant. These should be handled as strings.
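The conversion step with a string fallback could look roughly like the sketch below. AttributeKeys and toKey are hypothetical names, and folding the name to lower case before the lookup is an assumption; only HTML.getAttributeKey and the HTML.Attribute constants come from the standard javax.swing.text.html API.

```java
import java.util.Locale;
import javax.swing.text.html.HTML;

// Hypothetical helper, not part of GNU Classpath.
class AttributeKeys {
    // Returns the matching HTML.Attribute constant for a well-known name,
    // or the (lower-cased) name itself when no constant exists.
    static Object toKey(String rawName) {
        String name = rawName.toLowerCase(Locale.ROOT);
        HTML.Attribute known = HTML.getAttributeKey(name);
        return (known != null) ? known : name;
    }
}
```

With this sketch, toKey("HREF") yields HTML.Attribute.HREF, while a non-standard name such as "x-custom" stays a String.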

To produce less garbage, htmlAttributeSet may become exposed to the user via the AttributeSet interface that it implements. I did not see this as a big problem, but, if needed, an intermediate class surely can be instantiated.

Mauve and GNU Classpath itself contain numerous automated tests for HTML parser regressions. These should be used when working with the parser code.

Good luck.
Audrius
