Hello Audrius,

On Friday, 03.11.2006, at 19:03 +0100, Audrius Meskauskas wrote:
> > The HTML parser should emit attributes as HTML.Attribute objects and 
> > not as strings. 
> 
> This is true for the final, user-accessible parser interface. 
> htmlAttributeSet works as part of the internal implementation, where the 
> attribute strings are already extracted from the text but not yet 
> converted into the matching attribute constants. It is a highly 
> specialized class that additionally handles case insensitivity 
> (following the W3C HTML specification, both HTML tag and attribute names 
> are case-insensitive). Replacing it directly with SimpleAttributeSet will 
> break the case insensitivity, and you will need to rework the code 
> elsewhere to restore it.

I do call toLowerCase() for all attribute values when handling the names
to preserve the case insensitivity. This might be a little too
brute-force, though. I'll see whether I can improve this.

> Also, HTML may contain non-standard attributes that have no 
> corresponding attribute constant. These should be handled as strings.

Indeed. I will think about how to handle this.
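
For illustration, here is a minimal sketch of one way both points could be
combined. It is not the actual Classpath code. It assumes a hypothetical
helper class AttributeKeys with a method keyFor(), and relies only on
HTML.getAttributeKey(String), which returns null for names that have no
HTML.Attribute constant. Known names map to their constant; unknown names
stay as lower-cased strings:

```java
import java.util.Locale;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical helper for illustration; not part of GNU Classpath or the JDK.
public class AttributeKeys {

    // Returns the HTML.Attribute constant for a known attribute name,
    // or the lower-cased name itself when no constant exists.
    public static Object keyFor(String rawName) {
        // Locale.ROOT keeps the result independent of the default locale.
        String name = rawName.toLowerCase(Locale.ROOT);
        HTML.Attribute known = HTML.getAttributeKey(name);
        return known != null ? known : name;
    }

    public static void main(String[] args) {
        MutableAttributeSet attrs = new SimpleAttributeSet();
        attrs.addAttribute(keyFor("HREF"), "index.html"); // standard attribute
        attrs.addAttribute(keyFor("X-Custom"), "42");     // non-standard attribute

        System.out.println(attrs.getAttribute(HTML.Attribute.HREF));
        System.out.println(attrs.getAttribute("x-custom"));
    }
}
```

With this approach only the names are normalized; the attribute values are
stored unchanged.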

> To produce less garbage, htmlAttributeSet may become exposed to the 
> user via the AttributeSet interface that it implements. I did not see 
> this as a big problem, but, if needed, the intermediate class surely can 
> be instantiated. 
> Mauve and GNU Classpath itself contain numerous automated tests 
> for HTML parser regressions. These should be used when working with the 
> parser code.

I will test everything before committing so that my changes won't break
anything.

Thank you for your suggestions. It's a pity that you can't help
Classpath in the near future.

Cheers, Roman


