Hello Audrius,

On Friday, 03.11.2006, at 19:03 +0100, Audrius Meskauskas wrote:
> > The HTML parser should emit attributes as HTML.Attribute objects and
> > not as strings.
>
> This is true for the final, user-accessible parser interface.
> htmlAttributeSet works as part of the internal implementation, where the
> attribute strings are already extracted from the text but not yet
> converted into the matching attribute constants. It is a highly
> specialized class that additionally handles the case insensitivity
> (following the W3C HTML specification, both HTML tag and attribute names
> are case insensitive). Direct replacement with SimpleAttributeSet will
> break the case insensitivity, and you will need to rework the code
> elsewhere to restore this.

I do call toLowerCase() for all attribute values when handling the names
to preserve the case insensitivity. This might be a little too brute-force
though. I'll see whether I can improve this.

> Also, the HTML may contain non-standard attributes that have no
> corresponding attribute constant. These should be handled as strings.

Indeed. I will think about how to handle this.

> To produce less garbage, htmlAttributeSet may become exposed to the
> user via the AttributeSet interface that it implements. I did not see
> this as a big problem, but, if needed, the intermediate class surely can
> be instantiated.

> The Mauve and the GNU Classpath itself contain numerous automated tests
> for HTML parser regressions. These should be used when working with the
> parser code.

I will test everything before committing so that my changes won't break
anything.

Thank you for your suggestions. It's a pity that you can't help Classpath
in the near future.

Cheers,
Roman
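
For illustration, the approach discussed in this thread (lower-case the attribute name, map it to an HTML.Attribute constant where one exists, and fall back to a plain string for non-standard attributes) might be sketched roughly as follows. This is only a sketch, not the actual GNU Classpath code: the class AttributeKeySketch and its methods keyFor and convert are hypothetical names, and it assumes that only the attribute names, not the values, need to be case-normalized.

```java
import java.util.Locale;
import java.util.Map;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical helper, for illustration only.
public class AttributeKeySketch {

    /**
     * Returns the HTML.Attribute constant matching the given raw name,
     * ignoring case. If no constant exists (non-standard attribute),
     * the lower-cased name itself is used as the key.
     */
    public static Object keyFor(String rawName) {
        // Locale.ROOT avoids locale-specific surprises (e.g. Turkish dotless i).
        String name = rawName.toLowerCase(Locale.ROOT);
        HTML.Attribute known = HTML.getAttributeKey(name); // null if unknown
        return known != null ? known : name;
    }

    /** Converts raw name/value strings into an attribute set with normalized keys. */
    public static SimpleAttributeSet convert(Map<String, String> rawAttributes) {
        SimpleAttributeSet set = new SimpleAttributeSet();
        for (Map.Entry<String, String> entry : rawAttributes.entrySet()) {
            // Values are stored unchanged; only the key is normalized (assumption).
            set.addAttribute(keyFor(entry.getKey()), entry.getValue());
        }
        return set;
    }
}
```

With this sketch, a raw attribute named "HREF" would be retrievable via HTML.Attribute.HREF, while an unknown attribute such as "X-Custom" would be stored under the string key "x-custom".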