http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #11 from Aziz Köksal <aziz.koek...@gmail.com> 2011-01-29 08:53:01 PST ---

Good to see somebody is working on this. :)

Here's some info I could gather on the entities in that file:

Almost all entities look like this:

<!ENTITY AElig "Æ" ><!--LATIN CAPITAL LETTER AE -->

The odd ones that stand out are:

<!ENTITY AMP "&#38;" ><!--AMPERSAND -->

// The name has a dot (you already noticed).
<!ENTITY b.Delta "𝚫" ><!--MATHEMATICAL BOLD CAPITAL DELTA -->

// Notice the leading space in the value.
<!ENTITY DotDot " ⃜" ><!--COMBINING FOUR DOTS ABOVE -->

// The entity has two characters in its value.
<!ENTITY bne "=⃥" ><!--EQUALS SIGN with reverse slash -->

// Double chars + initial &.
<!ENTITY nvlt "&#x0003C;⃒" ><!--LESS-THAN SIGN with vertical line -->

I was wondering what constitutes a valid name for an HTML entity, and after a long and hard search I found this page:

http://www.w3.org/TR/REC-xml/#NT-Name

Basically, an entity name may contain many more characters than just alphanumeric ones. However, I would not recommend adjusting the lexer to recognise all of them. We should allow only those characters that actually occur in the list, because it keeps things a lot simpler. Therefore the "." char should be allowed as well.

But what to do about those entities that define two replacement characters? Again, to keep things simple and efficient, let's just leave them out of the lookup table in the compiler.
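
To make the proposed rule concrete, here is a rough sketch in Python (not the compiler's code) of how the entity file could be filtered. It accepts names made of ASCII letters, digits and ".", expands numeric character references, and leaves out every entity whose value is not exactly one code point. The names build_table, expand_char_refs and SAMPLE are hypothetical, and the name pattern is an assumption based only on the entities quoted above.

```python
import re

# Matches declarations of the form: <!ENTITY name "value" >
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>')
# Numeric character references such as &#38; or &#x0003C;
CHAR_REF = re.compile(r'&#(x[0-9A-Fa-f]+|[0-9]+);')
# Assumption: names are ASCII letters and digits plus ".", as in the quoted list.
VALID_NAME = re.compile(r'[A-Za-z][A-Za-z0-9.]*\Z')


def expand_char_refs(value):
    """Replace numeric character references with the characters they denote."""
    def repl(match):
        ref = match.group(1)
        return chr(int(ref[1:], 16) if ref[0] == 'x' else int(ref))
    return CHAR_REF.sub(repl, value)


def build_table(dtd_text):
    """Split entities into a name -> code point table and a list of skipped names."""
    table, skipped = {}, []
    for name, raw in ENTITY_DECL.findall(dtd_text):
        value = expand_char_refs(raw)
        if VALID_NAME.match(name) and len(value) == 1:
            table[name] = ord(value)
        else:
            # Two replacement characters (or an unexpected name): leave it out.
            skipped.append(name)
    return table, skipped


# The declarations quoted in the comment, written with escapes for portability.
SAMPLE = (
    '<!ENTITY AElig "\u00c6" ><!--LATIN CAPITAL LETTER AE -->\n'
    '<!ENTITY AMP "&#38;" ><!--AMPERSAND -->\n'
    '<!ENTITY b.Delta "\U0001d6ab" ><!--MATHEMATICAL BOLD CAPITAL DELTA -->\n'
    '<!ENTITY DotDot " \u20dc" ><!--COMBINING FOUR DOTS ABOVE -->\n'
    '<!ENTITY bne "=\u20e5" ><!--EQUALS SIGN with reverse slash -->\n'
    '<!ENTITY nvlt "&#x0003C;\u20d2" ><!--LESS-THAN SIGN with vertical line -->\n'
)

if __name__ == "__main__":
    table, skipped = build_table(SAMPLE)
    for name, code in table.items():
        print(f"{name}: U+{code:04X}")
    print("skipped:", ", ".join(skipped))
```

With this rule AElig, AMP and b.Delta end up in the table, while DotDot, bne and nvlt are skipped. Note that DotDot is dropped only because the sketch counts its leading space as a character; stripping that space before counting would be an alternative, which the comment above does not settle.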