Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
I've just been looking at the state machine in wparser_def.c. I think
the processing for entities is also a few bob short in the pound. It
recognises decimal numeric character references, but not hexadecimal
numeric character references. That's fairly silly since the HTML spec
specifically says the latter are "particularly useful". The rules for
named entities are also deficient w.r.t. digits, just like the case of
tags that Tom noticed. This isn't academic: HTML features a number of
named entities with digits in the name (sup2, frac14 for example).
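To make the decimal/hexadecimal point concrete, here is a minimal sketch in C of a recogniser that accepts both forms (&#229; and &#xE5;). The function name match_numeric_charref is hypothetical and this is not code from wparser_def.c; it also assumes plain ASCII input and accepts either case of the "x" prefix, as HTML does.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from wparser_def.c.
 * Returns the length of a numeric character reference at the start of s
 * (including the leading '&' and trailing ';'), or 0 if there is none.
 * Accepts decimal (&#229;) and hexadecimal (&#xE5; or &#XE5;) forms.
 */
static size_t
match_numeric_charref(const char *s)
{
    size_t      i = 2;
    size_t      start;
    int         hex = 0;

    if (s[0] != '&' || s[1] != '#')
        return 0;

    /* An optional 'x' or 'X' switches to hexadecimal digits. */
    if (s[i] == 'x' || s[i] == 'X')
    {
        hex = 1;
        i++;
    }

    start = i;
    while (hex ? isxdigit((unsigned char) s[i])
               : isdigit((unsigned char) s[i]))
        i++;

    /* Need at least one digit and a terminating semicolon. */
    if (i == start || s[i] != ';')
        return 0;

    return i + 1;
}
```

XML proper only permits a lowercase "x" here, so a stricter variant would drop the 'X' case.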
In XML at least, legal names are defined by the following rules from the
spec:
...
[A-Za-z:_][A-Za-z0-9:_.-]*
I suggest we use that or something very close to it as the rule for
names in these patterns.
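As a rough illustration of the rule quoted above, a name matcher for [A-Za-z:_][A-Za-z0-9:_.-]* could be sketched in C as follows. The helper names (is_name_start, is_name_char, match_name) are made up for this example and do not come from wparser_def.c; the sketch only handles ASCII, which is an assumption, not a claim about how the parser treats multibyte input.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers, not taken from wparser_def.c. ASCII only. */

/* First character of a name: [A-Za-z:_] */
static bool
is_name_start(char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           c == ':' || c == '_';
}

/* Subsequent characters of a name: [A-Za-z0-9:_.-] */
static bool
is_name_char(char c)
{
    return is_name_start(c) ||
           (c >= '0' && c <= '9') ||
           c == '.' || c == '-';
}

/*
 * Returns the length of the longest name at the start of s,
 * or 0 if s does not begin with a valid name-start character.
 */
static size_t
match_name(const char *s)
{
    size_t      n;

    if (!is_name_start(s[0]))
        return 0;

    n = 1;
    while (is_name_char(s[n]))
        n++;

    return n;
}
```

With this rule, entity names such as sup2 and frac14 match in full, while a name starting with a digit is rejected.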
No objections here. Who wants to patch wparser_def?
I can get to it some time in the next week; I'm rather snowed under right now.
BTW, I'm also suspicious of the clause that allows <?xml ...; it appears
that it will also allow <?xfoo and <?XFOO, which seems quite odd,
especially the latter.
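For comparison, a stricter check might look something like the sketch below. It assumes the intent of that clause is to recognise only an XML declaration, i.e. the literal, case-sensitive string "<?xml" followed by whitespace; that intent is a guess. The function name starts_xml_decl is hypothetical and not from wparser_def.c.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from wparser_def.c.
 * Returns true only if s begins with the literal "<?xml" followed by
 * a whitespace character, so "<?xfoo", "<?XFOO" and "<?xmlfoo" are
 * all rejected.
 */
static bool
starts_xml_decl(const char *s)
{
    /* strncmp stops at a NUL in s, so short inputs are safe. */
    if (strncmp(s, "<?xml", 5) != 0)
        return false;

    return s[5] == ' ' || s[5] == '\t' ||
           s[5] == '\r' || s[5] == '\n';
}
```

If general processing instructions are meant to be accepted too, the target after "<?" would instead need to follow the same name rule as tags and entities.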
cheers
andrew