Vincent,
I'm trying to understand some of the issues we have with entities in the
XmlParser. Is there a special reason why entities are emitted as rawText and not text?
I think they should be emitted as text:
First, custom entities can be used simply to define some replacement text inside
documents (e.g. <!ENTITY version "1.0">).
Second, the resulting events should be consumable by all sinks, not just
X(HT)ML-based ones. Consider for instance the text "&amp;&AElig;" (where AElig
is defined as <!ENTITY AElig "Æ">). Currently it is emitted by the
XhtmlBaseParser as one text event "&" and one rawText event "Æ". This means that
e.g. the LaTeX Sink will produce wrong output (the AElig should be converted to
"\AE" in LaTeX).
IMO the resolved entity should be emitted in a format-independent way, e.g. as one
(Unicode?) character, just like & is emitted as one character above. The
consuming sink then has to transform that into a format-specific representation.
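
To make the idea concrete, here is a rough sketch in Java. Everything in it is
made up for illustration (TextSink, LatexLikeSink, XhtmlLikeSink and the emit
method are hypothetical, not the actual Doxia API), and the escaping tables only
cover the two characters from the example above. The point is only that the
parser emits plain text and each sink decides how to represent it:

```java
public class EntitySketch {

    /** Hypothetical, much-reduced sink interface: text events only. */
    interface TextSink {
        void text(String text);
    }

    /** Hypothetical LaTeX-like sink that maps characters to LaTeX itself. */
    static class LatexLikeSink implements TextSink {
        final StringBuilder out = new StringBuilder();

        @Override
        public void text(String text) {
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                switch (c) {
                    case '&':
                        out.append("\\&");
                        break;
                    case '\u00C6':
                        // Trailing braces end the command before any following letters.
                        out.append("\\AE{}");
                        break;
                    default:
                        out.append(c);
                }
            }
        }
    }

    /** Hypothetical XHTML-like sink that escapes the same characters for markup. */
    static class XhtmlLikeSink implements TextSink {
        final StringBuilder out = new StringBuilder();

        @Override
        public void text(String text) {
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                switch (c) {
                    case '&':
                        out.append("&amp;");
                        break;
                    case '\u00C6':
                        out.append("&#198;");
                        break;
                    default:
                        out.append(c);
                }
            }
        }
    }

    /** Parser side (assumed behaviour): resolved entities go out as ordinary text. */
    static void emit(TextSink sink) {
        sink.text("&");       // resolved from the amp entity
        sink.text("\u00C6");  // resolved from the custom AElig entity
    }

    static String toLatex() {
        LatexLikeSink sink = new LatexLikeSink();
        emit(sink);
        return sink.out.toString();
    }

    static String toXhtml() {
        XhtmlLikeSink sink = new XhtmlLikeSink();
        emit(sink);
        return sink.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLatex());
        System.out.println(toXhtml());
    }
}
```

With this split the parser never needs to know which sink is listening, and a
rawText event would only be needed for content that really is format-specific.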
WDYT?
-Lukas