In article <[EMAIL PROTECTED]>, Fredrik Lundh wrote:
>Magnus Lie Hetland wrote:
[snip]
>with sgmlop 1.1, the following script
>
>class entity_handler:
>    def handle_entityref(self, entityref):
>        print "ENTITY", repr(entityref)
>
>parser = sgmlop.XMLParser()
>parser.register(entity_handler())
>parser.feed("&-10;&/()=?;")
>
>prints:
>
>ENTITY '-10'
>ENTITY '/()=?'
OK, thanks. I guess I just wasn't creative enough in my entity naming
:)

>> And another thing... For the case where a numeric reference is too
>> high (i.e. it can't be translated into a Unicode character) -- is it
>> possible to ignore it (or replace it, as with encode/decode)?
>
>if you don't do anything, it is ignored.
>
>if you specify a handle_charref hook, the part between &# and ; is passed
>to that method.

I see -- it's just if the default behaviour of transforming it to text
kicks in that there is trouble? (That makes sense, of course.)

>if you have a handle_entityref hook, but no handle_charref, the part between
>& and ; is passed to handle_entityref.

Strange. It doesn't seem to work that way for me... Here is an
example:

......................................................................
from xml.parsers.sgmlop import SGMLParser, XMLParser, XMLUnicodeParser

class Handler:

    def handle_data(self, data):
        print 'DATA', data

    def handle_entityref(self, data):
        print 'ENTITY', data

for parser in [SGMLParser(), XMLParser(), XMLUnicodeParser()]:
    parser.register(Handler())
    try:
        parser.feed('�')
    except Exception, e:
        print e
......................................................................

When I run this, I get:

  character reference � exceeds ASCII range
  character reference � exceeds ASCII range
  character reference � exceeds sys.maxunicode (0xffff)

If I remove the handle_data, nothing happens.

></F>

--
Magnus Lie Hetland     Time flies like the wind. Fruit flies
http://hetland.org     like bananas. -- Groucho Marx
--
http://mail.python.org/mailman/listinfo/python-list
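
For the "replace it, as with encode/decode" idea discussed above, here is a minimal sketch of a handler that defines its own handle_charref hook and substitutes U+FFFD for any reference it cannot map. The hook name and its argument (the part between &# and ;) come from the thread; everything else is an assumption for illustration: the class name CharrefHandler, the text() helper, and the guess that hexadecimal references arrive with a leading "x". It is written for Python 3, unlike the Python 2 script above, and it calls the hooks directly rather than through sgmlop, so it does not show how sgmlop itself behaves.

```python
import sys

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


class CharrefHandler:
    """Hypothetical handler: collects text, replacing unmappable charrefs."""

    def __init__(self):
        self.chunks = []

    def handle_data(self, data):
        # Plain character data is kept as-is.
        self.chunks.append(data)

    def handle_charref(self, ref):
        # Assumption: ref is the text between "&#" and ";", e.g. "65" or "x41".
        try:
            if ref[:1] in ("x", "X"):
                code = int(ref[1:], 16)
            else:
                code = int(ref, 10)
            if not 0 <= code <= sys.maxunicode:
                raise ValueError(ref)
            self.chunks.append(chr(code))
        except ValueError:
            # Malformed or out-of-range reference: substitute, do not raise.
            self.chunks.append(REPLACEMENT)

    def text(self):
        return "".join(self.chunks)


if __name__ == "__main__":
    handler = CharrefHandler()
    handler.handle_data("A")
    handler.handle_charref("66")        # decimal reference
    handler.handle_charref("x43")       # hexadecimal reference (assumed form)
    handler.handle_charref("99999999")  # too large for sys.maxunicode
    print(repr(handler.text()))
```

With sgmlop, an instance would presumably be passed to parser.register() in the same way as Handler() in the script above. Dropping the reference instead of replacing it only requires leaving the except branch empty.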