On 4 May 2005 08:17:07 -0700, "Chris Curvey" <[EMAIL PROTECTED]> wrote:
>Here is the solution.  Incidentally, the client is Cold Fusion.
>
I suspect your solution may not be general, though it would seem to
satisfy your use case. It seems to be true for Python's latin-1 codec that
all of the first 256 character codes are acceptable and map 1:1 to unicode,
even though the Windows Character Map, showing the Lucida Sans Unicode font
with latin-1 codes, displays undefined-character boxes for codes 0x7f-0x9f.

 >>> sum(chr(i).decode('latin-1') == unichr(i) for i in xrange(256))
 256
 >>> sum(unichr(i).encode('latin-1') == chr(i) for i in xrange(256))
 256

Not sure what to make of that. E.g., should unichr(0x7f).encode('latin-1')
really be legal, or is it just expedient to have latin-1 serve as a kind of
compressed utf_16_le? E.g., there are 256 Trues in each of these:

 >>> sum(unichr(i).encode('utf_16_le')[0] == chr(i) for i in xrange(256))
 256
 >>> sum(unichr(i).encode('utf_16_le')[1] == '\x00' for i in xrange(256))
 256

Maybe we could have a 'u_as_str' or 'utf_16_le_lsbyte' codec for that,
so the above would be spelled

 >>> sum(unichr(i).encode('u_as_str') == chr(i) for i in xrange(256)) # XXX faked, not implemented
 256

UTF-8 only goes halfway:

 >>> sum(unichr(i).encode('utf-8') == chr(i) for i in xrange(256))
 128

<aside>
What do you think, Martin? ;-) Maybe 'ubyte' or 'u256' would be a
user-friendlier codec name? Or 'ustr'?
</aside>

>import re
>import logging
>import logging.config
>import os
>import SimpleXMLRPCServer
>
>logging.config.fileConfig("logging.ini")
>
>########################################################################
>class LoggingXMLRPCRequestHandler(SimpleXMLRPCServer.CGIXMLRPCRequestHandler):
>    def __dereference(self, request_text):
>        entityRe = re.compile("((?P<er>&#x)(?P<code>..)(?P<semi>;))")
What about the entity &#x263a; ? Or the same in decimal: &#9786; :)
Your pattern only matches exactly two hex digits.
>        for m in re.finditer(entityRe, request_text):
>            hexref = int(m.group(3),16)
>            charref = chr(hexref)
unichr(hexref) would handle >= 256, if you used unicode.
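
For what it's worth, here's a minimal sketch of a more general dereferencing
step. It's written for Python 3, where str is unicode and chr() plays the role
of Python 2's unichr(); the names _CHARREF_RE and dereference are just made up
for illustration. re.sub with a callback handles hex references of any length
(like &#x263a;) as well as decimal ones (like &#9786;):

```python
import re

# Hex (&#x263a;) or decimal (&#9786;) numeric character references of any
# length. XML only allows a lowercase 'x' in the hex form.
_CHARREF_RE = re.compile(r"&#(?:x(?P<hex>[0-9a-fA-F]+)|(?P<dec>[0-9]+));")


def dereference(text):
    """Replace numeric character references in a str with the characters."""
    def _replace(m):
        if m.group("hex") is not None:
            code = int(m.group("hex"), 16)
        else:
            code = int(m.group("dec"))
        # chr() covers the whole unicode range; it raises ValueError
        # for code points above 0x10FFFF.
        return chr(code)
    return _CHARREF_RE.sub(_replace, text)


print(ascii(dereference("caf&#xe9; &#x263a; &#9786;")))  # 'caf\xe9 \u263a \u263a'
```

One caveat: an XML parser resolves numeric character references by itself,
and dereferencing the raw text before parsing can break the document, e.g.
&#x3c; would turn into a literal '<' in the middle of a string value.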
>            request_text = request_text.replace(m.group(1), charref)
>
>        return request_text
>
>
>#-------------------------------------------------------------------
>    def handle_xmlrpc(self, request_text):
>        logger = logging.getLogger()
>        #logger.debug("************************************")
>        #logger.debug(request_text)
                       ^^^^^^^^^^^^
I would suggest repr(request_text) for debugging, unless you know that your
logger is going to do that for you. Otherwise a '%s' format may hide things
that you'd like to know.

>        try:
>            #logger.debug("-------------------------------------")
>            request_text = self.__dereference(request_text)
>            #logger.debug(request_text)
>            request_text = request_text.decode("latin-1").encode('utf-8')
AFAIK, XML can be encoded with many encodings other than latin-1, so you are
essentially saying here that you somehow know it's latin-1. Theoretically,
your XML could start with something like

    <?xml version='1.0' encoding='UTF-8'?>

and .decode("latin-1") is only going to give the right characters when the
source is plain ascii. I wouldn't be surprised if that's what's happening up
to the point where you __dereference, but str.replace doesn't care that you
are potentially making a utf-8 encoding invalid by inserting single 8-bit
characters that are legal latin-1 but not valid utf-8 on their own. After
that, you are decoding your utf-8_clobbered_with_latin-1 as latin-1 anyway,
so it "works". At least I think this is a consistent theory. See if you can
get the client to send something with characters >= 128 that aren't
represented as &#x..; to see if it's actually sending utf-8.
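
To check that theory, here's a rough Python 3 sketch (decode_xml_bytes and
_XML_DECL_RE are hypothetical names, not from any library) that takes the
encoding from the XML declaration instead of assuming latin-1, falling back to
UTF-8, which is what the XML 1.0 spec requires when there's neither a
declaration nor a byte order mark. It also shows why latin-1 always "works":
every byte maps to some character, so UTF-8 input comes out as mojibake rather
than an error, and ascii() (repr() in Python 2) makes that visible in a log.

```python
import re

# Matches encoding="..." or encoding='...' in an XML declaration at the very
# start of the data. Simplification: byte order marks, UTF-16/32 input, and
# any charset given in the HTTP Content-Type header are all ignored.
_XML_DECL_RE = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["'](?P<enc>[A-Za-z][A-Za-z0-9._-]*)["']""")


def decode_xml_bytes(data):
    """Decode raw XML bytes with the declared encoding, else UTF-8."""
    m = _XML_DECL_RE.match(data)  # match() only looks at the start of data
    encoding = m.group("enc").decode("ascii") if m else "utf-8"
    # Raises LookupError for an unknown encoding name and
    # UnicodeDecodeError if the bytes don't fit the declared encoding.
    return data.decode(encoding)


# latin-1 maps every byte value to a character, so decoding UTF-8 bytes as
# latin-1 never raises; each multi-byte sequence just turns into mojibake.
utf8_bytes = "caf\u00e9".encode("utf-8")
print(ascii(utf8_bytes.decode("latin-1")))                    # 'caf\xc3\xa9'
print(ascii(decode_xml_bytes(b"<a>" + utf8_bytes + b"</a>")))  # '<a>caf\xe9</a>'
```

The first line is five characters where four were sent; the second decodes
the same bytes correctly because no declaration means UTF-8.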
>            #logger.debug("************************************")
>        except Exception, e:
>            logger.error(request_text)
Again, I'd suggest repr(request_text).
>            logger.error("had a problem dereferencing")
>            logger.error(e)
>
>        SimpleXMLRPCServer.CGIXMLRPCRequestHandler.handle_xmlrpc(self, request_text)
>########################################################################
>class Foo:
>    def settings(self):
>        return os.environ
>    def echo(self, something):
>        logger = logging.getLogger()
>        logger.debug(something)
repr it, unless you know ;-)
>        return something
>    def greeting(self, name):
>        return "hello, " + name
>
># these are used to run as a CGI
>handler = LoggingXMLRPCRequestHandler()
>handler.register_instance(Foo())
>handler.handle_request()
>

Regards,
Bengt Richter

--
http://mail.python.org/mailman/listinfo/python-list