On Jan 11, 6:15 am, webcomm <rya...@gmail.com> wrote:
> On Jan 9, 6:07 pm, John Machin <sjmac...@lexicon.net> wrote:
>
> > Yup, it looks like it's encoded in utf_16_le, i.e. no BOM as
> > God^H^H^HGates intended:
>
> > >>> buff = open('data', 'rb').read()
> > >>> buff[:100]
> > '<\x00R\x00e\x00g\x00i\x00s\x00t\x00r\x00a\x00t\x00i\x00o\x00n\x00>\x00<\x00B\x00a\x00l\x00a\x00n\x00c\x00e\x00D\x00u\x00e\x00>\x000\x00.\x000\x000\x000\x000\x00<\x00/\x00B\x00a\x00l\x00a\x00n\x00c\x00e\x00D\x00u\x00e\x00>\x00<\x00S\x00t\x00a\x00t\x00'
> > >>> buff[:100].decode('utf_16_le')
>
> There it is.  Thanks.
>
> > u'<Registration><BalanceDue>0.0000</BalanceDue><Stat'
>
> > > But if I return it to my browser with python+django,
> > > there are bad characters every other character
>
> > Please consider that we might have difficulty guessing what "return it
> > to my browser with python+django" means. Show actual code.
>
> I did stop and consider what code to show.  I tried to show only the
> code that seemed relevant, as there are sometimes complaints on this
> and other groups when someone shows more than the relevant code.  You
> solved my problem with decode('utf_16_le').  I can't find any
> description of that encoding on the WWW... and I thought *everything*
> was on the WWW. :)

Try searching using the official name UTF-16LE ... looks like a blind
spot in the approximate matching algorithm(s) used by the search
engine(s) that you tried :-(

> I didn't know the data was utf_16_le-encoded because I'm getting it
> from a service.  I don't even know if *they* know what encoding they
> used.  I'm not sure how you knew what the encoding was.

Actually looked at the raw data. Pattern appeared to be an alternation
of 1 "meaningful" byte and one zero ('\x00') byte: => UTF16*. No BOM
('\xFE\xFF' or '\xFF\xFE') at start of file: => UTF16-?E. First byte
is meaningful: => UTF16-LE.

> > Please consider reading the Unicode HOWTO at
> > http://docs.python.org/howto/unicode.html
>
> Probably wouldn't hurt,

Definitely won't hurt. Could even help.

> though reading that HOWTO wouldn't have given
> me the encoding, I don't think.

It wasn't intended to give you the encoding. Just read it.

Cheers,
John
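
For anyone who lands on this thread later: the eyeball test described above (alternating zero bytes, no BOM, meaningful byte first) can be roughed out in code. This is only a sketch, not anything posted in the thread. It is written for Python 3, where the raw data is a bytes object, and the function name guess_utf16 and its sample_size parameter are made up for illustration. It assumes the text is mostly ASCII, as the XML here is; data with many non-ASCII characters will not show the clean zero-byte pattern, and the function then returns None.

```python
import codecs

def guess_utf16(buff, sample_size=100):
    """Guess a UTF-16 codec name for mostly-ASCII bytes, or return None.

    Hypothetical helper: a heuristic, not a general encoding detector.
    """
    # A BOM at the start settles it; the generic 'utf_16' codec
    # reads the BOM to pick the byte order and strips it.
    if buff.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf_16'

    # Look only at whole 2-byte units from the start of the data.
    sample = buff[:sample_size]
    sample = sample[:len(sample) - len(sample) % 2]
    if not sample:
        return None

    even, odd = sample[0::2], sample[1::2]
    zeros_even = even.count(b'\x00')
    zeros_odd = odd.count(b'\x00')

    # Meaningful byte first, zero byte second: little-endian.
    if zeros_odd == len(odd) and zeros_even == 0:
        return 'utf_16_le'
    # Zero byte first, meaningful byte second: big-endian.
    if zeros_even == len(even) and zeros_odd == 0:
        return 'utf_16_be'
    return None


# Build sample data in the same shape as the bytes shown in the thread.
data = '<Registration><BalanceDue>0.0000</BalanceDue><Stat'.encode('utf_16_le')
encoding = guess_utf16(data)
if encoding is not None:
    print(encoding, data.decode(encoding))
```

Returning None rather than guessing seems the safer choice here: a wrong codec name would just produce the "bad characters every other character" symptom again, only further downstream.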