On Jan 28, 2:31 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Jan 28, 2:53 pm, glacier <[EMAIL PROTECTED]> wrote:
>
> > Thanks,John.
> > It's no doubt that you proved SAX didn't support GBK encoding.
> > But can you give some suggestion on how to make SAX parse some GBK
> > string?
>
> Yes, the same suggestion as was given to you by others very early in
> this thread, the same as I demonstrated in the middle of proving that
> SAX doesn't support a GBK-encoded input file.
>
> Suggestion: Recode your input from GBK to UTF-8. Ensure that the XML
> declaration doesn't have an unsupported encoding. Your handler will
> get data encoded as UTF-8. Recode that to GBK if needed.
>
> Here's a cut down version of the previous script, focussed on
> demonstrating that the recoding strategy works.
>
> C:\junk>type gbksax2.py
> import xml.sax, xml.sax.saxutils
> import cStringIO
> unistr = u''.join(unichr(0x4E00+i) + unichr(ord('W')+i) for i in range(4))
> gbkstr = unistr.encode('gbk')
> print 'This is a GBK-encoded string: %r' % gbkstr
> utf8str = gbkstr.decode('gbk').encode('utf8')
> print 'Now recoded as UTF-8 to be fed to a SAX parser: %r' % utf8str
> xml_template = """<?xml version="1.0" encoding="%s"?><data>%s</data>"""
> utf8doc = xml_template % ('utf-8', unistr.encode('utf8'))
> f = cStringIO.StringIO()
> handler = xml.sax.saxutils.XMLGenerator(f, encoding='utf8')
> xml.sax.parseString(utf8doc, handler)
> result = f.getvalue()
> f.close()
> start = result.find('<data>') + 6
> end = result.find('</data>')
> mydata = result[start:end]
> print "SAX output (UTF-8): %r" % mydata
> print "SAX output recoded to GBK: %r" % mydata.decode('utf8').encode('gbk')
>
> C:\junk>gbksax2.py
> This is a GBK-encoded string: '[EMAIL PROTECTED]'
> Now recoded as UTF-8 to be fed to a SAX parser: '\xe4\xb8\x80W\xe4\xb8\x81X\xe4\xb8\x82Y\xe4\xb8\x83Z'
> SAX output (UTF-8): '\xe4\xb8\x80W\xe4\xb8\x81X\xe4\xb8\x82Y\xe4\xb8\x83Z'
> SAX output recoded to GBK: '[EMAIL PROTECTED]'
>
> HTH,
> John
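
For anyone reading this later on Python 3, the same recoding strategy might look roughly like the sketch below. It is not John's script: the names TextCollector, recode_gbk_document and parse_gbk_document are made up for illustration, and it assumes the whole document fits in memory and that any encoding declaration sits at the very start of the file. It decodes the GBK bytes, rewrites the declared encoding to utf-8, parses the UTF-8 bytes with SAX, and recodes the collected text to GBK at the end.

```python
import re
import xml.sax
from xml.sax.handler import ContentHandler

# Matches the encoding pseudo-attribute of a leading XML declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


class TextCollector(ContentHandler):
    """Hypothetical handler that just accumulates character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # In Python 3 the handler receives str (already decoded text).
        self.chunks.append(content)


def recode_gbk_document(gbk_doc):
    """Recode a GBK-encoded XML document (bytes) to UTF-8 bytes."""
    text = gbk_doc.decode("gbk")
    # Make the declaration agree with the new encoding.
    text = _DECL_ENCODING.sub(r"\1\2utf-8\2", text, count=1)
    return text.encode("utf-8")


def parse_gbk_document(gbk_doc):
    """Parse a GBK-encoded document with SAX and return its text content."""
    handler = TextCollector()
    xml.sax.parseString(recode_gbk_document(gbk_doc), handler)
    return "".join(handler.chunks)


# Same test data as in the quoted script: four CJK characters mixed with W..Z.
unistr = "".join(chr(0x4E00 + i) + chr(ord("W") + i) for i in range(4))
gbk_doc = ('<?xml version="1.0" encoding="gbk"?><data>%s</data>'
           % unistr).encode("gbk")

parsed = parse_gbk_document(gbk_doc)
gbk_again = parsed.encode("gbk")  # recode to GBK only if downstream needs it

print("Parsed text: %s" % ascii(parsed))
print("Round trip matches original GBK bytes: %s"
      % (gbk_again == unistr.encode("gbk")))
```

The recoding is kept in its own function so the parser only ever sees UTF-8, and GBK is reintroduced only at the output boundary, as suggested above.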
Thanks a lot, John :) I'll try it.