On Jan 27, 9:18 pm, glacier <[EMAIL PROTECTED]> wrote:
> On Jan 24, 4:44 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> > On Wed, 23 Jan 2008 19:49:01 -0800, glacier wrote:
> > > My second question is: is there any one who has tested very long mbcs
> > > decode? I tried to decode a long(20+MB) xml yesterday, which turns out
> > > to be very strange and cause SAX fail to parse the decoded string.
> >
> > That's because SAX wants bytes, not a decoded string. Don't decode it
> > yourself.
> >
> > > However, I use another text editor to convert the file to utf-8 and
> > > SAX will parse the content successfully.
> >
> > Because now you feed SAX with bytes instead of a unicode string.
> >
> > Ciao,
> > Marc 'BlackJack' Rintsch
>
> Yepp. I feed SAX with the unicode string since SAX didn't support my
> encoding system(GBK).

Let's go back to the beginning. What is "SAX"? Show us exactly what
command or code you used.

How did you let this SAX know that the file was encoded in GBK? An
argument to SAX? An encoding declaration in the first few lines of the
file? Some other method? ... precise answer please. Or did you expect
that this SAX would guess correctly what the encoding was without
being told?

What does "didn't support my encoding system" mean? Have you actually
tried pushing raw undecoded GBK at SAX using a suitable documented
method of telling SAX that the file is in fact encoded in GBK? If so,
what was the error message that you got?

How do you know that it's GBK, anyway? Have you considered these
possible scenarios:
(1) It's GBK but you are telling SAX that it's GB2312
(2) It's GB18030 but you are telling SAX it's GBK

HTH,
John
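
P.S. To make the questions above concrete, here is a rough sketch, written for Python 3 and not tested against your file. It does two things: it checks which of GB2312, GBK and GB18030 can decode the raw bytes at all, and, if the parser does turn out to reject GBK, it hands SAX UTF-8 bytes through an InputSource whose encoding is set explicitly. The names decodable_as, parse_via_utf8 and TextCollector are made up for illustration; only the xml.sax and codec calls are standard library.

```python
import io
import xml.sax
from xml.sax.xmlreader import InputSource

# Candidate encodings, from narrowest to widest repertoire.
CANDIDATES = ("gb2312", "gbk", "gb18030")


def decodable_as(raw):
    """Return the candidate codecs that decode `raw` without error.

    Passing a strict decode is necessary, not sufficient: it only rules
    encodings out, it does not prove which one the author used.
    """
    ok = []
    for name in CANDIDATES:
        try:
            raw.decode(name)  # error handling is 'strict' by default
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces.
        self.chunks.append(content)


def parse_via_utf8(raw, source_encoding):
    """Transcode `raw` to UTF-8 and feed the resulting bytes to SAX.

    setEncoding() tells the parser what the bytes really are, so an
    encoding declaration left over inside the document is overridden.
    """
    utf8_bytes = raw.decode(source_encoding).encode("utf-8")
    source = InputSource()
    source.setByteStream(io.BytesIO(utf8_bytes))
    source.setEncoding("utf-8")

    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks)
```

Something like decodable_as(open(path, "rb").read()) would be the first thing to run, with path being wherever your file lives. If "gb2312" or "gbk" is missing from the result, scenarios (1) and (2) are worth a closer look before blaming the parser.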