Diez B. Roggisch wrote:

> I've got to deal with a pretty huge XML document, and to do so I use the
> cElementTree.iterparse functionality. Working great.
>
> Only trouble: the guys creating that chunk of XML - well, let's just say
> they are "encodingly challenged", so they don't produce utf-8, but only
> cp1252 instead, together with some weird name (Windows-1252) for that.
> That is not part of the standard codecs module. cp1252 is, of course.
>
> But that won't work for iterparse. So currently, I manually change the
> declared encoding to utf-8, and use a stream recoder.
>
> However, I was wondering if I could teach cElementTree about that encoding
> name. I tried to register cp1252 under the name Windows-1252, but had no
> luck - cET won't buy it.
>
> Any suggestions?

Both my python2.3 and python2.4 interpreters seem to know "Windows-1252":

>>> import codecs
>>> codecs.open("windows.xml", encoding="windows-1252")
<open file 'windows.xml', mode 'rb' at 0x403737e0>

Maybe the problem lies in the Python installation rather than in
cElementTree? Just guessing, though.

Peter
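
For what it's worth, here is a minimal sketch of both points, written for a current Python 3 with the standard xml.etree.ElementTree rather than the 2.3/2.4 interpreters and cElementTree discussed above. It first asks the codecs module which codec the name "windows-1252" resolves to, then mimics the workaround from the quoted message: the declaration is assumed to have been changed to utf-8 by hand, and a stream recoder converts the cp1252 bytes on the fly. The sample document and the variable names are made up for illustration.

```python
import codecs
import io
import xml.etree.ElementTree as ET

# Ask the codecs module which codec the name "windows-1252" resolves to.
alias_name = codecs.lookup("windows-1252").name

# Hypothetical stand-in for the real file: the declaration already says
# utf-8 (changed by hand), but the bytes are still encoded as cp1252.
SAMPLE = ('<?xml version="1.0" encoding="utf-8"?>'
          '<root><item>caf\u00e9</item></root>')
raw = io.BytesIO(SAMPLE.encode("cp1252"))

# Stream recoder: bytes read from `raw` are decoded as cp1252 and handed
# to the caller re-encoded as utf-8, so the whole file is never in memory.
recoded = codecs.EncodedFile(raw, data_encoding="utf-8",
                             file_encoding="cp1252")

# iterparse only needs an object with a read() method.
texts = [elem.text
         for event, elem in ET.iterparse(recoded)
         if elem.tag == "item"]

print(alias_name)
print(texts)
```

If the lookup raises LookupError on a given installation, that would point at the Python installation rather than at the parser, which is the guess above.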