On Saturday 22 August 2009 08:13:33 joy99 wrote:
> On Aug 22, 10:53 am, Stefan Behnel <stefan...@behnel.de> wrote:
> > Rami Chowdhury wrote:
> > >> I am using primarily UTF-8 based strings, like Hindi or Bengali. Can I
> > >> use Python to help me in this regard?
> > >
> > > I can say from experience that Python on Windows (at least, Python 2.5
> > > on 32-bit Vista) works perfectly well with UTF-8 files containing
> > > Bangla. I have had trouble with working with the data in IDLE, however,
> > > which seems to prefer ASCII by default.
> >
> > Defaults almost never work for encodings. You have to be explicit: add an
> > encoding declaration to the top of your source file if you use encoded
> > literal strings in your code; use the codecs module with a suitable
> > encoding to read encoded text files, and use an XML parser when reading
> > XML.
> >
> > Stefan
>
> Dear Group,
> Thanx for your reply. Python works perfectly for Hindi and Bangla with
> Win XP. I never had a trouble.
> Best Regards,
> Subhabrata.
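To make Stefan's advice about being explicit concrete, here is a minimal sketch. The helper names (write_text, read_text), the sample string, and the temporary file are only my assumptions for illustration; the pieces taken from his advice are the encoding declaration on the first line and codecs.open with an explicit encoding. On Python 3 the built-in open() accepts an encoding argument as well.

```python
# -*- coding: utf-8 -*-
# The declaration above matters once the source contains non-ASCII literals
# such as u"বাংলা"; escapes are used below so the sketch stays ASCII-safe.
import codecs
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Write unicode text to path, naming the encoding explicitly."""
    with codecs.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def read_text(path, encoding="utf-8"):
    """Read path and decode it with the given encoding (no defaults)."""
    with codecs.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample data: the word "Bangla" in Bengali script.
SAMPLE = u"\u09ac\u09be\u0982\u09b2\u09be"

# Round-trip through a temporary file (the file name is arbitrary).
demo_path = os.path.join(tempfile.mkdtemp(), "bangla.txt")
write_text(demo_path, SAMPLE)
assert read_text(demo_path) == SAMPLE
```

The point of the round trip is that nothing depends on the platform or IDLE default: the same encoding is named on both the write and the read side.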
You might also want to have a look at lxml (http://codespeak.net/lxml/). It can do much more than the xml package in the default distribution, offers the ElementTree API as well, and is backed by the kickass, fast libxml2 library. It will allow you to use XSLT stylesheets, for instance.

Regardless of whether you use lxml or not, have a look at etree.iterparse; it is invaluable when processing huge XML documents.

Cheers,
Emm
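P.S. A minimal sketch of the iterparse idea, assuming a made-up document with <book> and <title> elements (those names, the helper iter_titles, and the sample data are hypothetical). It uses the standard library's xml.etree.ElementTree; lxml.etree provides an iterparse with a compatible basic interface.

```python
import io
import xml.etree.ElementTree as etree


def iter_titles(source):
    """Yield the <title> text of each <book> without building the whole tree.

    source may be a filename or a binary file-like object.
    """
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield elem.findtext("title")
            # Release the text and children of the element just handled;
            # the emptied element itself stays attached to its parent.
            elem.clear()


# Hypothetical sample data standing in for a huge file on disk.
SAMPLE_XML = (
    b"<library>"
    b"<book><title>A</title></book>"
    b"<book><title>B</title></book>"
    b"</library>"
)

for title in iter_titles(io.BytesIO(SAMPLE_XML)):
    print(title)
```

Because each <book> is handled on its "end" event and then cleared, memory use stays roughly proportional to one record rather than to the whole document.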