[Wrapped to meet RFC1855 Netiquette Guidelines]

On 2010-12-20, spaceman-spiff <ashish.mak...@gmail.com> wrote:
> This is a rather long post, but i wanted to include all the details &
> everything i have tried so far myself, so please bear with me & read
> the entire boringly long post.
>
> I am trying to parse a ginormous ( ~ 1gb) xml file.
[SNIP]
> 4. I then investigated some streaming libraries, but am confused - there
> is SAX[http://en.wikipedia.org/wiki/Simple_API_for_XML] , the iterparse
> interface[http://effbot.org/zone/element-iterparse.htm]

I have made extensive use of SAX and it will certainly work for
low-memory parsing of XML. I have never used "iterparse", so I cannot
make an informed comparison between them.

> Which one is the best for my situation ?

Your post was long, but it failed to tell us the most important piece
of information: what does your data look like, and what are you trying
to do with it?

SAX is a low-level API that provides a callback interface, allowing you
to process the various elements as they are encountered. You can
therefore do anything you want with the information as you encounter
it, including outputting and discarding small chunks as you process
them; ignoring most of it and saving only what you want to in-memory
data structures; or saving all of it to a random-access database or
on-disk data structure that you can load and process as required.

What you need to do will depend on what you are actually trying to
accomplish. Without knowing that, I can only affirm that SAX will work
for your needs; I cannot tell you how you should be using it.
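
To make the callback idea concrete, here is a minimal sketch of the
"ignore most of it and save only what you want" approach, using the
standard library's xml.sax module. Since the real data layout is
unknown, the element names (library, book, title, author) and the names
TitleCollector and collect_titles are made up purely for illustration.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Keep only the text of <title> elements; everything else is dropped.

    The element name "title" is a hypothetical example, not taken from
    the original poster's data.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means we are not inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces,
        # so accumulate the pieces instead of assuming a single call.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(source):
    """Parse a filename or binary stream and return the collected titles."""
    handler = TitleCollector()
    xml.sax.parse(source, handler)
    return handler.titles


# Tiny in-memory sample standing in for the real (much larger) file.
SAMPLE = (
    b"<library>"
    b"<book><title>Dune</title><author>Herbert</author></book>"
    b"<book><title>Emma</title></book>"
    b"</library>"
)

titles = collect_titles(io.BytesIO(SAMPLE))
print(titles)
```

For a large file, a filename can be passed to collect_titles() in place
of the BytesIO object. The parser reads the input incrementally, so the
only memory that grows is whatever the handler itself chooses to keep
(here, the list of titles).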