spaceman-spiff, 20.12.2010 21:29:
I am sorry i left out what exactly i am trying to do.

0. Goal :I am looking for a specific element..there are several 10s/100s 
occurrences of that element in the 1gb xml file.
The contents of the xml, is just a dump of config parameters from a packet 
switch( although imho, the contents of the xml dont matter)

I need to detect them & then for each 1, i need to copy all the content b/w the 
element's start & end tags & create a smaller xml file.

Then cElementTree's iterparse() is your friend. It allows you to basically iterate over the XML tags while it's building an in-memory tree from them. That way, you can either remove subtrees from the tree if you don't need them (to save memory) or otherwise handle them in any way you like, such as serialising them into a new file (and then deleting them).
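A rough sketch of that approach could look like the following. It uses xml.etree.ElementTree, which is the name under which the cElementTree implementation lives in current Python versions. The function name extract_elements, the output file name pattern and the tag name "port" in the usage note are made up for illustration, so adapt them to your data.

```python
import xml.etree.ElementTree as ET  # "import cElementTree as ET" on old Pythons


def extract_elements(source, tag, out_pattern="part_{:04d}.xml"):
    """Write each outermost <tag> subtree in source to its own small file.

    source is a file name or a binary file object. Returns the list of
    file names that were written. (Hypothetical helper, not a library API.)
    """
    written = []
    depth = 0  # > 0 while we are inside an element we want to keep
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == tag:
                depth += 1
            continue

        # From here on we are handling an "end" event.
        if elem.tag == tag:
            depth -= 1
            if depth == 0:
                # Text that follows the closing tag belongs to the parent
                # document, not to the extracted subtree.
                elem.tail = None
                name = out_pattern.format(len(written))
                ET.ElementTree(elem).write(
                    name, encoding="utf-8", xml_declaration=True
                )
                written.append(name)

        if depth == 0:
            # Not inside a wanted subtree (or just finished writing one):
            # drop children, text and attributes to keep memory use low.
            elem.clear()
    return written
```

Calling something like extract_elements("dump.xml", "port") would then produce part_0000.xml, part_0001.xml and so on. Note that the cleared elements stay in their parent as empty shells, which is normally negligible for a few hundred matches.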

Also note that the iterparse implementation in lxml.etree allows you to specify a tag name to restrict the iterator to these tags. That's usually a lot faster, but it also means that you need to take more care to clean up the parts of the tree that the iterator stepped over. Depending on your requirements and the amount of manual code optimisation that you want to invest, either cElementTree or lxml.etree may perform better for you.

It seems that you already found the article by Liza Daly about high performance XML processing with Python. Give it another read; it has a couple of good hints and examples that will help you here.



