Re: Trying to parse a HUGE(1gb) xml file

Stefan Behnel Tue, 21 Dec 2010 00:37:25 -0800

spaceman-spiff, 20.12.2010 21:29:

I am sorry I left out what exactly I am trying to do.


0. Goal: I am looking for a specific element. There are several tens or
hundreds of occurrences of that element in the 1 GB XML file.
The contents of the XML are just a dump of config parameters from a packet
switch (although IMHO the contents of the XML don't matter).

I need to detect them and then, for each one, copy all the content between the
element's start and end tags and create a smaller XML file.

Then cElementTree's iterparse() is your friend. It allows you to basically iterate over the XML tags while it's building an in-memory tree from them. That way, you can either remove subtrees from the tree if you don't need them (to save memory) or otherwise handle them in any way you like, such as serialising them into a new file (and then deleting them).
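A minimal sketch of that approach, under some assumptions: it uses xml.etree.ElementTree rather than cElementTree (the separate cElementTree module was removed in Python 3.9, and the standard module uses the C accelerator when available), and the function name extract_elements and the tag name "port" are made up for illustration. It serialises each matching subtree and clears everything else as soon as it has been parsed:

```python
import io
import xml.etree.ElementTree as ET


def extract_elements(source, tag):
    """Yield every <tag> subtree as serialised bytes, clearing the rest."""
    depth = 0  # number of currently open <tag> elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == tag:
                depth += 1
            continue
        # From here on this is an "end" event: elem is completely parsed.
        if elem.tag == tag:
            depth -= 1
            if depth == 0:
                elem.tail = None  # text after the end tag belongs to the parent
                yield ET.tostring(elem)
        if depth == 0:
            # Outside any match: drop children, text and attributes.
            # The emptied element itself stays attached to its parent.
            elem.clear()


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real 1 GB file.
    sample = b"<config><misc>x</misc><port id='1'><speed>10</speed></port></config>"
    for chunk in extract_elements(io.BytesIO(sample), "port"):
        print(chunk.decode("ascii"))
```

Each yielded chunk can be written to its own output file. Passing a file name instead of the BytesIO object works the same way, since iterparse() accepts either.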

Also note that the iterparse implementation in lxml.etree allows you to specify a tag name to restrict the iterator to these tags. That's usually a lot faster, but it also means that you need to take more care to clean up the parts of the tree that the iterator stepped over. Depending on your requirements and the amount of manual code optimisation that you want to invest, either cElementTree or lxml.etree may perform better for you.

It seems that you already found the article by Liza Daly about high performance XML processing with Python. Give it another read; it has a couple of good hints and examples that will help you here.


Stefan

--
http://mail.python.org/mailman/listinfo/python-list
