[EMAIL PROTECTED] wrote:
> I have an XML file which contains entries of the form:
>
> <idlist>
>   <myID>1</myID>
>   <myID>2</myID>
>   ....
>   <myID>10000</myID>
> </idlist>
>
> Currently, I have written a SAX based handler that will read in all the
> <myID></myID> entries and return a list of the contents of these
> entries. However this is not scalable and for my purposes it would be
> better if I could iterate over the list of <myID> nodes. Some thing
> like:
>
> for myid in getMyIDList(document):
>     print myid

You can try lxml 1.1.

http://cheeseshop.python.org/pypi/lxml/1.1alpha

Some documentation is here:

http://codespeak.net/svn/lxml/trunk/doc/api.txt

I haven't tested it, but you should be able to do this:

    from lxml.etree import iterparse

    last = None
    for event, myid in iterparse(document_url, tag="myID"):
        print myid.text
        if last is not None:
            last.getparent().remove(last)
        last = myid

Internally, iterparse builds up a tree, so the last three lines are there to
remove the myid elements from the tree that were already handled. This saves
a lot of memory for large documents.

Stefan
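
P.S. For anyone who wants the generator form asked for in the question, here is a minimal sketch of the same idea. Note the assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml, so there is no tag= keyword and no getparent(); the tag is filtered by hand and handled elements are dropped by clearing the root. The name getMyIDList comes from the question, and the in-memory sample document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def getMyIDList(source, tag="myID"):
    """Yield the text of each <tag> element in source, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a reference
    # to it so that handled children can be dropped from the tree.
    event, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            # Once the caller is done with this entry, discard what has
            # been parsed so far so the tree does not keep growing.
            root.clear()


# Hypothetical sample input standing in for the real document.
sample = io.BytesIO(
    b"<idlist><myID>1</myID><myID>2</myID><myID>3</myID></idlist>"
)

ids = list(getMyIDList(sample))
for myid in ids:
    print(myid)
```

Clearing the root after each entry plays the same role as the getparent().remove() lines above: only the element currently being handled stays in memory. With a real file you would pass a file name or an open file object instead of the BytesIO sample.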