Alan Meyer, 27.12.2010 21:40:
> On 12/21/2010 3:16 AM, Stefan Behnel wrote:
>> Adam Tauno Williams, 20.12.2010 20:49:
>>> ...
>>> You need to process the document as a stream of elements; aka SAX.
>> IMHO, this is the worst advice you can give.
> Why do you say that? I would have thought that using SAX in this
> application is an excellent idea.

From my experience, SAX is only practical for very simple cases where
little state is involved when extracting information from the parse events.
A typical example is gathering statistics based on single tags - not a very
common use case. Anything that involves knowing where in the XML tree you
are to figure out what to do with the event is already too complicated. The
main drawback of SAX is that the callbacks arrive as separate method calls,
so you have to keep all state manually in fields of the SAX handler
instance.
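
To illustrate the manual state keeping, here is a minimal sketch. It assumes a hypothetical document in which <book> elements carry a <title> child; the element names, the TitleHandler class and the sax_titles() function are made up for this example and do not come from the original question. Even this tiny task needs a hand-maintained path stack and a text buffer shared between three callbacks.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of <title> elements that sit directly inside <book>.

    The element names are hypothetical; they only serve the illustration.
    """

    def __init__(self):
        super().__init__()
        self.path = []      # manually tracked stack of open element names
        self.buffer = None  # text chunks of the <title> being read, if any
        self.titles = []

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path[-2:] == ["book", "title"]:
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self.path[-2:] == ["book", "title"]:
            self.titles.append("".join(self.buffer))
            self.buffer = None
        self.path.pop()


def sax_titles(xml_bytes):
    """Parse a bytes document and return the collected book titles."""
    handler = TitleHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

Every additional condition on the position in the tree tends to add another field and another branch spread over these callbacks.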
My serious advice is: don't waste your time learning SAX. It's simply too
frustrating to debug SAX extraction code into existence. Given how simple
and fast it is to extract data in a memory-efficient way with ElementTree's
iterparse(), there is really no reason to write complicated SAX code instead.
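
For comparison, a rough sketch of the iterparse() approach. It assumes a hypothetical input where each <book> element has a <title> child; these element names and the iter_titles() function are invented for the example. The position in the tree is available from the element itself, so no handler state is needed.

```python
import io
import xml.etree.ElementTree as ET


def iter_titles(source):
    """Yield the <title> text of each <book> in a file or file-like object.

    <book> and <title> are hypothetical element names for this sketch.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            # The complete <book> subtree is available at its "end" event.
            yield elem.findtext("title")
            # Drop the children and text of the finished element so memory
            # does not grow with the document; the emptied <book> element
            # itself stays attached to its parent.
            elem.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<library><book><title>First</title></book>"
        b"<book><title>Second</title></book></library>"
    )
    for title in iter_titles(sample):
        print(title)
```

For very large files the emptied elements left under the root can still add up, so clearing or pruning the parent as well may be worth considering.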
Stefan