Alan Meyer, 27.12.2010 21:40:
> On 12/21/2010 3:16 AM, Stefan Behnel wrote:
>> Adam Tauno Williams, 20.12.2010 20:49:
>>> ...
>>> You need to process the document as a stream of elements; aka SAX.
>>
>> IMHO, this is the worst advice you can give.
>
> Why do you say that? I would have thought that using SAX in this
> application is an excellent idea.

In my experience, SAX is only practical for very simple cases where little state is involved in extracting information from the parse events. A typical example is gathering statistics based on single tags, which is not a very common use case. Anything that requires knowing where you are in the XML tree to decide what to do with an event is already too complicated. The main drawback of SAX is that the callbacks arrive as separate method calls, so you have to do all the state keeping manually through fields of the SAX handler instance.
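
To illustrate the manual state keeping, here is a minimal sketch using the standard library's xml.sax. The sample document, the handler class BookTitleHandler and the tag names are made up for this example; they are assumptions, not something from the original question. The task is to collect the <title> of each <book> while ignoring a <title> that appears elsewhere, which already forces the handler to track its position in the tree.

```python
import xml.sax

# Hypothetical sample document, invented for this illustration.
SAMPLE = b"""<library>
  <title>City Library</title>
  <book><title>Dune</title></book>
  <book><title>Emma</title></book>
</library>"""


class BookTitleHandler(xml.sax.ContentHandler):
    """Collects the text of <title> elements that sit directly inside <book>."""

    def __init__(self):
        super().__init__()
        self.path = []     # stack of currently open tags, maintained by hand
        self.buffer = []   # text chunks of the title being read
        self.titles = []   # extracted results

    def _in_book_title(self):
        return self.path[-2:] == ["book", "title"]

    def startElement(self, name, attrs):
        self.path.append(name)
        if self._in_book_title():
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it has to be accumulated.
        if self._in_book_title():
            self.buffer.append(content)

    def endElement(self, name):
        if self._in_book_title():
            self.titles.append("".join(self.buffer))
        self.path.pop()


handler = BookTitleHandler()
xml.sax.parseString(SAMPLE, handler)
sax_titles = handler.titles
```

Three instance fields and three cooperating callbacks are needed for what is conceptually a single lookup, and every additional rule adds more of the same.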

My serious advice is: don't waste your time learning SAX. It's simply too frustrating to debug SAX extraction code into existence. Given how simple and fast it is to extract data with ElementTree's iterparse() in a memory-efficient way, there is really no reason to write complicated SAX code instead.
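
For comparison, a minimal iterparse() sketch with the standard library's xml.etree.ElementTree. The sample document and the helper name iter_book_titles are assumptions made up for this illustration. Each <book> is handled as a complete subtree once its end tag has been parsed, then cleared so its children do not pile up in memory.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = b"""<library>
  <title>City Library</title>
  <book><title>Dune</title></book>
  <book><title>Emma</title></book>
</library>"""


def iter_book_titles(source):
    """Yield the title of each <book>, one at a time, while parsing."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            # The whole <book> subtree is available here, so the usual
            # tree API works without any hand-written state.
            yield elem.findtext("title")
            # Drop the children and text of the processed element.
            elem.clear()


book_titles = list(iter_book_titles(io.BytesIO(SAMPLE)))
```

Note that the emptied <book> elements themselves remain attached to the root in this sketch; for very large inputs they may need to be removed from their parent as well.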

Stefan

--
http://mail.python.org/mailman/listinfo/python-list