Re: Extracting data from an XML file

Ed Summers Mon, 05 Jan 2004 14:12:14 -0800

On Mon, Jan 05, 2004 at 03:54:09PM -0500, Eric Lease Morgan wrote:
> The code works, but is really slow. Can you suggest a way to improve my code
> or use some other technique for extracting things like author, title, and id
> from my XML?


It's slow because you're building a DOM for the entire document and only
using a piece of it. If you use a stream-based parser like XML::SAX [1] you
should see a good speed improvement, and it won't use so much memory :)

XML::SAX can use XML::LibXML as its underlying parser, but as a stream rather
than a tree. Kip Hampton has a good article, "High Performance XML Parsing
with SAX" [2], which should provide some guidance in getting started with
XML::SAX.
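
To make the streaming idea concrete, here is a minimal sketch of the handler
pattern. It is written against Python's standard xml.sax module rather than
Perl's XML::SAX, purely because it runs with no extra modules installed; the
callbacks (start element, characters, end element) correspond one-to-one.
The element names author, title, and id come from your question, but the
record layout and the names RecordHandler and extract are assumptions made
for illustration.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect the text of a few wanted elements and ignore everything else."""

    # Assumed element names; adjust to match the real document.
    WANTED = {"author", "title", "id"}

    def __init__(self):
        super().__init__()
        self.found = {}        # element name -> list of text values
        self._current = None   # name of the wanted element we are inside
        self._buffer = []      # text chunks seen inside that element

    def startElement(self, name, attrs):
        if name in self.WANTED:
            self._current = name
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            text = "".join(self._buffer).strip()
            self.found.setdefault(name, []).append(text)
            self._current = None


def extract(xml_text):
    """Stream through xml_text once and return the wanted fields."""
    handler = RecordHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.found
```

Nothing is kept in memory except the handful of strings you asked for, which
is where the speed and memory savings over a DOM come from.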

SAX is a generally useful technique (in Java land too), and SAX filters are
really neat tools to have in your toolbox. I used them heavily as part of
Net::OAI::Harvester [3], since OAI responses can be arbitrarily large and
building a DOM for some of them could eat far more memory than you'd want.
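
A SAX filter sits between the parser and the final handler and decides which
events to pass along. The sketch below shows the general shape using
Python's standard xml.sax.saxutils.XMLFilterBase (again as a stand-in for
the Perl API, and not the code from Net::OAI::Harvester). The names
DropElementFilter and strip_element are hypothetical; the filter drops one
named element and everything nested inside it, and forwards the rest to a
serializer.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElementFilter(XMLFilterBase):
    """Forward every SAX event except those inside one named element."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self._name = name
        self._depth = 0  # > 0 while we are inside the dropped element

    def startElement(self, name, attrs):
        if self._depth or name == self._name:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_element(xml_text, name):
    """Re-serialize xml_text with every <name> subtree removed."""
    out = io.StringIO()
    filt = DropElementFilter(xml.sax.make_parser(), name)
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Because each filter only sees one event at a time, several can be chained
without the document ever being held in memory as a whole.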

//Ed 

[1] http://search.cpan.org/perldoc?XML::SAX
[2] http://xml.com/pub/a/2001/02/14/perlsax.html
[3] http://search.cpan.org/perldoc?Net::OAI::Harvester
