On Mon, Jan 05, 2004 at 03:54:09PM -0500, Eric Lease Morgan wrote:
> The code works, but is really slow. Can you suggest a way to improve my code
> or use some other technique for extracting things like author, title, and id
> from my XML?

It's slow because you're building a DOM for the entire document and only using a piece of it. If you use a stream-based parser like XML::SAX [1] you should see a good speed improvement, and it won't use as much memory :) XML::SAX can use XML::LibXML as its underlying parser, but as a stream.

Kip Hampton has a good article, "High Performance XML Parsing with SAX" [2], which should provide some guidance on getting started with XML::SAX.

SAX is a generally useful technique (in Java land too), and SAX filters are really neat tools to have in your toolbox. I used them heavily as part of Net::OAI::Harvester [3], since OAI responses can be arbitrarily large and building a DOM for some of them could be costly in memory.

//Ed

[1] http://search.cpan.org/perldoc?XML::SAX
[2] http://xml.com/pub/a/2001/02/14/perlsax.html
[3] http://search.cpan.org/perldoc?Net::OAI::Harvester
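
A minimal sketch of the stream-based approach described above, in case it helps to see the shape of a SAX handler. It assumes the fields live in elements literally named author, title, and id, which is a guess, since the original XML was not shown. FieldHandler and extract_fields are hypothetical names made up for this illustration; XML::SAX::Base and XML::SAX::ParserFactory are part of the XML::SAX distribution.

```perl
use strict;
use warnings;

# Hypothetical handler: collects the text of <author>, <title>, and <id>.
# The element names are assumptions; adjust them to match your XML.
package FieldHandler;
use base qw(XML::SAX::Base);

my %WANTED = map { $_ => 1 } qw(author title id);

sub start_element {
    my ($self, $el) = @_;
    # Start buffering text when we enter an element we care about.
    if ($WANTED{ $el->{LocalName} }) {
        $self->{current} = $el->{LocalName};
        $self->{buffer}  = '';
    }
}

sub characters {
    my ($self, $chars) = @_;
    # A single text node may arrive in several chunks, so append.
    $self->{buffer} .= $chars->{Data} if defined $self->{current};
}

sub end_element {
    my ($self, $el) = @_;
    return unless defined $self->{current}
        && $el->{LocalName} eq $self->{current};
    # Store the finished value and stop buffering.
    push @{ $self->{fields}{ $self->{current} } }, $self->{buffer};
    delete $self->{current};
}

package main;
use XML::SAX::ParserFactory;

# Parse an XML string and return a hashref of field name => [values].
sub extract_fields {
    my ($xml) = @_;
    my $handler = FieldHandler->new;
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_string($xml);
    return $handler->{fields} || {};
}

my $fields = extract_fields(
    '<doc><id>42</id><title>Walden</title><author>Thoreau</author></doc>'
);
print "$_: @{ $fields->{$_} }\n" for sort keys %$fields;
```

For a file on disk, $parser->parse_uri($filename) can be used in place of parse_string. Only the text of the wanted elements is held in memory at any point, rather than a tree for the whole document.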