Quoting David Kahn <[email protected]>:

> Got a question hopefully someone can answer -
>
> I am working on functionality to match on certain nodes of a largish (65mb)
> xml file. I implemented this with REXML and was 2 minutes and counting
> before I killed the process. After this, I just opened the console and
> loaded the file into a string and did a regex search for my data -- the
> result was almost instantaneous.
>
> The question is, if I can get away with it, am I better off just going the
> regex route, or is it really worth my while to investigate a faster XML
> parser (I know REXML is notorious for being slow, but given how fast it was
> to call a regex on the file, I am thinking that this will still be faster
> than all parsers).
Look at using LibXML::XML::Reader:

http://libxml.rubyforge.org/rdoc/index.html

What most XML parsing libraries do is read the entire XML file into memory,
probably storing the raw text, parse it, and build an even bigger data
structure for the whole document, then search over that. Nokogiri at least
does some of the searching in C instead of Ruby (it uses libxml2 underneath).

With LibXML::XML::Reader it is possible (with some not-very-pretty code) to
make one pass through the XML file, parsing as you go, and create data
structures for just the information of interest. Enormously faster.

HTH,
Jeffrey
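P.S. Here is a rough, untested sketch of the kind of one-pass Reader loop I
mean, assuming a libxml-ruby version where Reader#read returns true/false.
The file name and the "record" element name are just placeholders for
whatever nodes you actually need to match:

  require 'libxml'
  include LibXML

  # Stream the document one node at a time instead of building a full DOM.
  reader = XML::Reader.file('big.xml')

  matches = []
  while reader.read
    # Only look at start-of-element nodes named "record" (placeholder name).
    next unless reader.node_type == XML::Reader::TYPE_ELEMENT
    next unless reader.name == 'record'

    # read_string returns the text content of the current element.
    matches << reader.read_string
  end
  reader.close

  puts "found #{matches.size} matching records"

Only the matches themselves stay in memory, so memory use stays roughly
constant no matter how big the file is.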

