Please quote when replying.  It is very hard to follow the discussion if 
you don't.

David Kahn wrote:
> Actually I and my client care how fast, even if it means more work and
> tests to hedge accuracy.

And by the time you do that extra work for correctness, you will have 
developed a system equivalent to REXML or Nokogiri, and likely with 
similar or worse performance.  You're fighting a losing battle here.

> I did try Nokogiri - which I liked getting to know, but it also plods in
> at ~ 150 seconds which is just unacceptable for someone waiting at a
> browser.

Waiting at a browser?  Let me get this straight -- your app is trying to 
process a 65MB file in real time?  That's insane.  Do some of the 
processing in advance, or tell the user that he can expect a 2-minute 
wait (which is absolutely reasonable for that much data).
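For example, the advance processing could be a one-time job that streams through the file and builds a lookup table keyed by name, so each request only does a hash lookup instead of an XPath scan over 65MB. This is a minimal sketch, not tested against the real file: it assumes each <record> has flat <last> and <first> children (as the XPath query in your message suggests) and uses REXML's stream parser, which ships with Ruby, so the whole document is never built as a DOM. RecordIndexer, build_index, save_index, load_index and find_matches are hypothetical names.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical one-time indexer: streams through the XML (no full DOM in
# memory) and groups flat <record> elements by [last, first].
class RecordIndexer
  include REXML::StreamListener

  attr_reader :index

  def initialize
    @index = {}     # plain Hash (no default proc) so Marshal can dump it
    @record = nil   # fields of the <record> currently being read
    @field = nil    # name of the child element currently being read
  end

  def tag_start(name, _attrs)
    if name == "record"
      @record = {}
    elsif @record
      @field = name
    end
  end

  # Text inside a field is accumulated; whitespace between tags is ignored
  # because @field is nil there.
  def text(data)
    return unless @record && @field
    @record[@field] = (@record[@field] || "") + data
  end

  # Treat CDATA inside a field like ordinary text.
  def cdata(content)
    text(content)
  end

  def tag_end(name)
    if name == "record" && @record
      key = [@record["last"], @record["first"]]
      (@index[key] ||= []) << @record
      @record = nil
    end
    @field = nil
  end
end

# Run offline (cron job, rake task, ...) whenever the source file changes.
# Accepts an XML String or an IO.
def build_index(xml_source)
  indexer = RecordIndexer.new
  REXML::Document.parse_stream(xml_source, indexer)
  indexer.index
end

def save_index(index, path)
  File.binwrite(path, Marshal.dump(index))
end

# At request time: load once (e.g. at boot), then each lookup is one Hash access.
def load_index(path)
  Marshal.load(File.binread(path))
end

def find_matches(index, last_name, first_name)
  index.fetch([last_name, first_name], [])
end
```

The app would call load_index once and find_matches per request. Matching here is exact; any case or whitespace normalization would have to be applied both when building the keys and when looking them up.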

> That's what I was trying to get at with my original post and should have
> provided more data, i.e. am I wasting time with unrealistic expectations
> for any XML parser in this endeavor.
> 
> Unless anyone can point out a more efficient search (code and example
> xml below), it seems practical in absence of other ideas, to go the way
> of regex at least to triangulate the data before throwing it to an xml
> parser to get the details or put the data into a db (which I am trying
> to avoid).

Why are you trying to avoid putting the data into a DB?  Databases are 
designed for quick searches through lots of data -- in other words, 
exactly what you are doing.  XML really is not.  (You could try eXistDB, 
though.)

> 
> Below, the second line is what takes forever, understandably.
> @gsa_epls_xml_doc = Nokogiri::XML(doc_xml)
> @gsa_epls_xml_doc.xpath("//records/record[last='#{last_name}' and
> first='#{first_name}']").each do |possible_match_record| ...
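Separately from speed: interpolating last_name and first_name straight into the XPath string breaks as soon as a name contains an apostrophe (O'Brien, for instance), because XPath 1.0 string literals have no escape character. A minimal sketch of a workaround, assuming plain Ruby strings as input: pick whichever quote character the value does not contain, and fall back to concat() when it contains both. xpath_literal and record_xpath are hypothetical helper names.

```ruby
# Build an XPath 1.0 string literal for an arbitrary Ruby string.
# XPath 1.0 has no escape character inside literals, so a value containing
# both quote types has to be assembled with concat().
def xpath_literal(value)
  return "'#{value}'" unless value.include?("'")
  return "\"#{value}\"" unless value.include?('"')
  # Split on single quotes and rejoin the pieces with a double-quoted "'".
  parts = value.split("'", -1).map { |part| "'#{part}'" }
  "concat(#{parts.join(%q(, "'", ))})"
end

# Hypothetical replacement for the interpolated query string above.
def record_xpath(last_name, first_name)
  "//records/record[last=#{xpath_literal(last_name)} and " \
    "first=#{xpath_literal(first_name)}]"
end
```

For example, record_xpath("O'Brien", "Pat") yields //records/record[last="O'Brien" and first='Pat'], which the original interpolation would have turned into an invalid expression.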

I'm assuming gsa is Google Search Appliance.  Can't it do the searching 
itself and give you back only the records you need?

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
-- 
Posted via http://www.ruby-forum.com/.

-- 
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
