You may also try the api/expat addon. It is not necessarily faster, because processing boxes in J is inherently slow. You may have to amend those addons to replace boxes with something else, specific to your application.
Fri, 09 Aug 2013, Dan Farmer wrote:
> Hi again,
>
> So I am trying to parse some large (2-9 GB) XML files for an idea I
> had for using JDB. My plan was to use XSLT to flatten these things out
> (they are deeply nested structures), but I figured I'd do a quick and
> easy test to make sure I had a reasonable grip on J's facilities
> before diving in.
>
> Unfortunately, the code I came up with is so slow that I don't think
> it's even worth attempting. Can anyone provide some tips on how I can
> speed this up? I read up on the J performance monitor and clocked it;
> it said 75% of the time was spent in cdcallback (which makes me think
> there's nothing I can do short of writing something in C/C++, but
> maybe I'm wrong). Here's the code (loosely adapted from Oleg & John
> Baker's examples for sax).
>
> For the record, I created two smaller test files (500 KB and 6 MB), and
> the code below works correctly on both of them. I've also written
> Python code using lxml's element tree module, and it can process the
> 2 GB file in about 60 seconds. I let this code run for 30 minutes and
> then killed it.
>
> Any ideas?
>
> Thanks,
> Dan
>
> require 'jmf'
> require 'files dir'
> require 'xml/sax'
>
> saxclass 'xp'
>
> startDocument=: 3 : 0
>   ids=: ''
> )
>
> startElement=: 4 : 0
>   if. y -: ,'Node' do.
>     ids=: ids , < x getAttribute '_Id'
>   end.
> )
>
> endDocument=: 3 : 0
>   s: ids
> )
>
> NB. =========================================================
> cocurrent 'base'
>
> fn=: 'c:/data/test/2GBfile.xml'
>
> unmap_jmf_ 'xfile'   NB. Hokey, but for debugging
>
> JCHAR map_jmf_ 'xfile';fn
>
> process_xp_ xfile
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

--
regards,
