Okay, I did some work on this purely as a learning experience for me. I wanted to learn SAX parsing, so I tried converting the screen widgets to SAX parsing.
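For anyone who hasn't used SAX, here is a minimal sketch of the general idea: a handler turns elements straight into model objects as the parser reports them, and a stack takes care of sub-widgets. This is not the POC code, and it does not use any third-party framework. The names SaxWidgetSketch, Widget, and WidgetHandler and the sample XML are made up for illustration; only the standard JAXP/SAX classes are real.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: none of these class names come from OFBiz.
class SaxWidgetSketch {

    /** Stand-in for a model widget; it may hold sub-widgets. */
    static class Widget {
        final String type;                        // XML element name
        final String name;                        // "name" attribute, or null if absent
        final List<Widget> children = new ArrayList<>();

        Widget(String type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    /** Builds Widget objects directly from SAX events; no DOM Document is created. */
    static class WidgetHandler extends DefaultHandler {
        private final Deque<Widget> open = new ArrayDeque<>(); // widgets whose end tag is still pending
        Widget root;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            Widget widget = new Widget(qName, attrs.getValue("name"));
            if (open.isEmpty()) {
                root = widget;                    // first element seen is the root widget
            } else {
                open.peek().children.add(widget); // attach to the enclosing widget
            }
            open.push(widget);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop();                           // this widget and its sub-widgets are complete
        }
    }

    /** Parses widget XML from a string and returns the root widget. */
    static Widget parse(String xml) {
        try {
            WidgetHandler handler = new WidgetHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalArgumentException("Could not parse widget XML", e);
        }
    }

    public static void main(String[] args) {
        Widget screen = parse("<screen name=\"main\"><label name=\"title\"/>"
                + "<container name=\"body\"><label name=\"text\"/></container></screen>");
        System.out.println(screen.type + " has " + screen.children.size() + " sub-widgets");
    }
}
```

The stack is the only extra bookkeeping needed for nested widgets; a widget without sub-widgets is finished as soon as its start tag has been seen.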
I found a small public-domain framework that makes the whole process very easy. Since all of the model screen widgets subclass a single base class, I was able to hook them into the parsing framework by just having the base class subclass one of the framework classes. Model widgets that don't have sub-widgets just needed a new constructor. Model widgets that have sub-widgets needed a little extra code to handle the sub-widgets, but it was no more code than what already exists to handle the DOM version of the sub-widgets.

Overall, it was pretty easy and I was surprised when it worked the very first time I tried it. If anyone is interested, I would be happy to post the POC code in Jira. Just let me know.

-Adrian

--- On Sat, 4/25/09, Adrian Crum <adrian.c...@yahoo.com> wrote:

> From: Adrian Crum <adrian.c...@yahoo.com>
> Subject: Re: Discussion: XML file parsing improvement
> To: dev@ofbiz.apache.org
> Date: Saturday, April 25, 2009, 10:28 PM
>
> Adam and David,
>
> Thank you for your comments! I'll look into the entity import code some more.
>
> Personally, I don't have an issue with importing large XML files. I see it come up from time to time on the mailing lists. I remember BJ Freeman had to write his own import code because of some OFBiz limitation.
>
> I'll accept the widget files, scripts, and config files are too small to optimize. Having event-driven parsing for those might be an interesting experiment though.
>
> -Adrian
>
> --- On Sat, 4/25/09, Adam Heath <doo...@brainfood.com> wrote:
>
> > From: Adam Heath <doo...@brainfood.com>
> > Subject: Re: Discussion: XML file parsing improvement
> > To: dev@ofbiz.apache.org
> > Date: Saturday, April 25, 2009, 9:52 PM
> >
> > Adrian Crum wrote:
> > > OFBiz uses a lot of XML files. When each XML file is read, it is first parsed into a DOM Document, then the DOM Document is parsed into OFBiz Java objects. This two-step process consumes a lot of memory, and it takes more time than it should.
> > >
> > > There is an alternative - what is called event-driven parsing. The XML parser can be set up to convert XML elements directly to the OFBiz Java objects - bypassing the DOM Document build and parse steps. Theoretically, this could provide a huge performance boost, and it would use less memory. In addition, it would solve the problem of huge XML files maxing out server memory during the parse process - like with entity XML import/export.
> > >
> > > Has anyone else considered this? Do you think it is worth pursuing?
> >
> > What files are you talking about, that are so huge, they can't be parsed with the simpler DOM model?
> >
> > entity data files are sax based already.
> >
> > widget files, scripts, config files are small, so it's better to keep the simpler algo, as David suggested.
> >
> > additionally, I already did some memory profiling a while back, and interned the long-lived strings from parsed xml. This actually reduced memory usage.
> >
> > Another thing, the widgets, scripts, config files are read very infrequently, then cached. The time it takes to parse them is not really a performance consideration.
> >
> > As an aside, how much swap do you have on your server? Any? Is it being used? Then you don't have enough ram. If your work-load is causing swap to be used, then you haven't correctly identified your work load usage requirements.
> >
> > The same can be said for java maximum memory allocation.