Adam and David,

Thank you for your comments! I'll look into the entity import code some more.

Personally, I don't have an issue with importing large XML files, but I see it
come up from time to time on the mailing lists. I remember BJ Freeman had to
write his own import code because of some OFBiz limitation.

I'll accept that the widget files, scripts, and config files are too small to
optimize. Having event-driven parsing for those might be an interesting
experiment, though.
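
For reference, here is a minimal sketch of what I mean by event-driven parsing, using the standard JDK SAX API. It is not OFBiz code: the class EventDrivenSketch, the Record type, and the rule that each direct child of the root element becomes one object are all assumptions made up for illustration. The point is only that each element is handed off as an object as soon as the parser reports it, with no DOM Document in between.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventDrivenSketch {

    /** Hypothetical stand-in for whatever Java object an XML element maps to. */
    public static final class Record {
        public final String name;
        public final Map<String, String> fields;

        Record(String name, Map<String, String> fields) {
            this.name = name;
            this.fields = fields;
        }
    }

    /**
     * Parses the XML with SAX and passes one Record per direct child of the
     * root element to the sink. Nothing is kept after the hand-off, so memory
     * use does not grow with the size of the file.
     */
    public static void parse(String xml, final Consumer<Record> sink) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                depth++;
                if (depth == 2) { // assumption: depth 1 is the root, depth 2 is one record
                    Map<String, String> fields = new LinkedHashMap<>();
                    for (int i = 0; i < attrs.getLength(); i++) {
                        fields.put(attrs.getQName(i), attrs.getValue(i));
                    }
                    sink.accept(new Record(qName, fields));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data; element and attribute names are illustrative only.
        String xml = "<root><Party partyId=\"P1\"/><Party partyId=\"P2\"/></root>";
        parse(xml, r -> System.out.println(r.name + " " + r.fields));
    }
}
```

The sink callback is a design choice for the sketch: collecting every Record into a list would bring back the memory growth that event-driven parsing is supposed to avoid.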

-Adrian


--- On Sat, 4/25/09, Adam Heath <doo...@brainfood.com> wrote:

> From: Adam Heath <doo...@brainfood.com>
> Subject: Re: Discussion: XML file parsing improvement
> To: dev@ofbiz.apache.org
> Date: Saturday, April 25, 2009, 9:52 PM
> Adrian Crum wrote:
> > OFBiz uses a lot of XML files. When each XML file is
> > read, it is first parsed into a DOM Document, then the
> > DOM Document is parsed into OFBiz Java objects. This
> > two-step process consumes a lot of memory, and it
> > takes more time than it should.
> > 
> > There is an alternative - what is called event-driven
> > parsing. The XML parser can be set up to convert XML
> > elements directly to the OFBiz Java objects -
> > bypassing the DOM Document build and parse steps.
> > Theoretically, this could provide a huge performance
> > boost, and it would use less memory. In addition, it
> > would solve the problem of huge XML files maxing out
> > server memory during the parse process - like with
> > entity XML import/export.
> > 
> > Has anyone else considered this? Do you think it is
> > worth pursuing?
> 
> What files are you talking about, that are so huge, they can't be
> parsed with the simpler DOM model?
> 
> Entity data files are SAX-based already.
> 
> widget files, scripts, config files are small, so it's better to keep
> the simpler algo, as David suggested.
> 
> additionally, I already did some memory profiling a while back, and
> interned the long-lived strings from parsed xml. This actually
> reduced memory usage.
> 
> Another thing, the widgets, scripts, config files are read very
> infrequently, then cached. The time it takes to parse them is not
> really a performance consideration.
> 
> As an aside, how much swap do you have on your server? Any? Is it
> being used? Then you don't have enough ram. If your work-load is
> causing swap to be used, then you haven't correctly identified your
> work load usage requirements.
> 
> The same can be said for java maximum memory allocation.


      
