Re: DataImportHandler Streaming XML Parse

2011-11-21 Thread Chris Hostetter

:  We're using DIH to import flat xml files. We're getting Heap memory
: exceptions due to the file size. Is there any way to force DIH to do a
: streaming parse rather than a DOM parse? I really don't want to chunk my
: files up or increase the heap size.

The XPathEntityProcessor is using a streaming parser -- it doesn't read in 
the entire DOM for each file (that's the main reason why it doesn't 
support full XPath expressions, just a subset)
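
To make the streaming-vs-DOM distinction concrete, here is a minimal sketch in Python. It is not DIH's code and does not show how XPathEntityProcessor is implemented; it only illustrates the general technique of handling one record at a time and discarding it, rather than holding the whole tree in memory. The function name stream_records and the docs/doc/id/title element names are hypothetical, chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, record_tag):
    """Yield one dict per <record_tag> element without keeping the whole tree.

    Assumes flat records: each record's children are simple text elements.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is always the start of the root element.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # The record is complete here, so its children are fully parsed.
            yield {child.tag: child.text for child in elem}
            # Drop records already handled so memory use stays roughly flat.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<docs>"
        b"<doc><id>1</id><title>a</title></doc>"
        b"<doc><id>2</id><title>b</title></doc>"
        b"</docs>"
    )
    for record in stream_records(sample, "doc"):
        print(record)
```

With this approach, peak memory depends on the size of the largest single record rather than on the size of the file, which is why the size of each individual entity matters more than the total file size.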

If you are getting OOM errors, it's possible the source of the problem is 
simply a heap that is unreasonably small, or some other bug -- you haven't 
really provided many details to go on (ie: how big your current heap is, 
what types of things you do in this Solr server (ie: index & search? 
using filterCache? sorting?), what your DIH configs look like, how big 
each individual entity is in the XML files, etc...) so it's hard to guess 
what your problem might be.

One of the best ways to narrow down a problem like this is to look at 
some heap visualization tools to see what is actually using all the heap 
(who knows: maybe you can help us track down a bug no one else has 
discovered yet because your use case is unusual)


-Hoss


DataImportHandler Streaming XML Parse

2011-11-08 Thread Josh Harness
All -

 We're using DIH to import flat xml files. We're getting Heap memory
exceptions due to the file size. Is there any way to force DIH to do a
streaming parse rather than a DOM parse? I really don't want to chunk my
files up or increase the heap size.

Many Thanks!

Josh