On 16/02/11 18:47, Frank Budinsky wrote:

Hi,

Hi Frank,

I am trying to load about 100,000 datagraphs (roughly 10M triples) into a
Jena TDB Dataset, but am running out of memory.

10M total I hope :-)

Is this on a 32 bit machine or a 64 bit machine? Also, which JVM is it?

I'm doing the load by
repeatedly calling code that looks something like this:

       InputStream instream = entity.getContent(); // the RDF graph to load

An input stream of RDF/XML bytes.

What does the data look like?

       fResourceDataset.getLock().enterCriticalSection(Lock.WRITE);
       try {
           Model model = fResourceDataset.getNamedModel(resourceURI);
           model.read(instream, null);
           //model.close();
       } finally {
           fResourceDataset.getLock().leaveCriticalSection();
       }
       instream.close();

After calling this code about 2-3 thousand times, it starts to run much
slower, and then eventually I get an exception like this:

       Exception in thread "pool-3-thread-43" java.lang.OutOfMemoryError: Java heap space

Could you provide a complete, minimal example please? There are details, such as how fResourceDataset is set up, that might make a difference.

The stack trace might be useful as well, although it isn't proof of exactly where the memory is being used.

I tried increasing the amount of memory, but that just increased the number of calls that succeeded (e.g., 10,000 vs. 2,000) before getting the exception.

I'm wondering if there's something I need to do to release memory between these calls. I tried putting in a call to model.close(), but all that seemed to do was make it run slower; I still got the exception.

There isn't anything extra that should be needed, but I'm wondering if several things, like entity.getContent(), are involved in using memory and it's the cumulative effect that's the problem.
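As a sketch only (I'm guessing entity is an Apache HttpClient HttpEntity, and I'm reusing the field and variable names from your snippet), you could make sure the stream is always closed, even when the parse throws:

       // Sketch, not tested: release the HTTP content stream on every
       // call, even if model.read() throws, so nothing accumulates
       // across the 100,000 iterations.
       InputStream instream = entity.getContent();
       try {
           fResourceDataset.getLock().enterCriticalSection(Lock.WRITE);
           try {
               Model model = fResourceDataset.getNamedModel(resourceURI);
               model.read(instream, null);   // RDF/XML is the default language
           } finally {
               fResourceDataset.getLock().leaveCriticalSection();
           }
       } finally {
           instream.close();   // always release the underlying HTTP resources
       }

That would at least rule out leaked streams as the cumulative cost.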

Is there something else I should be doing, or is there a possible memory
leak in the version of Jena I'm using (a fairly recent SNAPSHOT build)?

Btw, I tried commenting out the call to model.read(instream, null) to confirm that the memory leak isn't somewhere else in my program, and that worked - i.e., it went through all 100,000 calls without an exception.

Any ideas or pointers to what may be wrong would be appreciated.

Another way to do this is to use the bulk loader from the command line. It can read from stdin or from a collection of files.
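For example, something like this (a sketch - the database location and filenames are placeholders for your setup, and it assumes the TDB command scripts are on your path):

       tdbloader --loc=/path/to/TDB data1.nt data2.nt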

RDF/XML parsing is expensive - N-Triples is fastest.
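If you can fetch or convert the payloads as N-Triples, the only change to the read call is naming the language (again a sketch, using your snippet's variables):

       model.read(instream, null, "N-TRIPLE");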

        Andy
