On 16/02/11 18:47, Frank Budinsky wrote:

Hi,

Hi Frank,

I am trying to load about 100,000 datagraphs (roughly 10M triples) into a
Jena TDB Dataset, but am running out of memory.

10M total I hope :-)

Is this on a 32 bit machine or a 64 bit machine? Also, which JVM is it?

I'm doing the load by
repeatedly calling code that looks something like this:

       InputStream instream = entity.getContent(); // the RDF graph to load

An input stream of RDF/XML bytes.

What does the data look like?

       fResourceDataset.getLock().enterCriticalSection(Lock.WRITE);
       try {
           Model model = fResourceDataset.getNamedModel(resourceURI);
           model.read(instream, null);
           //model.close();
       } finally {
           fResourceDataset.getLock().leaveCriticalSection();
       }
       instream.close();

After calling this code about 2-3 thousand times, it starts to run much
slower, and then eventually I get an exception like this:

       Exception in thread "pool-3-thread-43" java.lang.OutOfMemoryError: Java heap space

Could you provide a complete, minimal example please? There are details, such as how fResourceDataset is set up, that might make a difference.

The stack trace might be useful as well, although it isn't proof of exactly where the memory is being used.

I tried increasing the amount of memory, but that just increased the number of calls that succeeded (e.g., 10,000 vs. 2,000) before getting the exception.

I'm wondering if there's something I need to do to release memory between these calls. I tried putting in a call to model.close(), but all that seemed to do was make it run slower; I still got the exception.

There isn't anything extra that should be needed, but I'm wondering if several things, like entity.getContent(), are involved in using memory and it's the cumulative effect that's the problem.
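As a sketch only (I'm guessing entity is an Apache HttpClient HttpEntity, and I'm reusing the field and variable names from your snippet), you could make sure the stream is always closed, even when the parse throws:

       // Sketch, not tested: release the HTTP content stream on every
       // call, even if model.read() throws, so nothing accumulates
       // across the 100,000 iterations.
       InputStream instream = entity.getContent();
       try {
           fResourceDataset.getLock().enterCriticalSection(Lock.WRITE);
           try {
               Model model = fResourceDataset.getNamedModel(resourceURI);
               model.read(instream, null);   // RDF/XML is the default language
           } finally {
               fResourceDataset.getLock().leaveCriticalSection();
           }
       } finally {
           instream.close();   // always release the underlying HTTP resources
       }

That would at least rule out leaked streams as the cumulative cost.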

Is there something else I should be doing, or is there a possible memory
leak in the version of Jena I'm using (a fairly recent SNAPSHOT build)?

Btw, I tried commenting out the call to model.read(instream, null) to confirm that the memory leak isn't somewhere else in my program, and that worked - i.e., it went through all 100,000 calls without an exception.

Any ideas or pointers to what may be wrong would be appreciated.

Another way to do this is to use the bulk loader from the command line. It can read from stdin or from a collection of files.
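For example, something like this (a sketch - the database location and filenames are placeholders for your setup, and it assumes the TDB command scripts are on your path):

       tdbloader --loc=/path/to/TDB data1.nt data2.nt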

RDF/XML parsing is expensive - N-Triples is fastest.
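If you can fetch or convert the payloads as N-Triples, the only change to the read call is naming the language (again a sketch, using your snippet's variables):

       model.read(instream, null, "N-TRIPLE");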

        Andy
