To follow up on this: I was told that byte buffers are not allocated on
the Java heap, but are always put onto the native heap (i.e. managed by the
OS). When looking at the OME stack traces, they invariably look like this:
JVMDUMP013I Processed dump event "systhrow", detail
"java/lang/OutOfMemoryError".
Exception in thread "jazz.jfs.resuming.indexer.internal.triple"
java.lang.OutOfMemoryError
at java.nio.HeapByteBuffer.&lt;init&gt;(HeapByteBuffer.java:50)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:323)
at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:249)
at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:60)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeByNodeId(NodeTableNative.java:162)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:89)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:59)
at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:88)
at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:59)
at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:44)
at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:55)
at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:125)
at com.hp.hpl.jena.tdb.lib.TupleLib.quad(TupleLib.java:118)
at com.hp.hpl.jena.tdb.lib.TupleLib.access$100(TupleLib.java:32)
at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:76)
at com.hp.hpl.jena.tdb.lib.TupleLib$4.convert(TupleLib.java:72)
at org.openjena.atlas.iterator.Iter$4.next(Iter.java:267)
at com.hp.hpl.jena.tdb.store.GraphTDBBase$ProjectQuadsToTriples.next(GraphTDBBase.java:183)
at com.hp.hpl.jena.tdb.store.GraphTDBBase$ProjectQuadsToTriples.next(GraphTDBBase.java:171)
at com.hp.hpl.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:68)
at com.hp.hpl.jena.util.iterator.Map1Iterator.next(Map1Iterator.java:35)
at com.hp.hpl.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:68)
at com.hp.hpl.jena.rdf.model.impl.StmtIteratorImpl.next(StmtIteratorImpl.java:33)
at com.hp.hpl.jena.rdf.model.impl.StmtIteratorImpl.next(StmtIteratorImpl.java:21)
It seems that these byte buffers are responsible, either by bloating up
uncontrollably or perhaps by not being cleaned up properly when I close a
dataset. Any thoughts on how and whether this could be possible?
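(A quick sanity check, for what it's worth: the "not on the Java heap" claim seems to hold only for direct buffers. The ByteBuffer.allocate call in the trace above returns a HeapByteBuffer backed by a plain byte[] on the Java heap, which would be consistent with the OME coming from ordinary heap exhaustion. A minimal illustration:)

```java
import java.nio.ByteBuffer;

public class BufferKinds {
    public static void main(String[] args) {
        // ByteBuffer.allocate() -- the call in the trace above -- returns a
        // HeapByteBuffer backed by an ordinary byte[] on the Java heap.
        ByteBuffer heap = ByteBuffer.allocate(1024);
        System.out.println(heap.isDirect());   // false
        System.out.println(heap.hasArray());   // true

        // Only allocateDirect() places the storage in native (off-heap) memory.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        System.out.println(direct.isDirect()); // true
    }
}
```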
Simon
From:
Simon Helsen/Toronto/IBM@IBMCA
To:
[email protected]
Cc:
[email protected]
Date:
08/08/2011 12:38 PM
Subject:
Re: testing TDB-Tx
Andy,
thanks for the fix. With regard to the OME, that is a very strange one. I
first did some tests using the IBM JRE 6 and the Eclipse Memory Analyzer
(see
http://wiki.eclipse.org/index.php/MemoryAnalyzer#System_Dumps_and_Heap_Dumps_from_IBM_Virtual_Machines
if you're interested). I noticed that my 2Gb heap dump only showed about
110Mb of real data. My initial max heap was set to 1.5Gb and the stack to
512Mb. I am not quite sure how that is possible, unless 1) the problem
sits in native memory, or 2) there is a sudden spike in dynamic memory
allocation which does not give the GC time to collect before the JVM
bails.
My next test was to use Oracle's JRE 6, which did not start at first
because MaxPermSize wasn't large enough, so I increased it to 256Mb (the
default is 64Mb). Then, around the same point, I also ran into an OME, but
I don't seem to get a useful heap dump to analyze. I suspect it would
show something similar to what I saw with the IBM JRE.
I also tried just increasing the heap size from 1.5Gb to 3Gb (I have 8Gb
total on my machine), and although that delays the problem, it does not
actually avoid it. In fact, the moment the memory spikes (it happens
really fast), my entire machine freezes as all my memory is more or less
gone.
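(For reference, the settings described above amount to something like the
following JVM invocation; sizes, paths, and the jar name are illustrative,
not our actual command line. -XX:+HeapDumpOnOutOfMemoryError makes the
Oracle/Sun JRE write an .hprof dump at the moment the OME is thrown, which
can then be opened in MAT.)

```shell
# Illustrative invocation; sizes and paths are examples only.
java -Xmx3g \
     -XX:MaxPermSize=256m \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/ome.hprof \
     -jar our-test-harness.jar
```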
A few things about the unit tests I am using: we have quite a few tests
which "disconnect" from and "connect" to the Jena dataset by opening and
closing it (even physically moving the indices around, etc.). I suspect
the problem is related to this, but unfortunately I cannot pin it to one
particular test, since I can "move" the problem along by skipping tests. I
should add that this entire OME problem is a regression from our point of
view, since we did not observe anything like it in 0.8.7. I also have to
emphasize that this is still without using the transaction API. I may give
it a shot with the new API, but I doubt it will resolve the issue.
This brings me to a question: is calling reset on the StoreProvider
supposed to release all resources and allow me to remove the indices from
disk?
thanks,
Simon
From:
Andy Seaborne <[email protected]>
To:
[email protected]
Date:
08/08/2011 11:20 AM
Subject:
Re: testing TDB-Tx
On 05/08/11 19:01, Simon Helsen wrote:
> I tested my stuff in mapped mode which did not show the problem, so the
> issue I encountered is specific to direct mode. IMO the code below
> contains the problem and needs to be fixed with a call to
> blockMgrCache.getWrite (on the wrapped BlockMgr) whenever there is a
> cache miss.
>
> @Andy: could you fix this for the next build?
Done in SVN.
We've lost the repo (AWS ate the server) [temporarily] so for a bit,
it's "svn update; mvn clean package"
> I still hit the OME though. I will try to analyze the stack dumps to see
> if there is anything special. When I hit the OME, it comes very quick,
> i.e. in a matter of seconds, my entire heap space is exhausted from
> regular heap usage before.
So if I read the thread right, calling BulkUpdateHandlerTDB.removeAll
still goes bad sometimes.
Andy
>
> Simon
>
>
>
> From:
> Simon Helsen/Toronto/IBM
>
> To:
> [email protected]
>
> Cc:
> [email protected]
>
> Date:
> 08/05/2011 01:27 PM
>
> Subject:
> Re: testing TDB-Tx
>
> ------------------------------------------------------------------------
>
>
> Ok, so I looked at the code in BlockMgrCache and I notice that getWrite
> is implemented like this:
>
> @Override
> synchronized public Block getWrite(long _id)
> {
>     Long id = Long.valueOf(_id) ;
>     Block blk = null ;
>     if ( writeCache != null )
>         blk = writeCache.get(id) ;
>     if ( blk != null )
>     {
>         cacheWriteHits++ ;
>         log("Hit(w->w) : %d", id) ;
>         return blk ;
>     }
>
>     // blk is null.
>     // A requested block may be in the other cache. Promote it.
>
>     if ( readCache.containsKey(id) )
>     {
>         blk = readCache.get(id) ;
>         cacheReadHits++ ;
>         log("Hit(w->r) : %d", id) ;
>         blk = promote(blk) ;
>         return blk ;
>     }
>
>     // Did not find.
>     cacheMisses++ ;
>     log("Miss/w: %d", id) ;
>     if ( writeCache != null )
>         writeCache.put(id, blk) ;
>     return blk ;
> }
>
> Now, in my particular case, the id that comes in is 0, but neither cache
> contains the value. In this case, it will put the entry {0 = null} in
> the write cache, which necessarily leads to the NPE in the caller. So I
> am not quite following the logic here. I would expect that, on a cache
> miss, the wrapped block manager would be used to obtain the block before
> it is written to the writeCache.
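> (A minimal model of the miss-path behaviour I would expect; the names
> here, such as wrappedGetWrite, are illustrative stand-ins, not the actual
> BlockMgrCache fields. The point is simply: on a miss, fetch the block
> from the wrapped manager first, and never cache null.)

```java
import java.util.HashMap;
import java.util.Map;

public class CacheMissDemo {
    static Map<Long, String> writeCache = new HashMap<>();

    // Stand-in for the wrapped BlockMgr (hypothetical).
    static String wrappedGetWrite(long id) {
        return "block-" + id;
    }

    static String getWrite(long id) {
        String blk = writeCache.get(id);
        if (blk != null)
            return blk;                 // cache hit
        blk = wrappedGetWrite(id);      // miss: ask the wrapped manager first...
        writeCache.put(id, blk);        // ...then cache the real block, never null
        return blk;
    }

    public static void main(String[] args) {
        System.out.println(getWrite(0)); // prints "block-0", not null
    }
}
```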
>
> Simon
>
>
>
>
> From:
> Simon Helsen/Toronto/IBM@IBMCA
>
> To:
> [email protected]
>
> Cc:
> [email protected]
>
> Date:
> 08/05/2011 12:01 PM
>
> Subject:
> Re: testing TDB-Tx
>
> ------------------------------------------------------------------------
>
>
>
> Paolo,
>
> I don't know who wrote the code, but it would help if a first analysis is
> done with the stack trace I provided, and perhaps other questions that can
> help identify the problem and a possible fix. Producing sharable code
> which reproduces the problem is not trivial and may not even be possible,
> since we run in a rather complex framework. If possible, I will try to
> debug it myself from within our framework, but obviously I have limited
> knowledge of the details of the PageBlockMgr.
>
> All the instances of this stack trace (and I am seeing quite a few of
> them) seem to come from BulkUpdateHandlerTDB.removeAll, but I know that
> removeAll initially works fine (until the NPE occurs the first time; it
> seems that after the first time, it keeps happening). I will also try to
> isolate the problem further to see if there is anything specific that
> brings the store into this situation.
>
> thanks
>
> Simon
>
>
>
> From:
> Paolo Castagna <[email protected]>
> To:
> [email protected]
> Date:
> 08/05/2011 10:46 AM
> Subject:
> Re: testing TDB-Tx
>
>
>
> Hi Simon,
> I don't have an answer or a solution to your problem, but I want to thank
> you for reporting your experience (and the problems you found) on
> jena-dev.
>
> It would be extremely helpful if you could reproduce the problem with some
> sharable code we can run and debug. I know, I know... it's not always easy
> nor possible.
>
> I hit a problem using TestTransSystem.java, which runs multiple threads
> and it's not easy to replicate.
>
> Thanks again and keep sharing on jena-dev; this way everybody can benefit.
>
> Cheers,
> Paolo
>
> Simon Helsen wrote:
> > Hi everyone,
> >
> > I am giving a first stab at integrating TDB-Tx into our framework. My
> > first goal is to test this new TDB *without* actually using the
> > transaction API because we are coming from TDB 0.8.7. After some minor
> > problems on our end, I seem to run into the following NPE (usually
> > followed by a warning):
> >
> > 09:49:02,176 [jazz.jfs.suspending.indexer.internal.triple] ERROR
> > com.ibm.team.jfs - CRJZS5663E Unable to persist tripe index
> > java.lang.NullPointerException
> > at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.getWrite(PageBlockMgr.java:50)
> > at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.getMgrWrite(BPTreeNode.java:162)
> > at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.get(BPTreeNode.java:145)
> > at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.delete(BPTreeNode.java:227)
> > at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.deleteAndReturnOld(BPlusTree.java:324)
> > at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.delete(BPlusTree.java:318)
> > at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performDelete(TupleIndexRecord.java:55)
> > at com.hp.hpl.jena.tdb.index.TupleIndexBase.delete(TupleIndexBase.java:61)
> > at com.hp.hpl.jena.tdb.index.TupleTable.delete(TupleTable.java:108)
> > at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeWorker(BulkUpdateHandlerTDB.java:136)
> > at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeAll(BulkUpdateHandlerTDB.java:90)
> > at com.hp.hpl.jena.rdf.model.impl.ModelCom.removeAll(ModelCom.java:315)
> > ...
> > 09:49:02,207 [jazz.jfs.suspending.indexer.internal.triple] WARN
> > com.hp.hpl.jena.tdb.base.block.BlockMgrCache - Write cache: 0
> > expelling entry that isn't there
> >
> > The exception sits all over my log and I wonder if it is related to the
> > removeAll. Also, after a while, my memory spikes and I run into an OME. I
> > don't know yet if there is a relation, but possibly these exceptions
> > cause serious leaks.
> >
> > The version of TDB (and associated libs) I am using is
> > tx-tdb-0.9.0-20110802.083904-6
> >
> > thanks,
> >
> > Simon
>
>
>
>
>