I tested my stuff in mapped mode, which did not show the problem, so the
issue I encountered is specific to direct mode. IMO the code below contains
the problem and needs to be fixed with a call to getWrite on the wrapped
BlockMgr whenever there is a cache miss.
@Andy: could you fix this for the next build?
I still hit the OME though. I will try to analyze the stack dumps to see if
there is anything special. When I hit the OME, it comes very quickly: within
a matter of seconds, my entire heap space is exhausted, starting from
regular heap usage beforehand.
Simon
From: Simon Helsen/Toronto/IBM
To: [email protected]
Cc: [email protected]
Date: 08/05/2011 01:27 PM
Subject: Re: testing TDB-Tx
OK, so I looked at the code in BlockMgrCache, and I noticed that getWrite
is implemented like this:
    @Override
    synchronized
    public Block getWrite(long _id)
    {
        Long id = Long.valueOf(_id) ;
        Block blk = null ;
        if ( writeCache != null )
            blk = writeCache.get(id) ;
        if ( blk != null )
        {
            cacheWriteHits++ ;
            log("Hit(w->w) : %d", id) ;
            return blk ;
        }
        // blk is null.
        // A requested block may be in the other cache. Promote it.
        if ( readCache.containsKey(id) )
        {
            blk = readCache.get(id) ;
            cacheReadHits++ ;
            log("Hit(w->r) : %d", id) ;
            blk = promote(blk) ;
            return blk ;
        }
        // Did not find.
        cacheMisses++ ;
        log("Miss/w: %d", id) ;
        if ( writeCache != null )
            writeCache.put(id, blk) ;
        return blk ;
    }
Now, in my particular case, the id coming in is 0, but neither cache
contains the value. In this case, the method puts the entry {0 = null} into
the write cache, which necessarily leads to the NPE in the caller. So I am
not quite following the logic here. I would expect that on a cache miss,
the wrapped block mgr would be used to obtain the block before it is
written into the writeCache.
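To make the suggestion concrete, here is a minimal, self-contained sketch of the miss branch I would expect. The Block/BlockMgr types below are hypothetical stand-ins, not the real TDB classes, and the caches are plain HashMaps; the only point is that a full miss delegates to the wrapped manager instead of caching null:

```java
import java.util.HashMap;
import java.util.Map;

public class BlockMgrCacheFixSketch {

    // Hypothetical stand-ins for TDB's Block and BlockMgr types.
    static class Block {
        final long id;
        Block(long id) { this.id = id; }
    }

    interface BlockMgr {
        Block getWrite(long id);
    }

    // Stands in for the wrapped (file- or memory-backed) block manager.
    static class BaseBlockMgr implements BlockMgr {
        @Override public Block getWrite(long id) { return new Block(id); }
    }

    // Sketch of the expected getWrite: on a full miss, fetch the block
    // from the wrapped manager instead of caching (and returning) null.
    static class BlockMgrCacheSketch implements BlockMgr {
        private final BlockMgr wrapped;
        private final Map<Long, Block> writeCache = new HashMap<>();
        private final Map<Long, Block> readCache  = new HashMap<>();

        BlockMgrCacheSketch(BlockMgr wrapped) { this.wrapped = wrapped; }

        @Override
        public synchronized Block getWrite(long _id) {
            Long id = _id;
            Block blk = writeCache.get(id);
            if (blk != null)
                return blk;                       // hit in the write cache
            blk = readCache.remove(id);
            if (blk != null) {                    // promote read -> write
                writeCache.put(id, blk);
                return blk;
            }
            // Full miss: obtain the block from the wrapped manager,
            // then cache it. The caller never sees null.
            blk = wrapped.getWrite(_id);
            writeCache.put(id, blk);
            return blk;
        }
    }

    public static void main(String[] args) {
        BlockMgr mgr = new BlockMgrCacheSketch(new BaseBlockMgr());
        Block b = mgr.getWrite(0);
        System.out.println(b != null);            // no {0 = null} entry anymore
        System.out.println(b == mgr.getWrite(0)); // second call hits the cache
    }
}
```

With this shape, getWrite(0) on an empty cache returns a real block from the wrapped manager, and the subsequent "expelling entry that isn't there" warning should have nothing to expel.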
Simon
From: Simon Helsen/Toronto/IBM@IBMCA
To: [email protected]
Cc: [email protected]
Date: 08/05/2011 12:01 PM
Subject: Re: testing TDB-Tx
Paolo,
I don't know who wrote the code, but it would help if a first analysis were
done with the stack trace I provided, along with any other questions that
might help identify the problem and a possible fix. Producing sharable code
that reproduces the problem is not trivial and may not even be possible,
since we run in a rather complex framework. If possible, I will try to
debug from within our framework myself, but obviously I have limited
knowledge of the details of the PageBlockMgr.
All the instances of this stack trace (and I am seeing quite a few of
them) seem to come from BulkUpdateHandlerTDB.removeAll, but I know that
removeAll initially works fine (until the NPE occurs the first time; after
that, it seems to keep happening). I will also try to isolate the problem
further to see whether anything specific brings the store into this
situation.
thanks
Simon
From: Paolo Castagna <[email protected]>
To: [email protected]
Date: 08/05/2011 10:46 AM
Subject: Re: testing TDB-Tx
Hi Simon,
I don't have an answer or a solution to your problem, but I want to thank
you for reporting your experience (and the problems you found) on
jena-dev.
It would be extremely helpful if you could reproduce the problem with some
sharable code we can run and debug. I know, I know... it's not always easy
or even possible.
I hit a problem using TestTransSystem.java, which runs multiple threads,
and it's not easy to replicate.
Thanks again, and keep sharing on jena-dev; this way everybody can benefit.
Cheers,
Paolo
Simon Helsen wrote:
> Hi everyone,
>
> I am giving a first stab at integrating TDB-Tx into our framework. My
> first goal is to test this new TDB *without* actually using the
> transaction API because we are coming from TDB 0.8.7. After some minor
> problems on our end, I seem to run into the following NPE (usually
> followed by a warning)
>
> 09:49:02,176 [jazz.jfs.suspending.indexer.internal.triple] ERROR
> com.ibm.team.jfs - CRJZS5663E Unable to persist triple index
> java.lang.NullPointerException
> at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.getWrite(PageBlockMgr.java:50)
> at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.getMgrWrite(BPTreeNode.java:162)
> at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.get(BPTreeNode.java:145)
> at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.delete(BPTreeNode.java:227)
> at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.deleteAndReturnOld(BPlusTree.java:324)
> at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.delete(BPlusTree.java:318)
> at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performDelete(TupleIndexRecord.java:55)
> at com.hp.hpl.jena.tdb.index.TupleIndexBase.delete(TupleIndexBase.java:61)
> at com.hp.hpl.jena.tdb.index.TupleTable.delete(TupleTable.java:108)
> at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeWorker(BulkUpdateHandlerTDB.java:136)
> at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeAll(BulkUpdateHandlerTDB.java:90)
> at com.hp.hpl.jena.rdf.model.impl.ModelCom.removeAll(ModelCom.java:315)
> ...
> 09:49:02,207 [jazz.jfs.suspending.indexer.internal.triple] WARN
> com.hp.hpl.jena.tdb.base.block.BlockMgrCache - Write cache: 0
> expelling entry that isn't there
>
> The exception sits all over my log, and I wonder if it is related to
> the removeAll. Also, after a while, my memory spikes and I run into an
> OME. I don't know yet if there is a relation, but possibly these
> exceptions cause serious leaks.
>
> The version of TDB (and associated libs) I am using is
> tx-tdb-0.9.0-20110802.083904-6
>
> thanks,
>
> Simon