I tested my stuff in mapped mode, which did not show the problem, so the
issue I encountered is specific to direct mode. IMO the code below contains
the problem and needs to be fixed with a call to getWrite on the wrapped
BlockMgr whenever there is a cache miss.

@Andy: could you fix this for the next build?

I still hit the OME (OutOfMemoryError) though. I will try to analyze the
stack dumps to see if there is anything special. When the OME hits, it
comes on very quickly: within a matter of seconds, my entire heap space is
exhausted, starting from regular heap usage.

Simon


From:    Simon Helsen/Toronto/IBM
To:      [email protected]
Cc:      [email protected]
Date:    08/05/2011 01:27 PM
Subject: Re: testing TDB-Tx




Ok, so I looked at the code in BlockMgrCache and I notice that getWrite is
implemented like this:

    @Override
    synchronized
    public Block getWrite(long _id)
    {
        Long id = Long.valueOf(_id) ;
        Block blk = null ;
        if ( writeCache != null )
            blk = writeCache.get(id) ;
        if ( blk != null )
        {
            cacheWriteHits++ ;
            log("Hit(w->w) : %d", id) ;
            return blk ;
        }

        // blk is null.
        // A requested block may be in the other cache. Promote it.

        if ( readCache.containsKey(id) )
        {
            blk = readCache.get(id) ;
            cacheReadHits++ ;
            log("Hit(w->r) : %d", id) ;
            blk = promote(blk) ;
            return blk ;
        }

        // Did not find.
        cacheMisses++ ;
        log("Miss/w: %d", id) ;
        if ( writeCache != null )
            writeCache.put(id, blk) ;
        return blk ;
    }

Now, in my particular case, the id coming in is 0, but neither cache
contains a block for it. In that case, the method puts the entry {0 = null}
into the write cache and returns null, which necessarily leads to the NPE
in the caller. So I am not quite following the logic here. I would expect
that on a cache miss, the wrapped block mgr would be used to obtain the
block before it is written to the writeCache.
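To illustrate what I mean, here is a minimal, self-contained sketch of the miss path I would expect (this is not the actual Jena code: the class, the plain Maps standing in for the caches, and the "block-N" backing store are all hypothetical stand-ins for the real BlockMgrCache, its LRU caches, and the wrapped BlockMgr):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of getWrite with the miss path delegating to the
// wrapped manager instead of caching a null entry. Plain Maps stand in
// for the real write/read caches and the wrapped BlockMgr.
public class CacheMissSketch {
    // stand-in for the wrapped BlockMgr: can always produce a block
    static Map<Long, String> backing = new HashMap<>();
    static Map<Long, String> writeCache = new HashMap<>();
    static Map<Long, String> readCache = new HashMap<>();

    static String getWrite(long id) {
        String blk = writeCache.get(id);
        if (blk != null)
            return blk;                       // Hit(w->w)

        blk = readCache.get(id);
        if (blk != null) {                    // Hit(w->r): promote
            readCache.remove(id);
            writeCache.put(id, blk);
            return blk;
        }

        // Miss: obtain the block from the wrapped manager FIRST,
        // then cache it -- never put a null into the write cache.
        blk = backing.computeIfAbsent(id, k -> "block-" + k);
        writeCache.put(id, blk);
        return blk;
    }

    public static void main(String[] args) {
        // id 0 is in neither cache; a miss must still yield a block
        String b = getWrite(0L);
        System.out.println(b);
        System.out.println(writeCache.get(0L) != null);
    }
}
```

With this shape, the caller can never see a null block on a miss, and the later "Write cache: 0 expelling entry that isn't there" warning would not arise from a cached null.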

Simon



From:    Simon Helsen/Toronto/IBM@IBMCA
To:      [email protected]
Cc:      [email protected]
Date:    08/05/2011 12:01 PM
Subject: Re: testing TDB-Tx





Paolo,

I don't know who wrote the code, but it would help if a first analysis were
done with the stack trace I provided, along with any other questions that
could help identify the problem and a possible fix. Producing sharable code
which reproduces the problem is not trivial and may not even be possible,
since we run in a rather complex framework. If possible, I will try to
debug from within our framework myself, but obviously I have limited
knowledge of the details of the PageBlockMgr.

All the instances of this stack trace (and I am seeing quite a few of
them) seem to come from BulkUpdateHandlerTDB.removeAll, but I know that
removeAll initially works fine, until the NPE occurs the first time; after
that, it seems to keep happening. I will also try to isolate the problem
further, to see if there is anything specific that brings the store into
this situation.

thanks

Simon



From:    Paolo Castagna <[email protected]>
To:      [email protected]
Date:    08/05/2011 10:46 AM
Subject: Re: testing TDB-Tx



Hi Simon,
I don't have an answer or a solution to your problem, but I want to thank
you for reporting your experience (and the problems you found) on
jena-dev.

It would be extremely helpful if you could reproduce the problem with some
sharable code we can run and debug. I know, I know... it's not always easy
or even possible.

I hit a problem using TestTransSystem.java which runs multiple threads and
it's not easy to replicate.

Thanks again and keep sharing on jena-dev, this way everybody can benefit.

Cheers,
Paolo

Simon Helsen wrote:
> Hi everyone,
>
> I am giving a first stab at integrating TDB-Tx into our framework. My
> first goal is to test this new TDB *without* actually using the
> transaction API because we are coming from TDB 0.8.7. After some minor
> problems on our end, I seem to run into the following NPE (usually
> followed by a warning)
>
> 09:49:02,176 [jazz.jfs.suspending.indexer.internal.triple] ERROR
> com.ibm.team.jfs - CRJZS5663E Unable to persist tripe index
> java.lang.NullPointerException
>         at com.hp.hpl.jena.tdb.base.page.PageBlockMgr.getWrite(PageBlockMgr.java:50)
>         at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.getMgrWrite(BPTreeNode.java:162)
>         at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.get(BPTreeNode.java:145)
>         at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.delete(BPTreeNode.java:227)
>         at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.deleteAndReturnOld(BPlusTree.java:324)
>         at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.delete(BPlusTree.java:318)
>         at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performDelete(TupleIndexRecord.java:55)
>         at com.hp.hpl.jena.tdb.index.TupleIndexBase.delete(TupleIndexBase.java:61)
>         at com.hp.hpl.jena.tdb.index.TupleTable.delete(TupleTable.java:108)
>         at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeWorker(BulkUpdateHandlerTDB.java:136)
>         at com.hp.hpl.jena.tdb.graph.BulkUpdateHandlerTDB.removeAll(BulkUpdateHandlerTDB.java:90)
>         at com.hp.hpl.jena.rdf.model.impl.ModelCom.removeAll(ModelCom.java:315)
>         ...
> 09:49:02,207 [jazz.jfs.suspending.indexer.internal.triple]  WARN
> com.hp.hpl.jena.tdb.base.block.BlockMgrCache - Write cache: 0
> expelling entry that isn't there
>
> The exception appears all over my log and I wonder if it is related to
> the removeAll. Also, after a while, my memory spikes and I run into an
> OME. I don't know yet if there is a relation, but possibly these
> exceptions cause serious leaks.
>
> The version of TDB (and associated libs) I am using is
> tx-tdb-0.9.0-20110802.083904-6
>
> thanks,
>
> Simon



