[jira] [Issue Comment Edited] (JENA-161) TDB Transaction deadlock

Paolo Castagna (Issue Comment Edited) (JIRA) Thu, 17 Nov 2011 05:49:19 -0800

    [ https://issues.apache.org/jira/browse/JENA-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152050#comment-13152050 ]


Paolo Castagna edited comment on JENA-161 at 11/17/11 1:48 PM:
---------------------------------------------------------------

JournalControl
 --> replay(Journal journal, DatasetGraphTDB dsg) 
      --> dsg.getLock().enterCriticalSection(Lock.WRITE)
      <-- dsg.getLock().leaveCriticalSection()

TransactionManager:TSM_WriteBackEndTxn
 --> readerStarts(Transaction txn)
      --> txn.getBaseDataset().getLock().enterCriticalSection(Lock.READ)
 --> writerStarts(Transaction txn)
      --> txn.getBaseDataset().getLock().enterCriticalSection(Lock.READ)
 --> readerFinishes(Transaction txn) 
      --> txn.getBaseDataset().getLock().leaveCriticalSection()
 --> writerCommits(Transaction txn)
      --> txn.getBaseDataset().getLock().leaveCriticalSection()
           --> JournalControl.replay(txn)
 --> writerAborts(Transaction txn)
      --> txn.getBaseDataset().getLock().leaveCriticalSection()

If I understand correctly, the problem is this: by the time a reader finishes 
and JournalControl.replay(txn) is called (which needs to acquire the WRITE 
lock), another reader has come in and acquired a READ lock, so the WRITE lock 
is never acquired.
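
To make the condition above concrete, here is a minimal sketch. It does not use 
the Jena Lock returned by dsg.getLock(); it assumes a 
java.util.concurrent.locks.ReentrantReadWriteLock as a stand-in, and the class 
and method names are hypothetical. It only shows that a WRITE lock cannot be 
taken while another thread still holds a READ lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; ReentrantReadWriteLock stands in for the dataset lock.
class ReaderBlocksReplay {

    /** Returns true if the WRITE lock could be taken while another thread holds READ. */
    static boolean writeLockAvailableWhileReaderActive() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerIn = new CountDownLatch(1);
        CountDownLatch readerMayLeave = new CountDownLatch(1);

        // A second reader that "came in" and stays inside its critical section.
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerIn.countDown();
                readerMayLeave.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        reader.start();

        try {
            readerIn.await();
            // Non-blocking attempt, so the demo itself cannot hang.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            readerMayLeave.countDown(); // let the reader finish
        }
    }

    public static void main(String[] args) {
        System.out.println("write lock available: "
                + writeLockAvailableWhileReaderActive()); // prints "write lock available: false"
    }
}
```

A blocking lock() call in the same position would simply wait for as long as 
the reader stays in its critical section.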

Maybe, as a temporary workaround until we "write-back using a separate 
thread", we could use tryLock [1] in the JournalControl.replay(...) method 
instead of lock, in order to avoid deadlocks. If we want to try this, a 
different Lock implementation needs to be provided and used by DatasetGraphTxn.

 [1] 
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/locks/Lock.html#tryLock%28long,%20java.util.concurrent.TimeUnit%29
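
A rough sketch of what the tryLock idea could look like, under assumptions: 
it uses java.util.concurrent.locks.ReentrantReadWriteLock in place of the lock 
used by DatasetGraphTxn, and TryLockReplay, replayOrDefer and the deferred 
queue are hypothetical names, not existing TDB code. If the WRITE lock is not 
obtained within the timeout, the write-back is queued for a later attempt 
instead of blocking.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; not the actual JournalControl implementation.
class TryLockReplay {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Write-backs that could not get the WRITE lock in time.
    private final Queue<Runnable> deferred = new ArrayDeque<>();

    /** Runs writeBack under the WRITE lock, or defers it if the lock is busy. */
    boolean replayOrDefer(Runnable writeBack, long timeout, TimeUnit unit) {
        boolean locked;
        try {
            locked = lock.writeLock().tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            locked = false;
        }
        if (!locked) {
            synchronized (deferred) {
                deferred.add(writeBack); // try again later, e.g. when a reader finishes
            }
            return false;
        }
        try {
            writeBack.run();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Number of write-backs still waiting for the WRITE lock. */
    int pending() {
        synchronized (deferred) {
            return deferred.size();
        }
    }

    ReentrantReadWriteLock lock() {
        return lock;
    }
}
```

The caller decides when to retry the deferred entries; the sketch only shows 
that a timed attempt returns control instead of waiting forever.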
                
                  
> TDB Transaction deadlock
> ------------------------
>
>                 Key: JENA-161
>                 URL: https://issues.apache.org/jira/browse/JENA-161
>             Project: Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: TDB 0.9.0
>         Environment: Windows 7 64-bit. I am using the SVN snapshot of 10 
> November 2011
>            Reporter: Simon Helsen
>            Assignee: Andy Seaborne
>         Attachments: TDBTxnDeadlockTest.java
>
>
> While running some tests I ran into a deadlock. Unfortunately, on my 64-bit 
> Windows 7, I was unable to trigger the complete stack trace (there is no 
> equivalent of kill -3, and all known utilities to achieve this on Windows 
> don't work with a 64-bit process). I was able to see two of the threads which 
> were hanging (because of a UI view in our admin console), but it does not 
> show the other threads (which I'd need in order to see why the 2 threads I do 
> have are hanging). I will show the 2 hanging threads here in the hope that it 
> rings a bell. I hope I will be able to get a full set of stack traces at some 
> point.
> As for an analysis: I am not sure if we are dealing with a double-crossed 
> locking issue here. It seems that thread 2 is waiting for thread 1, which 
> clearly has the lock on the transaction manager, but it is not clear why 
> thread 1 is itself waiting on the Transaction object. It seems that some 
> other thread still has it, and the question is whether thread 2 could be the 
> one (so there is a crossing of the locks). It would surprise me, because 
> thread 1 is doing a READ transaction and thread 2 is doing a separate WRITE 
> transaction. 
> thread 1:
> ------------
> com.hp.hpl.jena.tdb.transaction.Transaction.signalEnacted(Transaction.java:178)
>    com.hp.hpl.jena.tdb.transaction.TransactionManager.enactTransaction(TransactionManager.java:384)
>    com.hp.hpl.jena.tdb.transaction.TransactionManager.processDelayedReplayQueue(TransactionManager.java:419)
>    com.hp.hpl.jena.tdb.transaction.TransactionManager$TSM_WriteBackEndTxn.readerFinishes(TransactionManager.java:189)
>    com.hp.hpl.jena.tdb.transaction.TransactionManager.readerFinishes(TransactionManager.java:609)
>    com.hp.hpl.jena.tdb.transaction.TransactionManager.noteTxnCommit(TransactionManager.java:472)
>    com.hp.hpl.jena.tdb.transaction.TransactionManager.notifyCommit(TransactionManager.java:349)
>    com.hp.hpl.jena.tdb.transaction.Transaction.commit(Transaction.java:100)
>    com.hp.hpl.jena.tdb.transaction.Transaction.close(Transaction.java:151)
>    com.hp.hpl.jena.tdb.DatasetGraphTxn.close(DatasetGraphTxn.java:55)
>    com.ibm.team.jfs.rdf.internal.jena.tdb.JenaTdbTxProvider.storeOperation(JenaTdbTxProvider.java:179)
> <snip>
> thread 2:
> ------------
> com.hp.hpl.jena.tdb.transaction.TransactionManager.notifyClose(TransactionManager.java:445)
>    com.hp.hpl.jena.tdb.transaction.Transaction.close(Transaction.java:162)
>    com.hp.hpl.jena.tdb.DatasetGraphTxn.close(DatasetGraphTxn.java:55)
>    com.ibm.team.jfs.rdf.internal.jena.tdb.JenaTdbTxProvider.storeOperation(JenaTdbTxProvider.java:285)
>    com.ibm.team.jfs.rdf.internal.jena.tdb.JenaTdbTxProvider.unprotectedDelete(JenaTdbTxProvider.java:1902)
>    com.ibm.team.jfs.rdf.internal.jena.tdb.JenaTdbTxProvider.delete(JenaTdbTxProvider.java:664)
> <snip>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
