[jira] [Updated] (DERBY-4279) Statement cache deadlock

Brett Wooldridge (JIRA) Wed, 27 Jun 2012 00:35:49 -0700

     [ 
https://issues.apache.org/jira/browse/DERBY-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Brett Wooldridge updated DERBY-4279:
------------------------------------

    Attachment: patch4279_2.txt

Okay, here is my second cut a fix...

First of all, the original report surmised that any DML, not just SELECT, could 
cause this deadlock.  I myself thought this as well, but it turns out not to be 
the case.

There are two places in the statement compilation code where openConglomerate() 
is called.   All compiled statement nodes ultimately inherit from 
StatementNode.  In StatementNode.lockTableForCompilation() you will find this 
logic:

if (dataDictionary.getCacheMode() == DataDictionary.DDL_MODE)   <=== Important
{
    ...
    heapCC = tc.openConglomerate(td.getHeapConglomerateId(),
        false,
        TransactionController.OPENMODE_FORUPDATE |
        TransactionController.OPENMODE_FOR_LOCK_ONLY,
        TransactionController.MODE_RECORD,
        TransactionController.ISOLATION_SERIALIZABLE);   <=== Important
    ...
}

The method is preceded with this comment:

        /* We need to get some kind of table lock (IX here) at the beginning of
         * compilation of DMLModStatementNode and DDLStatementNode, to prevent 
the
         * interference of insert/update/delete/DDL compilation and DDL 
execution,
         * see beetle 3976, 4343, and 
$WS/language/SolutionsToConcurrencyIssues.txt
         */

However, ultimately, according to the logic the lock is only obtained if the 
DataDictionary is in DDL_MODE.  This is all fine.

Now to the heart of the issue.  The second place where openConglomerate() is 
called is in ResultColumnList.generateHolderMethod().  This is used for SELECT 
statements for columns that will ultimately appear in the ResultSet (or 
sub-SELECTs in nested queries).  In ResultColumnList.generateHolderMethod() you 
will find this logic:

cc = getLanguageConnectionContext().getTransactionCompile().openConglomerate(
        conglomerateId,
        false,
        0,
        TransactionController.MODE_RECORD,
        TransactionController.ISOLATION_READ_COMMITTED);   <=== Important

Notice no conditional logic.  In other words, a lock on the conglomerate is 
unconditionally obtained.  It is this lock that contends with a table lock that 
may have been obtained elsewhere (possibly DDL, possibly an explicit lock).  
The attached patch changes this code to the following:

LanguageConnectionContext lcc = getLanguageConnectionContext();
DataDictionary dd = lcc.getDataDictionary();
int isolationLevel = (dd.getCacheMode() == DataDictionary.DDL_MODE) ?
        TransactionController.ISOLATION_READ_COMMITTED : 
TransactionController.ISOLATION_NOLOCK;
cc = lcc.getTransactionCompile().openConglomerate(
        conglomerateId,
        false,
        0,
        TransactionController.MODE_RECORD,
        isolationLevel);

Basically, it too checks the DataDictionary to see if it is in DDL_MODE.  If it 
is, it maintains the same locking level as the existing code.  However, if the 
DataDictionary is not being modified the code will not obtain a lock.

What is the impact?  The original code has the effect of blocking DDL from 
execution by locking the table(s) during this compile phase.  And in turn, if 
DDL has locked the table(s), the compilation will be blocked.  The first I take 
to be an unintended side-effect (as you'll see).

Let's take the second case first.  If DDL has occurred (and locked tables), the 
DataDictionary will be in DDL_MODE, and the new code behaves the same as the 
old code; the conglomerate is opened with READ_COMMITTED isolation and the 
compilation will be blocked just as before.

Now the first case.  If compilation of a statement (that involves 
ResultColumnList) has started and no DDL is in-progress at that instant, no 
locks are obtained (because the cache mode is not DDL_MODE).  This is the same 
as other statement types [that extend from StatementNode].  If DDL occurs after 
compilation has started, it will not be blocked (but it was never blocked in 
all other cases).  Because it was never blocked in other cases (due to the 
conditional logic), I take this to be a side-effect.  However...

If DDL does happen to alter the table(s) involved there is already code to 
handle it.  All roads to ResultColumnList.generateHolderMethod() lead back to 
GenericStatement.prepare().  GenericStatement.prepare() has this logic fragment:

try {
    return prepMinion(lcc, true, (Object[]) null, (SchemaDescriptor) null, 
forMetaData);
} catch (StandardException se) {
    // There is a chance that we didn't see the invalidation
    // request from a DDL operation in another thread because
    // the statement wasn't registered as a dependent until
    // after the invalidation had been completed. Assume that's
    // what has happened if we see a conglomerate does not exist
    // error, and force a retry even if the statement hasn't been
    // invalidated.
    if (SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST.equals(se.getMessageId())) {
        ... recompile ...
    }
}

So, even though the compile of a SELECT will no longer block DDL from occurring 
(remember UPDATE, DELETE, and all other DML never blocked it anyway) there is 
code in-place to handle the possibility that openConglomerate() in 
StatementNode or ResultColumnList might fail.  In that case, the compile is 
given one more shot before failing (recompile = true).

It is debatable now that the logic in ResultColumnList is the same as 
StatementNode, whether the isolation in ResultColumnList should also be 
ISOLATION_SERIALIZABLE like StatementNode given that it is now also conditional 
based on DataDictionary.getCacheMode() == DDL_MODE.  However, my approach was 
to be least impacting in terms of deviation from current known working code.

On a final note, this patch obviously fixed the test case attached to this 
issue, however past attempts to fix this bug did the same but ultimately failed 
in some of the long running stress tests.  It would be helpful if this patch 
could be applied and let run through the usual full barrage of tests.  That 
said, the previous patch attempted to fix the issue at the statement cache 
level with all of it's hairy synchronization.  This patch is considerably 
simpler both conceptually and implementation-wise.

                
> Statement cache deadlock
> ------------------------
>
>                 Key: DERBY-4279
>                 URL: https://issues.apache.org/jira/browse/DERBY-4279
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.0.2.1, 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 
> 10.5.1.1, 10.8.1.2
>         Environment: Windows Vista, OS X 10.5+
>            Reporter: Jeff Stuckman
>              Labels: derby_triage10_5_2
>         Attachments: Derby4279.java, client_stacktrace_activation_closed.txt, 
> patch4279.txt, patch4279_2.txt, stacktrace.txt
>
>
> Due to a design flaw in the statement cache, a deadlock can occur if a 
> prepared statement becomes out-of-date.
> I will illustrate this with the following example:
> The application is using the embedded Derby driver. The application has two 
> threads, and each thread uses its own connection.
> There is a table named MYTABLE with column MYCOLUMN.
> 1. A thread prepares and executes the query SELECT MYCOLUMN FROM MYTABLE. The 
> prepared statement is stored in the statement cache (see 
> org.apache.derby.impl.sql.GenericStatement for this logic)
> 2. After some time, the prepared statement becomes invalid or out-of-date for 
> some reason (see org.apache.derby.impl.sql.GenericPreparedStatement)
> 3. Thread 1 begins a transaction and executes LOCK TABLE MYTABLE IN EXCLUSIVE 
> MODE
> 4. Thread 2 begins a transaction and executes SELECT MYCOLUMN FROM MYTABLE. 
> The statement is in the statement cache but it is out-of-date. The thread 
> begins to recompile the statement. To compile the statement, the thread needs 
> a shared lock on MYTABLE. Thread 1 already has an exclusive lock on MYTABLE. 
> Thread 2 waits.
> 5. Thread 1 executes SELECT MYCOLUMN FROM MYTABLE. The statement is in the 
> statement cache but it is being compiled. Thread 1 waits on the statement's 
> monitor.
> 6. We have a deadlock. Derby eventually detects a lock timeout, but the error 
> message is not descriptive. The stacks at the time of the deadlock are:
> This deadlock is unique because it can still occur in a properly designed 
> database. You are only safe if all of your transactions are very simple and 
> cannot be interleaved in a sequence that causes the deadlock, or if your 
> particular statements do not require a table lock to compile. (For the sake 
> of simplicity, I used LOCK TABLE in my example, but any UPDATE statement 
> would fit.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-4279) Statement cache deadlock

Reply via email to