[
https://issues.apache.org/jira/browse/DERBY-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brett Wooldridge updated DERBY-4279:
------------------------------------
Attachment: patch4279_2.txt
Okay, here is my second cut a fix...
First of all, the original report surmised that any DML, not just SELECT, could
cause this deadlock. I myself thought this as well, but it turns out not to be
the case.
There are two places in the statement compilation code where openConglomerate()
is called. All compiled statement nodes ultimately inherit from
StatementNode. In StatementNode.lockTableForCompilation() you will find this
logic:
if (dataDictionary.getCacheMode() == DataDictionary.DDL_MODE) <=== Important
{
...
heapCC = tc.openConglomerate(td.getHeapConglomerateId(),
false,
TransactionController.OPENMODE_FORUPDATE |
TransactionController.OPENMODE_FOR_LOCK_ONLY,
TransactionController.MODE_RECORD,
TransactionController.ISOLATION_SERIALIZABLE); <=== Important
...
}
The method is preceded with this comment:
/* We need to get some kind of table lock (IX here) at the beginning of
* compilation of DMLModStatementNode and DDLStatementNode, to prevent
the
* interference of insert/update/delete/DDL compilation and DDL
execution,
* see beetle 3976, 4343, and
$WS/language/SolutionsToConcurrencyIssues.txt
*/
However, ultimately, according to the logic the lock is only obtained if the
DataDictionary is in DDL_MODE. This is all fine.
Now to the heart of the issue. The second place where openConglomerate() is
called is in ResultColumnList.generateHolderMethod(). This is used for SELECT
statements for columns that will ultimately appear in the ResultSet (or
sub-SELECTs in nested queries). In ResultColumnList.generateHolderMethod() you
will find this logic:
cc = getLanguageConnectionContext().getTransactionCompile().openConglomerate(
conglomerateId,
false,
0,
TransactionController.MODE_RECORD,
TransactionController.ISOLATION_READ_COMMITTED); <=== Important
Notice no conditional logic. In other words, a lock on the conglomerate is
unconditionally obtained. It is this lock that contends with a table lock that
may have been obtained elsewhere (possibly DDL, possibly an explicit lock).
The attached patch changes this code to the following:
LanguageConnectionContext lcc = getLanguageConnectionContext();
DataDictionary dd = lcc.getDataDictionary();
int isolationLevel = (dd.getCacheMode() == DataDictionary.DDL_MODE) ?
TransactionController.ISOLATION_READ_COMMITTED :
TransactionController.ISOLATION_NOLOCK;
cc = lcc.getTransactionCompile().openConglomerate(
conglomerateId,
false,
0,
TransactionController.MODE_RECORD,
isolationLevel);
Basically, it too checks the DataDictionary to see if it is in DDL_MODE. If it
is, it maintains the same locking level as the existing code. However, if the
DataDictionary is not being modified the code will not obtain a lock.
What is the impact? The original code has the effect of blocking DDL from
execution by locking the table(s) during this compile phase. And in turn, if
DDL has locked the table(s), the compilation will be blocked. The first I take
to be an unintended side-effect (as you'll see).
Let's take the second case first. If DDL has occurred (and locked tables), the
DataDictionary will be in DDL_MODE, and the new code behaves the same as the
old code; the conglomerate is opened with READ_COMMITTED isolation and the
compilation will be blocked just as before.
Now the first case. If compilation of a statement (that involves
ResultColumnList) has started and no DDL is in-progress at that instant, no
locks are obtained (because the cache mode is not DDL_MODE). This is the same
as other statement types [that extend from StatementNode]. If DDL occurs after
compilation has started, it will not be blocked (but it was never blocked in
all other cases). Because it was never blocked in other cases (due to the
conditional logic), I take this to be a side-effect. However...
If DDL does happen to alter the table(s) involved there is already code to
handle it. All roads to ResultColumnList.generateHolderMethod() lead back to
GenericStatement.prepare(). GenericStatement.prepare() has this logic fragment:
try {
return prepMinion(lcc, true, (Object[]) null, (SchemaDescriptor) null,
forMetaData);
} catch (StandardException se) {
// There is a chance that we didn't see the invalidation
// request from a DDL operation in another thread because
// the statement wasn't registered as a dependent until
// after the invalidation had been completed. Assume that's
// what has happened if we see a conglomerate does not exist
// error, and force a retry even if the statement hasn't been
// invalidated.
if (SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST.equals(se.getMessageId())) {
... recompile ...
}
}
So, even though the compile of a SELECT will no longer block DDL from occurring
(remember UPDATE, DELETE, and all other DML never blocked it anyway) there is
code in-place to handle the possibility that openConglomerate() in
StatementNode or ResultColumnList might fail. In that case, the compile is
given one more shot before failing (recompile = true).
It is debatable now that the logic in ResultColumnList is the same as
StatementNode, whether the isolation in ResultColumnList should also be
ISOLATION_SERIALIZABLE like StatementNode given that it is now also conditional
based on DataDictionary.getCacheMode() == DDL_MODE. However, my approach was
to be least impacting in terms of deviation from current known working code.
On a final note, this patch obviously fixed the test case attached to this
issue, however past attempts to fix this bug did the same but ultimately failed
in some of the long running stress tests. It would be helpful if this patch
could be applied and let run through the usual full barrage of tests. That
said, the previous patch attempted to fix the issue at the statement cache
level with all of it's hairy synchronization. This patch is considerably
simpler both conceptually and implementation-wise.
> Statement cache deadlock
> ------------------------
>
> Key: DERBY-4279
> URL: https://issues.apache.org/jira/browse/DERBY-4279
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.0.2.1, 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0,
> 10.5.1.1, 10.8.1.2
> Environment: Windows Vista, OS X 10.5+
> Reporter: Jeff Stuckman
> Labels: derby_triage10_5_2
> Attachments: Derby4279.java, client_stacktrace_activation_closed.txt,
> patch4279.txt, patch4279_2.txt, stacktrace.txt
>
>
> Due to a design flaw in the statement cache, a deadlock can occur if a
> prepared statement becomes out-of-date.
> I will illustrate this with the following example:
> The application is using the embedded Derby driver. The application has two
> threads, and each thread uses its own connection.
> There is a table named MYTABLE with column MYCOLUMN.
> 1. A thread prepares and executes the query SELECT MYCOLUMN FROM MYTABLE. The
> prepared statement is stored in the statement cache (see
> org.apache.derby.impl.sql.GenericStatement for this logic)
> 2. After some time, the prepared statement becomes invalid or out-of-date for
> some reason (see org.apache.derby.impl.sql.GenericPreparedStatement)
> 3. Thread 1 begins a transaction and executes LOCK TABLE MYTABLE IN EXCLUSIVE
> MODE
> 4. Thread 2 begins a transaction and executes SELECT MYCOLUMN FROM MYTABLE.
> The statement is in the statement cache but it is out-of-date. The thread
> begins to recompile the statement. To compile the statement, the thread needs
> a shared lock on MYTABLE. Thread 1 already has an exclusive lock on MYTABLE.
> Thread 2 waits.
> 5. Thread 1 executes SELECT MYCOLUMN FROM MYTABLE. The statement is in the
> statement cache but it is being compiled. Thread 1 waits on the statement's
> monitor.
> 6. We have a deadlock. Derby eventually detects a lock timeout, but the error
> message is not descriptive. The stacks at the time of the deadlock are:
> This deadlock is unique because it can still occur in a properly designed
> database. You are only safe if all of your transactions are very simple and
> cannot be interleaved in a sequence that causes the deadlock, or if your
> particular statements do not require a table lock to compile. (For the sake
> of simplicity, I used LOCK TABLE in my example, but any UPDATE statement
> would fit.)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira