Mike Matrigali wrote:
> I have seen this fail 2 or 3 times this week with various deadlocks; my
> assumption is that this is a test problem and that the test needs to
> be changed to handle deadlocks.
>
> I haven't seen requests for extra info from new cases, so my assumption
> is that no new value is being gained by others running into this known
> JIRA issue.
>
> Is it time to move this out of the suite until a fix is submitted?

I am sorry that it has taken me some time to report back on this. I have been on the road a lot lately (for various reasons), and I am trying to catch up now. To investigate, I ran the test with some tracing enabled to see what caused the lock timeouts. I have not quite got to the bottom of it, but so far it does not look like a deadlock. Instead, the timeouts seem to be caused by long queues on the dictionary lock (see below for more info).

Since creating 100 tables in parallel is not a common scenario, I am not sure it is worth the effort to try to fix this so that the test runs cleanly. I was about to suggest that we simply remove the test from derbyall. The test was written to verify the fix for Derby-230, and I do not think that problem is very likely to recur. Unless someone objects, this is what I will do.

A more detailed description of what I have found:
When a thread tries to create a table, it first gets a shared lock on the dictionary (DataDictionaryImpl.startReading). This lock is released before the thread tries to lock the dictionary exclusively. DataDictionaryImpl.startWriting works by first checking whether anyone is holding a lock on the dictionary. If so, it sleeps for a while and then tries again. This continues until it gets impatient, actually requests an exclusive lock, and enters the lock queue. In the meantime, many more threads have acquired shared locks, and the updating thread has to wait for all of them to release theirs. This is what makes the thread time out. I have not tested whether it would help to stop readers from acquiring the lock while a writer is waiting, and I do not know what general consequences that might have. However, since this does not seem to cause problems under normal load, I doubt that it is worth the effort to do anything about it.
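To make the queueing pattern concrete, here is a small discrete-time model in Java. It is a hypothetical sketch, not Derby code. The class and method names, and the parameters (how long each reader holds its shared lock, how often readers arrive, how long the writer polls before it queues), are assumptions made for illustration. The sleep-and-retry phase of startWriting is collapsed into a single polling period. The model compares the current behaviour, where new readers are admitted even while a writer waits, with the untested alternative mentioned above, where new readers are refused once a writer is queued.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical discrete-time model of the dictionary lock; NOT Derby code.
// At most one reader arrives per tick. The single writer polls for pollTicks
// ticks and then joins the lock queue; it is granted the exclusive lock at the
// first tick on which no shared holders remain.
class DictionaryLockModel {

    /**
     * Returns the tick at which the writer would be granted the exclusive
     * lock, or -1 if it is still waiting after maxTicks (a stand-in for the
     * lock timeout).
     *
     * @param readerHoldTicks  ticks each reader keeps its shared lock
     * @param readerEvery      a new reader arrives every this many ticks (must be > 0)
     * @param pollTicks        ticks the writer spends sleeping/retrying before it queues
     * @param writerPreference if true, new readers are refused once the writer is queued
     * @param maxTicks         simulation horizon
     */
    static int writerGrantTick(int readerHoldTicks, int readerEvery, int pollTicks,
                               boolean writerPreference, int maxTicks) {
        // Release times of the shared locks currently held.
        List<Integer> sharedHolders = new ArrayList<>();
        for (int t = 0; t < maxTicks; t++) {
            final int now = t;
            // Readers whose hold time has expired release their shared lock.
            sharedHolders.removeIf(releaseAt -> releaseAt <= now);

            boolean writerQueued = t >= pollTicks;

            // A new reader arrives. With reader preference it is always granted.
            // With writer preference it is refused once the writer is queued
            // (in reality it would wait behind the writer, which does not change
            // when the writer itself is granted).
            if (t % readerEvery == 0 && !(writerPreference && writerQueued)) {
                sharedHolders.add(t + readerHoldTicks);
            }

            // The writer gets the exclusive lock as soon as no shared holders remain.
            if (sharedHolders.isEmpty()) {
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Heavy load: readers overlap (each holds 3 ticks, one arrives every tick).
        System.out.println("reader preference, heavy load: "
                + writerGrantTick(3, 1, 5, false, 100)); // prints -1 (writer never gets in)
        System.out.println("writer preference, heavy load: "
                + writerGrantTick(3, 1, 5, true, 100));  // prints 7
        // Light load: readers do not overlap (each holds 1 tick, one arrives every 3 ticks).
        System.out.println("reader preference, light load: "
                + writerGrantTick(1, 3, 5, false, 100)); // prints 1
    }
}
```

In this model, overlapping readers without writer preference mean the shared lock is never free, so the writer can only time out. With writer preference, the writer waits only for the readers that already hold the lock. Under light load, where readers do not overlap, both policies grant the writer almost immediately. That is consistent with the problem not showing up in normal use.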

--
Øystein
