On Thu, Feb 11, 2016 at 9:36 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Feb 11, 2016 at 9:29 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <rh...@postgresql.org> writes:
>>> Add some isolation tests for deadlock detection and resolution.
>>
>> Buildfarm says this needs work ...
>>
>> dromedary is one of mine, do you need me to look into what is
>> happening?
>
> That would be great.  Taking a look at what happened, I have a feeling
> this may be a race condition of some kind in the isolation tester.  It
> seems to have failed to recognize that a1 started waiting, and that
> caused the "deadlock detected" message to be reported differently.  I'm
> not immediately sure what to do about that.

Yeah, so: try_complete_step() waits 10ms, and if it still hasn't
gotten any data back from the server, then it uses a separate query to
see whether the step in question is waiting on a lock.  So what
must've happened here is that it took more than 10ms for the process
to show up as waiting in pg_stat_activity.
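
As a rough illustration of that sequence, here is a toy model in Python. It is
not the isolation tester's actual C code: the function name, the result
strings, and the idea of passing event times as arguments are all made up for
the example. The only things taken from the description above are the 10ms
wait and the follow-up check for a lock wait.

```python
from typing import Optional


def check_step(reply_at_ms: Optional[float],
               wait_visible_at_ms: Optional[float],
               timeout_ms: float = 10.0) -> str:
    """Toy model of one non-blocking completion check (hypothetical helper).

    reply_at_ms        -- when the server sends data back, or None if it never does
    wait_visible_at_ms -- when the backend first shows up as waiting on a lock,
                          or None if it never waits
    timeout_ms         -- how long the tester waits before asking about lock waits
    """
    # Step 1: wait up to timeout_ms for any data from the server.
    if reply_at_ms is not None and reply_at_ms <= timeout_ms:
        return "completed"

    # Step 2: no data yet, so ask (via a separate query) whether the
    # step is currently shown as waiting on a lock.
    if wait_visible_at_ms is not None and wait_visible_at_ms <= timeout_ms:
        return "waiting"

    # Step 3: neither a reply nor a visible lock wait at the time of the
    # check. This is the window in which the race described above can occur.
    return "undetermined"
```

In this model, a step whose lock wait becomes visible after 15ms is not
recognized as waiting with the default timeout, but is recognized if the
timeout is lengthened to, say, 20ms:

```python
check_step(None, 15.0)                   # "undetermined"
check_step(None, 15.0, timeout_ms=20.0)  # "waiting"
```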

It might be possible to fix this by not passing STEP_NONBLOCK if
there's only one connection that isn't waiting.  I think I had it like
that at one point, and then took it out because it caused some other
problem.  Another option is to lengthen the timeout.  It doesn't seem
great to be dependent on a fixed timeout, but the server doesn't send
any protocol traffic to indicate a lock wait.  If we declared which
steps are supposed to wait, then there'd be no ambiguity, but that
seems like a drag.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
