On 12/2/19 11:42 AM, Andrew Dunstan wrote:

On 12/2/19 11:23 AM, Tom Lane wrote:
I see from the buildfarm status page that since commits 6b802cfc7
et al went in a week ago, frogmouth and currawong have failed that
new test case every time, with the symptom

================== pgsql.build/src/test/isolation/regression.diffs ===================
*** c:/prog/bf/root/REL_10_STABLE/pgsql.build/src/test/isolation/expected/async-notify.out   Mon Nov 25 00:30:49 2019
--- c:/prog/bf/root/REL_10_STABLE/pgsql.build/src/test/isolation/results/async-notify.out    Mon Dec  2 00:54:26 2019
***************
*** 93,99 ****
   step llisten: LISTEN c1; LISTEN c2;
   step lcommit: COMMIT;
   step l2commit: COMMIT;
- listener2: NOTIFY "c1" with payload "" from notifier
   step l2stop: UNLISTEN *;
starting permutation: llisten lbegin usage bignotify usage
--- 93,98 ----

(Note that these two critters don't run branches v11 and up, which
is why they're only showing this failure in 10 and 9.6.)

drongo showed the same failure once in v10, and fairywren showed
it once in v12.  Every other buildfarm animal seems happy.

I'm a little baffled as to what this might be --- some sort of
timing problem in our Windows signal emulation, perhaps?  But
if so, why haven't we found it years ago?

I don't have any ability to test this myself, so would appreciate
help or ideas.



I can test things, but I don't really know what to test. FYI frogmouth
and currawong run on virtualized XP. drongo and fairywren run on
virtualized WS2019. Neither VM is heavily resourced.

Hi Andrew, if you have time you could perhaps check the
isolation test structure itself.  Like Tom, I don't have a
Windows box to test this.

I would be curious to see if there is a race condition in
src/test/isolation/isolationtester.c between the loop starting
on line 820:

  while ((res = PQgetResult(conn)))
  {
     ...
  }

and the attempt to consume input that might include NOTIFY
messages on line 861:

  PQconsumeInput(conn);

If the first loop consumes the commit message, gets no
further PGresult from PQgetResult, and finishes, and execution
proceeds to PQconsumeInput before the NOTIFY has arrived
over the socket, there won't be anything for PQnotifies to
return, and hence for try_complete_step to print before
returning.

I'm not sure if it is possible for the commit message to
arrive before the notify message in the fashion I am describing,
but that's something you might easily check by having
isolationtester sleep before PQconsumeInput on line 861.


--
Mark Dilger

