> On Jul 20 16:16, David Allsopp wrote:
> > I've pushed a repro case for this to
> > https://github.com/dra27/cygwin-nanosleep-bug.git
> >
> > Originally noticed as the main CI system for OCaml has been failing
> > sporadically for the signal.ml test mentioned in that repo. This
> > morning I tried hammering that test on my dev machine and discovered
> > that it fails very frequently. No idea if that's drivers, Windows 10
> > updates, number of cores or what, but it was definitely happening, and
> > easily.
> >
> > Drilling further, it appears that NtQueryTimer is able to return a
> > negative value in the TimeRemaining field even when SignalState is
> > false. The values I've seen have always been < 15ms - i.e. less than
> > the timer resolution, so I wonder if there is a point at which the
> > timer has elapsed but has not been signalled, but WaitForMultipleObjects
> > returns because of the EINTR signal.
> > Mildly surprising that it seems to be so reproducible.
> >
> > Anyway, a patch is attached which simply guards a negative return
> > value. The test on tbi.SignalState is in theory unnecessary.
> 
> Thanks for the patch, I think your patch is fine.  However, I'd like to
> dig a bit into this to see what exactly happens.  Do you have a very
> simple testcase in plain C, by any chance?

https://github.com/dra27/cygwin-nanosleep-bug/blob/main/signal.c is as far as
I've simplified it at this stage (it eliminates OCaml from the equation!). It
might be possible to trigger it without all the pthreads stuff, but once I'd
confirmed it definitely wasn't OCaml, and had put the appropriate
system_printf calls into cygwait to see that NtQueryTimer really was returning
this small negative value, I stopped simplifying.
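
For reference, the guard described in the patch boils down to roughly the
following. This is only a standalone sketch, not the actual cygwait code: the
struct and function names are hypothetical stand-ins for the TimeRemaining and
SignalState fields that NtQueryTimer fills in, and I'm assuming the remaining
time is in the usual 100ns units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the timer state reported by NtQueryTimer. */
struct timer_info {
  int64_t time_remaining; /* assumed 100ns units; observed to go negative */
  bool signal_state;      /* true once the timer has been signalled */
};

/* Remaining sleep time to report after a wait interrupted by a signal.
   A signalled timer, or a negative remaining time, both mean the timeout
   has effectively elapsed, so report zero rather than a negative value. */
static int64_t remaining_after_interrupt(const struct timer_info *ti)
{
  if (ti->signal_state || ti->time_remaining < 0)
    return 0;
  return ti->time_remaining;
}
```

With the values I've been seeing (a small negative remainder with SignalState
false), this returns 0 instead of passing the negative value back to the
caller.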

Does that repro case trigger on your system too?

Best,


D
