On Sun, Sep 21, 2025 at 09:01:49PM +0200, Léo Larnack wrote:
> Hi misc@,
>
> I'm currently working on an OCaml library[1] that does some multicore
> stuff. It spawns a bunch of worker threads, and the main thread is
> responsible for scheduling tasks onto them. The tasks usually yield
> to the main thread just before doing I/O, and the main thread uses
> select(2) (for now) to know which pending tasks to reschedule.
> IIUC, the main thread is also responsible for handling signals.
>
> I have a problem when running a dummy HTTP server that uses this
> library[2]: when sending it a SIGINT during the main loop, it should
> die immediately. Instead, it still waits for one more request and
> dies after answering it. I tried setting a breakpoint inside the
> main loop, where select(2) is blocking:
>
> $ egdb hurl.srv
> # [...]
> (gdb) run 127.0.0.1:8080
> Starting program: /home/user/.opam/miou/bin/hurl.srv 127.0.0.1:8080
> ^C[New thread 460857 of process 5725]
> [New thread 431086 of process 5725]
> [New thread 184835 of process 5725]
>
> Thread 2 received signal SIGINT, Interrupt.
> [Switching to thread 460857 of process 5725]
> futex () at /tmp/-:2
> warning: 2 /tmp/-: No such file or directory
>
> (gdb) info threads
>   Id   Target Id                       Frame
>   1    thread 338321 of process 5725   _thread_sys_select () at /tmp/-:2
> * 2    thread 460857 of process 5725   futex () at /tmp/-:2
>   3    thread 431086 of process 5725   futex () at /tmp/-:2
>   4    thread 184835 of process 5725   futex () at /tmp/-:2
>
> (gdb) thread apply 1 bt
>
> Thread 1 (thread 338321 of process 5725):
> #0 _thread_sys_select () at /tmp/-:2
> #1 0xc4987c91079e9fd8 in ?? ()
> #2 0x000004def4d99ce2 in _libc_select_cancel (nfds=8,
> readfds=0x78483c5bd4a0, writefds=0x78483c5bd520,
> exceptfds=0x4def4d8fb2b <_thread_sys_select+27>, timeout=0x0) at
> /usr/src/lib/libc/sys/w_select.c:28
> #3 0x000004dc1c6cc29d in caml_unix_select (readfds=5356133755072,
> writefds=1, exceptfds=1, timeout=<optimized out>) at select_unix.c:91
> #4 <signal handler called>
> #5 0x000004dc1c5874ab in camlMiou_unix.select_1434 ()
> #6 0x000004dc1c5931de in camlMiou.unblock_awaits_with_system_events_2039 ()
> #7 0x000004dc1c593583 in camlMiou.run_2061 ()
> #8 0x000004dc1c598ec5 in camlMiou.run_inner_5845 ()
> #9 0x000004dc1c5ed488 in camlCmdliner_term.fun_662 ()
> #10 0x000004dc1c5f19f9 in camlCmdliner_eval.run_parser_589 ()
> #11 0x000004dc1c5f2bed in camlCmdliner_eval.eval_value_inner_1728 ()
> #12 0x000004dc1c5f33d0 in camlCmdliner_eval.eval_1479 ()
> #13 0x000004dc1c43f99e in camlDune__exe__Srv.entry ()
> #14 0x000004dc1c437487 in caml_startup.code_begin ()
> #15 <signal handler called>
> #16 0x000004dc1c7121cc in caml_startup_common (argv=0x78483c5bd7c8,
> pooling=<optimized out>) at runtime/startup_nat.c:127
> #17 0x000004dc1c71227d in caml_startup_exn (argv=0x8) at
> runtime/startup_nat.c:134
> #18 caml_startup (argv=0x8) at runtime/startup_nat.c:139
> #19 caml_main (argv=0x8) at runtime/startup_nat.c:146
> #20 0x000004dc1c6f4f90 in main (argc=<optimized out>,
> argv=0x78483c5bd4a0) at runtime/main.c:37
>
> At startup, hurl.srv makes a bunch of non-blocking calls to select(2)
> (timeout set to {0}) which, after a bit of gdb stepping, do not
> seem to switch the currently executing thread. It's on the last
> select(2) call (which is blocking, timeout set to NULL, since it's
> entering the main loop) that stepping past _thread_sys_select causes
> thread switching (in the gdb trace above: Id 2, a worker thread).
> pthreads(3) mentions that "Signal handlers are normally run on the
> stack of the currently executing thread", so my understanding is
> that my SIGINT isn't handled by the right thread (which should be
> the main one).
>
> I'm getting the same kind of behavior when using versions of this
> library that use poll(2), ppoll(2) or kqueue(2) instead of select(2):
> the main thread switches with a worker thread when blocking on the
> syscall. On Linux, the same gdb session (breaking inside the main
> loop) shows that the current thread is still the main thread (waiting
> on select(2)), and it behaves as I would expect (dies when receiving
> SIGINT).
>
> I'm having a hard time understanding what's happening. I've looked
> at src/lib/libc/sys/w_{poll,select}.c, maybe this has something to
> do with {ENTER,LEAVE}_CANCEL_POINT / DEF_CANCEL? Any insight or
> pointer would be greatly appreciated.
>
> Cheers,
>
> Léo
>
> [1] https://github.com/robur-coop/miou
> [2] https://github.com/robur-coop/hurl
>
Signals like the SIGINT from hitting ^C are delivered to the process, and
then a suitable thread is selected for the delivery. The selection takes a
thread that is interruptible and accepts the signal (i.e. does not have it
blocked).
A thread that is sleeping in futex is as acceptable as one that sleeps in
select() in that regard.
If you want the signal to be accepted only by the main thread, use
sigprocmask(2) to block it in all your worker threads.
--
:wq Claudio