On 15/03/2026 17:59, Samuel Thibault wrote:
Hello,
Michael Kelly, on Sun, 15 Mar 2026 17:38:36 +0000, wrote:
SIGSTOP and SIGCONT are a bit special
Yes, they are: they don't involve signal handlers and context
save/restore.
glibc/hurd/hurdsig.c's suspend() however uses abort_all_rpcs, possibly
the RPCs used by ar/tar are not properly restarted.
I've had a look at this problem and have a partial understanding of why
interrupting an RPC with SIGSTOP can lead to file system inconsistency.
Client side:
1) A call to write() in thread1 results in an IO_write RPC to the
ext2fs server, in the example described earlier in the thread. The
generated code for the RPC calls _hurd_intr_rpc_mach_msg, which makes
the mach_msg_trap() system call and awaits a reply from the server.
2) The client process receives SIGSTOP in the signal thread, which
suspends all other threads in the task and then calls abort_all_rpcs().
Thread1 is the only thread with an outstanding RPC, and
_hurdsig_abort_rpcs() finds that the RPC was at the MACH_RCV_INTERRUPTED
stage, successfully sends the interrupt_operation and determines the
interrupted reply port. Consequently that thread's SYSRETURN is
modified to EINTR.
3) When the thread is later resumed (after SIGCONT) INTR_MSG_TRAP
returns EINTR which results in a retry of the IO_write RPC.
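The client-side retry in step 3 can be modelled roughly as below. This
is only a toy sketch, not the glibc code: toy_msg_trap(), toy_intr_rpc()
and struct toy_trap are hypothetical names, and the "trap" merely counts
sends and reports EINTR a configurable number of times.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the message trap. */
struct toy_trap
{
  int sends;       /* how many times the request has been sent */
  int interrupts;  /* how many further calls should report EINTR */
};

/* Sends the request once. Reports EINTR while interrupts remain,
   modelling a return value rewritten by the signal thread. */
int
toy_msg_trap (struct toy_trap *t)
{
  t->sends++;
  if (t->interrupts > 0)
    {
      t->interrupts--;
      return EINTR;
    }
  return 0;
}

/* Re-sends the same request until it is not interrupted. */
int
toy_intr_rpc (struct toy_trap *t)
{
  int err;

  do
    err = toy_msg_trap (t);
  while (err == EINTR);

  assert (t->sends >= 1);
  return err;
}
```

With one pending interrupt the request goes out twice. The client has
no way of telling from EINTR alone whether the first copy was acted on.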
Server side (ext2fs):
1) libdiskfs/io-write.c (diskfs_S_io_write) handles the IO_write
request. In this example case, all calls use the -1 offset which appends
the data to the file and increments the filepointer accordingly.
2) On receipt of the interrupt_operation message, the server calls
hurd_thread_cancel() with the id of the thread handling the IO_write
request. That thread is suspended, no RPCs are active in that thread,
and the thread is then resumed. Nothing changes except that the
thread's 'hurd_sigstate->cancel' flag is set. Once diskfs_S_io_write
is entered it always runs to completion and the filepointer is always
incremented.
I don't understand how this can work. I expected a call to write() that
returns EINTR to be guaranteed to have written 0 bytes but that is not
the case here. The second attempt at making the RPC call (after
INTR_MSG_TRAP has returned EINTR) does not start the write at the same
file position as the first one because the filepointer has been
incremented regardless.
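The effect described above can be reproduced with a small toy model. It
assumes an in-memory "file" and a handler that, once entered, always
completes. struct toy_file, toy_server_write() and toy_client_write()
are hypothetical names and none of this is Hurd code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_CAPACITY 64

/* Hypothetical in-memory file with a server-side file pointer. */
struct toy_file
{
  char data[TOY_CAPACITY];
  size_t filepointer;
};

/* Models an append (offset -1) write: it always runs to completion
   and advances the file pointer. */
void
toy_server_write (struct toy_file *f, const char *buf, size_t len)
{
  assert (f->filepointer + len < TOY_CAPACITY);
  memcpy (f->data + f->filepointer, buf, len);
  f->filepointer += len;
  f->data[f->filepointer] = '\0';
}

/* Models the client: the first `interrupted` attempts reach the
   server, but the client observes EINTR and sends the request again. */
int
toy_client_write (struct toy_file *f, const char *buf, size_t len,
                  int interrupted)
{
  int err;

  do
    {
      toy_server_write (f, buf, len);       /* server completes anyway */
      err = interrupted-- > 0 ? EINTR : 0;  /* what the client sees */
    }
  while (err == EINTR);

  return err;
}
```

Starting from a zero-initialised struct toy_file, writing "abc" with one
interrupted attempt leaves "abcabc" in the file and the file pointer at
6, although the caller asked for a single 3-byte append.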
I don't understand what hurd_thread_cancel() is intended to achieve for
a server thread in a case like this. diskfs_S_io_write() must always
complete once entered. The only way it could stop early would be for it
to check the thread cancellation state at suitable times?
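As a rough sketch of what "checking the cancellation state" could look
like, here is a toy handler that tests and clears a cancel flag before
it modifies anything. All names (struct toy_thread, toy_check_cancel(),
toy_server_write_checked()) are hypothetical, and this is not how
libdiskfs is written.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_CAPACITY 64

/* Hypothetical in-memory file with a server-side file pointer. */
struct toy_file
{
  char data[TOY_CAPACITY];
  size_t filepointer;
};

/* Hypothetical per-thread state; `cancel` models the flag that is set
   when the thread is cancelled. */
struct toy_thread
{
  int cancel;
};

/* Test and clear the cancel flag. */
int
toy_check_cancel (struct toy_thread *t)
{
  int was_cancelled = t->cancel;

  t->cancel = 0;
  return was_cancelled;
}

/* Append only if no cancellation is pending. On EINTR nothing has
   been modified, so a client retry would not duplicate data. */
int
toy_server_write_checked (struct toy_thread *t, struct toy_file *f,
                          const char *buf, size_t len)
{
  if (toy_check_cancel (t))
    return EINTR;

  assert (f->filepointer + len < TOY_CAPACITY);
  memcpy (f->data + f->filepointer, buf, len);
  f->filepointer += len;
  f->data[f->filepointer] = '\0';
  return 0;
}
```

Even in this toy a race remains: a cancellation that arrives after the
check but before the reply is sent still leaves the data written, so a
check like this narrows the window rather than closing it.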
I don't have enough knowledge of this to draw conclusions without
further input. My guess would be that the assignment of EINTR on the
client side in this instance is wrong.
Cheers,
Mike.