On 15/03/2026 17:59, Samuel Thibault wrote:
Hello,

Michael Kelly, on Sun. 15 March 2026 17:38:36 +0000, wrote:
SIGSTOP and SIGCONT are a bit special
Yes, they are: they don't involve signal handlers and context
save/restore.

glibc/hurd/hurdsig.c's suspend() however uses abort_all_rpcs, possibly
the RPCs used by ar/tar are not properly restarted.

I've had a look at this problem and have a partial understanding of why interrupting an RPC with SIGSTOP can lead to file system inconsistency.

Client side:

1) A call to write() in thread1 results in an IO_write RPC to the ext2fs server (in the example described earlier in the thread). The generated RPC code calls _hurd_intr_rpc_mach_msg, which makes the mach_msg_trap() system call and awaits a reply from the server.

2) The client process receives SIGSTOP in the signal thread, which suspends all other threads in the task and then calls abort_all_rpcs(). Thread1 is the only thread with an outstanding RPC, and _hurdsig_abort_rpcs() finds that the RPC was at the MACH_RCV_INTERRUPTED stage, successfully sends the interrupt_operation and determines the interrupted reply port. Consequently that thread's SYSRETURN is modified to EINTR.

3) When the thread is later resumed (after SIGCONT), INTR_MSG_TRAP returns EINTR, which results in a retry of the IO_write RPC.

Server side (ext2fs):

1) libdiskfs/io-write.c (diskfs_S_io_write) handles the IO_write request. In this example, all calls use the -1 offset, which appends the data to the file and increments the filepointer accordingly.

2) Receipt of the interrupt_operation message results in a call to hurd_thread_cancel() with the id of the thread handling the IO_write request. That thread is suspended, no RPCs are found to be active in it, and it is then resumed. Nothing changes except that the thread's 'hurd_sigstate->cancel' flag is set. Once diskfs_S_io_write is entered it always runs to completion and the filepointer is always incremented.

I don't understand how this can work. I expected a call to write() that returns EINTR to be guaranteed to have written 0 bytes but that is not the case here. The second attempt at making the RPC call (after INTR_MSG_TRAP has returned EINTR) does not start the write at the same file position as the first one because the filepointer has been incremented regardless.
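To make the sequence concrete, here is a toy model of it in plain C. It is only a sketch: none of the toy_* names exist in glibc or the Hurd, and the "server" is just a function call. It assumes the behaviour described above, namely that the server-side append always completes while the client sees EINTR for the first attempt and retries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not glibc or Hurd code. */
struct toy_file {
    char   data[64];
    size_t filepointer;          /* next append position */
};

/* Models a server-side write with offset -1: it always runs to
   completion, appends the data and advances the filepointer. */
static size_t toy_server_append(struct toy_file *f, const char *buf, size_t len)
{
    assert(f->filepointer + len < sizeof f->data);
    memcpy(f->data + f->filepointer, buf, len);
    f->filepointer += len;
    f->data[f->filepointer] = '\0';
    return len;
}

/* Models one RPC attempt. When `interrupted` is nonzero the server
   still completes the append, but the client is told EINTR. */
static int toy_rpc_write(struct toy_file *f, const char *buf, size_t len,
                         int interrupted, size_t *written)
{
    size_t n = toy_server_append(f, buf, len);
    if (interrupted) {
        *written = 0;
        return EINTR;
    }
    *written = n;
    return 0;
}

/* Models the client retrying the RPC for as long as it sees EINTR.
   Only the first attempt is interrupted in this model. */
static size_t toy_client_write(struct toy_file *f, const char *buf, size_t len,
                               int interrupt_first)
{
    size_t written = 0;
    int interrupted = interrupt_first;

    while (toy_rpc_write(f, buf, len, interrupted, &written) == EINTR)
        interrupted = 0;
    return written;
}
```

Calling toy_client_write(&f, "abc", 3, 1) on an empty toy_file reports 3 bytes written to the caller, yet the file ends up holding the data twice, because the second attempt starts from the already advanced filepointer.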

I don't understand what hurd_thread_cancel() is intended to achieve for a server thread in a case like this. diskfs_S_io_write() must always complete once entered. Would the only way for it not to complete be for it to check the thread cancellation state at suitable times?
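To illustrate what I mean by checking the cancellation state, here is a rough sketch in plain C. It is not libdiskfs code: the sketch_* names are hypothetical, and the `cancel` member merely stands in for the per-thread cancel flag mentioned above. The handler tests the flag before it changes any state, so an EINTR result really does mean that nothing was written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not libdiskfs code. */
struct sketch_file {
    char   data[64];
    size_t filepointer;          /* next append position */
};

struct sketch_thread {
    int cancel;                  /* stands in for the per-thread cancel flag */
};

/* Models a write handler that honours cancellation: the flag is
   tested (and cleared) before any state is modified. */
static int sketch_io_write(struct sketch_thread *t, struct sketch_file *f,
                           const char *buf, size_t len, size_t *amount)
{
    *amount = 0;

    if (t->cancel) {
        t->cancel = 0;
        return EINTR;            /* nothing written, filepointer untouched */
    }

    if (len >= sizeof f->data - f->filepointer)
        return ENOSPC;

    memcpy(f->data + f->filepointer, buf, len);
    f->filepointer += len;
    f->data[f->filepointer] = '\0';
    *amount = len;
    return 0;
}
```

In this model a retry after EINTR appends the data exactly once. It does not cover a cancellation that arrives after the check has passed, in which case the write would still complete.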

I don't have enough knowledge of this to draw conclusions without further input. My guess would be that the assignment of EINTR on the client side in this instance is wrong.

Cheers,

Mike.

