Re: qtcreator compilation failure due to memory/disk corruption

Michael Kelly Wed, 25 Mar 2026 10:35:24 -0700

On 23/03/2026 01:01, Samuel Thibault wrote:

Hello,


Michael Kelly, on Tue 17 Mar 2026 21:13:53 +0000, wrote:

I don't have enough knowledge of this to make conclusions without further
input. My guess would be that the assignment of EINTR on the client side in
this instance is wrong.

Indeed. As I would understand it, the interrupt call should make the
server carefully stop its operation, and have the opportunity to return
either EINTR or a short read/write. Then the client should be able to
receive that and return it.

Claude's suggestion of not calling abort_all_rpcs() in suspend() is just
papering over the real issue, which would definitely happen with signal
handling, anyway, so better really fix the issue than avoid it.
(and no, this issue cannot explain the corrupted haskell symbol tables,
since it's not about a repeated piece of data, the binary would be
completely bogus otherwise)

Thanks, Samuel, for the confirmation and thanks to Claude and Brent for validating my findings.

I think that once the RPC has made it to the server, the overall result of the RPC should be determined by the server and not by the client, as is currently the case when a signal is about to be handled.

The strategy on the server side seems right to me already. The server operation must be terminated swiftly (either by completion or by aborting early) to minimise the delay before the client can handle the signal. The most likely cause of delay is the server waiting for a response to an RPC or system call of its own. Part of the signal handling preparation is to send an interrupt_operation RPC to the server, whose default implementation is to call hurd_thread_cancel(), which aborts all server RPCs in progress. Provided that the server code handles RPC errors appropriately, it has the opportunity to correct system state (if necessary) before returning an appropriate RPC reply to the client. There doesn't seem to be a method for interrupting normal user code within the server, but provided that the operation is relatively fast it can simply complete and return its reply to the client before signal handling proceeds. It is therefore necessary for the signal handling code not only to wait for the server reply but also to make that reply available to the client once the signal handling is completed. Although the code currently does wait for the server reply, it does not preserve that reply for the client.
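To make the server-side behaviour concrete, here is a minimal sketch in plain C. It is a toy model, not Hurd code: `server_write`, `struct write_result` and the `cancel_after` parameter are hypothetical names invented for illustration. It only shows the rule discussed above: a cancelled operation stops early and reports either EINTR (nothing done yet) or a short write (some data already committed).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical reply of a toy server-side write handler.  */
struct write_result
{
  int err;         /* 0 on success, EINTR if cancelled before any progress */
  size_t written;  /* number of bytes actually committed */
};

/* Write LEN bytes in pieces of CHUNK bytes.  CANCEL_AFTER models the
   point at which a cancellation (as from an interrupt_operation
   request) is noticed: after that many chunks, or never if negative.  */
static struct write_result
server_write (size_t len, size_t chunk, long cancel_after)
{
  struct write_result r = { 0, 0 };
  long chunks_done = 0;

  if (chunk == 0)
    {
      r.err = EINVAL;   /* avoid looping forever on a zero chunk size */
      return r;
    }

  while (r.written < len)
    {
      if (cancel_after >= 0 && chunks_done >= cancel_after)
        {
          /* Cancelled: report EINTR only if nothing was written,
             otherwise report the short count as a success.  */
          if (r.written == 0)
            r.err = EINTR;
          break;
        }

      size_t n = len - r.written < chunk ? len - r.written : chunk;
      r.written += n;
      chunks_done++;
    }

  return r;
}
```

In this model, `server_write (10, 4, 2)` is cancelled after two chunks and reports a short write of 8 bytes with no error, while `server_write (10, 4, 0)` is cancelled before any progress and reports EINTR. Either way the result is decided by the server, which is the property the client side needs to preserve.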

I have prototyped an alteration to glibc/hurd/hurdsig:abort_all_rpcs(). After the 'interrupt operation' has been sent to the server, the code awaits a reply to the RPC that is being interrupted. Currently the code receives the reply with an undersized message header, presumably just to confirm that the operation is complete. The actual reply is then discarded. I instead supplied the mach_msg_header_t that was supplied to the original RPC call in _hurd_intr_rpc_mach_msg(), with its associated rcv_size. These can be obtained from the thread state in registers rdi and r10. The actual return code from the server can be stored in SYSRETURN. In effect, changing:


        mach_msg_header_t head;

        err = __mach_msg (&head, MACH_RCV_MSG|MACH_RCV_TIMEOUT, 0, sizeof head,
                          reply_ports[nthreads],
                          _hurd_interrupted_rpc_timeout, MACH_PORT_NULL);

to:

        mach_msg_header_t* head = (mach_msg_header_t*)state->basic.rdi;
        mach_msg_size_t rcv_size = (mach_msg_size_t)state->basic.r10;

        err = __mach_msg (head, MACH_RCV_MSG|MACH_RCV_TIMEOUT, 0, rcv_size,
                          reply_ports[nthreads],
                          _hurd_interrupted_rpc_timeout, MACH_PORT_NULL);

        state->basic.SYSRETURN = err;
        state_changed = 1;
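
The difference between the two variants can be sketched with a toy model in plain C. This is not glibc or Mach code: `toy_port`, `toy_msg`, `toy_receive` and both `client_result_*` functions are hypothetical, and the model simply assumes that a receive into a buffer that is too small consumes the queued message and loses its content. It only illustrates why receiving into the client's own buffer lets the server's result survive.

```c
#include <errno.h>
#include <stddef.h>

enum { TOY_OK = 0, TOY_TOO_LARGE = 1, TOY_TIMED_OUT = 2 };

/* Hypothetical reply message: its size and the result the server chose.  */
struct toy_msg { size_t size; int server_result; };

/* Hypothetical reply port holding at most one queued message.  */
struct toy_port { int has_msg; struct toy_msg msg; };

/* Receive from PORT into BUF, which can hold RCV_SIZE bytes.
   Assumption of this model: an oversized message is dequeued and lost.  */
static int
toy_receive (struct toy_port *port, struct toy_msg *buf, size_t rcv_size)
{
  if (!port->has_msg)
    return TOY_TIMED_OUT;
  port->has_msg = 0;               /* dequeued in either case */
  if (port->msg.size > rcv_size)
    return TOY_TOO_LARGE;          /* content is gone */
  *buf = port->msg;
  return TOY_OK;
}

/* Current behaviour in the model: wait with an undersized buffer,
   drop the reply, and let the client see EINTR.  */
static int
client_result_discarding (struct toy_port *port)
{
  struct toy_msg scratch;
  (void) toy_receive (port, &scratch, 0);
  return EINTR;
}

/* Prototype behaviour in the model: receive into the client's own
   buffer so the server's result reaches the client.  */
static int
client_result_preserving (struct toy_port *port, struct toy_msg *client_buf,
                          size_t client_rcv_size)
{
  int err = toy_receive (port, client_buf, client_rcv_size);
  return err == TOY_OK ? client_buf->server_result : EINTR;
}
```

With a queued reply of size 32 and a server result of 0, `client_result_discarding` yields EINTR even though the server completed the operation, whereas `client_result_preserving` with a 64-byte client buffer yields 0. If no reply arrives, the preserving variant still falls back to EINTR in this model; what the real code should do in that case is one of the open questions.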

I was able to run the test case (calls to write() with simultaneous SIGSTOP/SIGCONT) successfully with this change and some minor rearrangement of the code. This is only a partial solution, as there are several places where EINTR is potentially returned to the client inappropriately. Returning the actual server reply to the suspended thread was the part I was least sure would work, so I'm now more confident about providing a complete implementation if this is considered the right way to go. Please advise. I'll probably need some guidance on the appropriate behaviour under other failure conditions, for example if the interrupt operation cannot be delivered. Those can be considered later if the overall approach is valid.


Cheers,

Mike.
