Hi Flávio :) Quoting Flávio Cruz (2015-08-26 01:46:16) > Next, I started the bootstrapping process to compile SBCL but it would always > fail with a kernel panic since Mach was unable to allocate more slabs for the > ipc_port data structure. I figured out that the kernel had thousands of > inactive ports (which had a single reference and one send-only right).
A very interesting find :) > and the number of inactive ports in the kernel will increase (never > reclaimed), > even when the CLISP process is killed. I've also attempted to use a separate > partition and the ports would still hang around after the ext2fs translator > was > terminated. Indeed we have suspected that there is a bug like this for some time now. > After looking around and using KDB, I've figured out that the following loop > in > kern/exceptions.c/exception_raise_continue_slow was the culprit: Could you elaborate a little on your method? > while (mr == MACH_RCV_INTERRUPTED) { > /* > * Somebody is trying to force this thread > * to a clean point. We must cooperate > * and then resume the receive. > */ > > while (thread_should_halt(self)) { > /* don't terminate while holding a reference */ > if (self->ast & AST_TERMINATE) > ipc_port_release(reply_port); > thread_halt_self(); > } > > ip_lock(reply_port); > .... > } > > The reply port of the target thread (from CLISP?) is not being released since > it enters the while(thread_should_halt(self)) loop and the thread terminates > inside thread_halt_self but the previous condition does not hold, which fails > to release the port. I also think that once the code enters thread_halt_self, > it never comes back since the stack is discarded (for both if cases inside the > function). While I agree that the function thread_halt_self does not return, it only terminates the thread if the AST_TERMINATE flag is set. Otherwise, it returns to userspace. > I've changed the code to make sure the port is released when > thread_should_halt > is true. So far, the kernel works great and I was finally able to complete the > second bootstrapping stage for SBCL. Please see the attached patch and let me > know what you think. Fwiw, I prefer inlined patches so I can reply easier ;) > diff --git a/kern/exception.c b/kern/exception.c > index 6cb3bfb..c68c5cb 100644 > --- a/kern/exception.c > +++ b/kern/exception.c > @@ -875,12 +875,10 @@ exception_raise_continue_slow( > * and then resume the receive. > */ > > - while (thread_should_halt(self)) { > - /* don't terminate while holding a reference */ > - if (self->ast & AST_TERMINATE) Unless I'm missing something, you're basically removing this conditional. For me, it is not obvious that this is correct. B/c now we've given up our reference, return to userspace, and userspace is likely re-trying the same again. So when it does, isn't the reference missing now? > - ipc_port_release(reply_port); > + if (thread_should_halt(self)) > + ipc_port_release(reply_port); > + while (thread_should_halt(self)) > thread_halt_self(); > - } I find the use of `while' dodgy in the first place b/c if thread_halt_self doesn't return, an if would do the trick as well and don't imply the chance of that code looping... Cheers, Justus