On Mon Nov 15 23:23:12 EST 2010, lu...@proxima.alt.za wrote:
> Regarding the "deadlock" report that I occasionally see on my CPU
> server console, I won't bore anyone with PC addresses or anything like
> that, but I will recommend something I believe to be a possible
> trigger: the failure always seems to occur within "exportfs", which in
> this case is used exclusively to run stats(1) remotely from my
> workstation.  So the recommendation is that somebody like Erik, who is
> infinitely more clued up than I am in the kernel arcana should run one
> or more stats sessions into a cpu server (I happen to be running
> fossil, so maybe Erik won't see this) and see if he can also trigger this 
> behaviour.  I'm hoping that it is not platform specific.
> 
> Right now, I'm short of skills as well as a serial console :-(

i run stats all the time.  i've never seen a lock loop caused by stats.

exportfs gets blamed all the time for the sins of others.  possible
culprits are the tcp/ip stack and the kernel devices that stats accesses
and of course, the channel code itself.

it would be a good idea for you to track down all the pcs involved
and send them along.  i can't think of another way of narrowing down
the list of potential suspects.  not all of our usual suspects have an
alibi.

i assume you've fixed this?  (not yet fixed on sources.)

/n/sources/plan9//sys/src/9/port/chan.c:1012,1018 - chan.c:1012,1020
                                /*
                                 * mh->mount->to == c, so start at mh->mount->next
                                 */
+                               f = nil;
                                rlock(&mh->lock);
+                               if(mh->mount)
                                for(f = mh->mount->next; f; f = f->next)
                                        if((wq = ewalk(f->to, nil, names+nhave, ntry)) != nil)
                                                break;
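
for anyone reading the diff without the kernel source to hand, here is a
minimal user-space sketch of the same guard.  the Mount and Mhead structs,
the id field, and findafterhead() are hypothetical stand-ins and not the
kernel's definitions; the rlock/runlock calls are left out and an integer
comparison stands in for ewalk().  the point is only that f is given a
defined value before the loop and that mh->mount is checked before
mh->mount->next is dereferenced, presumably because the mount list can be
empty by the time the lock is taken.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical stand-ins for the kernel's Mount and Mhead */
typedef struct Mount Mount;
struct Mount {
	int	id;	/* stands in for f->to; an assumption for this sketch */
	Mount	*next;
};

typedef struct Mhead Mhead;
struct Mhead {
	Mount	*mount;	/* head of the list; may be NULL */
};

/*
 * search the entries after the head for one whose id matches.
 * returns NULL if the list is empty or nothing matches.
 */
Mount*
findafterhead(Mhead *mh, int want)
{
	Mount *f;

	assert(mh != NULL);
	f = NULL;		/* defined result even if the loop never runs */
	if(mh->mount)		/* without this, mh->mount->next faults on an empty list */
		for(f = mh->mount->next; f; f = f->next)
			if(f->id == want)
				break;
	return f;
}
```

with an empty head (mount == NULL) the function returns NULL instead of
faulting, and the head entry itself is never matched because the walk
starts at its successor.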

- erik
