What Derrick Said.  

You have to leave a packet capture running continuously on the off chance
that this happens, because you need not just the first error packet but
also the last couple of RPCs just before it.  So you really want:
1.  a network monitor that implements stop triggers, so the capture halts
when the error shows up and the packets leading up to it are kept.  These
used to be rather expensive, but maybe Ethereal finally implemented them?  I
don't know.
2.  a true broadcast network or the ability to tap your switch so you don't
have to run monitor software directly on the fileserver.  Running software
on the client is not likely to be useful unless you can reliably predict
which system will be affected.  
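To make the stop-trigger idea concrete, here is a minimal Python sketch of the logic, not how Ethereal or any commercial monitor actually does it. It assumes a hypothetical `Packet` record and assumes something upstream can flag the error packet (`is_error`). The monitor keeps a rolling window of recent packets. When the trigger fires, it freezes that window plus the error packet, grabs a few more packets, and then stops.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical, simplified packet record; a real capture would hold raw frames.
    timestamp: float
    src: str
    dst: str
    is_error: bool  # assumed to be set by some decoder that recognizes the error


class TriggeredCapture:
    """Rolling capture that freezes when a trigger packet is seen."""

    def __init__(self, pre_trigger: int, post_trigger: int):
        # Bounded deque: older packets fall off as new ones arrive.
        self.pre = deque(maxlen=pre_trigger)
        self.post_trigger = post_trigger
        self.saved = None    # becomes a list once the trigger fires
        self.remaining = 0   # packets still to record after the trigger

    @property
    def stopped(self) -> bool:
        return self.saved is not None and self.remaining == 0

    def feed(self, pkt: Packet) -> None:
        if self.stopped:
            return  # capture has halted; ignore further traffic
        if self.saved is None:
            if pkt.is_error:
                # Trigger: keep the window leading up to the error, plus the error.
                self.saved = list(self.pre) + [pkt]
                self.remaining = self.post_trigger
            else:
                self.pre.append(pkt)
        else:
            # After the trigger, record a few more packets, then stop.
            self.saved.append(pkt)
            self.remaining -= 1
```

The point of the bounded deque is that the capture can run indefinitely with fixed memory. Only the last `pre_trigger` packets are ever held until the error actually occurs.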

Wait a sec.  At this point, you're thinking you know which system will be
affected: it's this one at 192.168.18.34, right?  But what I'm saying is this:
after you reboot that machine, and it comes back up and runs normally for a
while, which client will be the next to hit this bug?  Is it always the same
one?  Even after reboots?  If so, that is new, useful, and surprising
information.

My experience was that the affected client would vary and not be
particularly reproducible, which means that you have to monitor a whole lot
of connections simultaneously, hence a tap on the switch.
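Because you can't predict which client will be hit, the monitor on the switch tap has to keep recent history for every client at once. A minimal Python sketch of that bookkeeping follows, again assuming a hypothetical `Packet` record with an `is_error` flag. It keeps one bounded window per client address and snapshots a client's window whenever that client sees an error.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical packet record as seen on a switch tap.
    timestamp: float
    client: str     # client address the packet belongs to
    is_error: bool  # assumed to be flagged by some upstream decoder


class PerClientHistory:
    """Rolling window of recent packets for every client seen on the tap."""

    def __init__(self, depth: int):
        # One bounded deque per client; old packets drop off automatically.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        self.incidents = []  # (client, packets) snapshots taken on errors

    def feed(self, pkt: Packet) -> None:
        window = self.history[pkt.client]
        window.append(pkt)
        if pkt.is_error:
            # Snapshot this client's recent traffic, including the error packet.
            self.incidents.append((pkt.client, list(window)))
```

Memory grows with the number of clients times `depth`, which is the cost of not knowing in advance which client to watch.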

Make sense?


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Derrick J Brashear
Sent: Sunday, August 21, 2005 1:42 AM
To: [email protected]
Subject: Re: [OpenAFS-devel] "Lost contact with file server" problems


it needs to include the first error packet, e.g. the window where it loses 
contact, to be useful

once it's down, that's not interesting

Derrick


_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
