The combination of problems that you have experienced should
be solved in 1.4.1.  One of the issues that you were seeing is
that the client was contacting the server on a port other than
7001 and the server was attempting to break callbacks on port
7001.  Since the NAT doesn't have a port mapping from 7001 to
the client, the callbacks could not be broken.  Every time the
client would contact the server, the server would believe that
it had callbacks for the client that must be broken and would
block the incoming RPC until the callbacks could be broken.
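
To make the port mismatch concrete, here is a minimal sketch of a NAT's inbound mapping table.  It is a toy model, not OpenAFS code: the ToyNat class, its method names, and the starting external port 40000 are all made up for illustration.

```python
# Toy model of a NAT's UDP port mapping table; not OpenAFS code.
# ToyNat and the starting port 40000 are hypothetical.
AFS_CALLBACK_PORT = 7001  # port the AFS client listens on for callbacks


class ToyNat:
    def __init__(self, first_external_port=40000):
        self._next_port = first_external_port
        self._inbound = {}  # external port -> (internal host, internal port)

    def outbound(self, internal_host, internal_port):
        """Client sends a packet; the NAT picks an external source port.

        Simplification: every call allocates a fresh external port.
        """
        external_port = self._next_port
        self._next_port += 1
        self._inbound[external_port] = (internal_host, internal_port)
        return external_port

    def inbound(self, external_port):
        """Server sends a packet; return the internal target, or None if dropped."""
        return self._inbound.get(external_port)


nat = ToyNat()
# The client sends from 7001, but the server sees the NAT's external port.
seen_port = nat.outbound("client", AFS_CALLBACK_PORT)

# Callback break sent to port 7001: the NAT has no mapping, so it is dropped.
dropped = nat.inbound(AFS_CALLBACK_PORT)
# Callback break sent to the port the request came from: it is delivered.
delivered = nat.inbound(seen_port)
```

In this model only a callback addressed to the port the server actually saw reaches the client.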

The 1.3.77 client also has a serious bug that would cause it
to generate rapid-fire requests using a new Rx connection for
each RPC.  If you still have 1.3.77 clients deployed, try your
best to upgrade them.

The 1.4.1 file server (to be announced real soon now) goes to
great lengths to track clients by both address and port number
and to deal with clients behind NATs so that each time the NAT
allocates a new port number to the client the relevant host
entry will be updated to track it.  This should provide a very
good NAT experience for end users that have AFS clients that
support UUIDs.  All of the OpenAFS clients for UNIX/Linux support
UUIDs, as do Windows clients 1.3.80 and later.
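
The idea of tracking a client by UUID and refreshing its address and port on every RPC can be sketched roughly as follows.  This is only an illustration under my own assumptions and not the file server's actual host table: HostTable, note_rpc, callback_endpoint, and the sample UUID and address are hypothetical names.

```python
# Toy host table keyed by client UUID; not the OpenAFS data structure.
# All names below are hypothetical and chosen for illustration only.
class HostTable:
    def __init__(self):
        self._by_uuid = {}  # uuid -> (address, port) last seen

    def note_rpc(self, uuid, address, port):
        """Record the source address and port of an incoming RPC.

        Returns True when the entry was created or its endpoint changed.
        """
        endpoint = (address, port)
        changed = self._by_uuid.get(uuid) != endpoint
        self._by_uuid[uuid] = endpoint
        return changed

    def callback_endpoint(self, uuid):
        """Return where a callback break for this client should be sent."""
        return self._by_uuid.get(uuid)


hosts = HostTable()
first = hosts.note_rpc("uuid-a", "192.0.2.10", 40000)   # new entry
same = hosts.note_rpc("uuid-a", "192.0.2.10", 40000)    # nothing changed
moved = hosts.note_rpc("uuid-a", "192.0.2.10", 40001)   # NAT picked a new port
target = hosts.callback_endpoint("uuid-a")
```

Because the entry is keyed by UUID and not by address, a new NAT port updates the existing entry, and callbacks go to the most recently seen port.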

Jeffrey Altman


John W. Sopko Jr. wrote:
> We have 3 OpenAFS 1.4.0 file servers running on Red Hat Enterprise
> Linux 3 with the latest patches. This morning when I
> came in the servers were very slow and not responding to
> client requests; they were basically hung. This in turn
> pretty much takes down all our web servers' file services
> for home dirs etc.
> 
> I tracked this down to a "bad" AFS Windows client; the client
> was running an old 1.3.77 version of the client or may have
> a misconfigured firewall. I halted the "bad" client
> and this fixed our server problems. I turned up
> debugging on the file server (kill -TSTP) and got the messages
> below, which I used to track this down. I searched the afs-info
> archives and this problem was discussed in 2002 and was
> supposed to get fixed. Is this
> fixed in a version newer than 1.4.0? That is, not allowing
> clients to bring down the server with bad callbacks. Thanks
> for your input.
> 
> Tue Apr 18 10:20:19 2006 CB: RCallBackConnectBack failed for
> 152.2.128.182:7001
> Tue Apr 18 10:22:27 2006 [12] CB: Call back connect back failed (in
> break delayed) for 152.2.128.182:7001
> Tue Apr 18 10:22:27 2006 [12] BreakDelayedCallbacks FAILED for host
> 152.2.128.182 which IS UP.  Possible network or routing failure.
> Tue Apr 18 10:22:27 2006 [12] MultiProbe failed to find new address for
> host 152.2.128.182:7001
> Tue Apr 18 10:24:34 2006 [7] CB: WhoAreYou failed for
> 152.2.128.182:7001, error -03
> Tue Apr 18 10:26:42 2006 [7] CB: Call back connect back failed (in break
> delayed) for 152.2.128.182:7001
> Tue Apr 18 10:26:42 2006 [7] BreakDelayedCallbacks FAILED for host
> 152.2.128.182 which IS UP.  Possible network or routing failure.
> 
> Here is the old post about this:
> 
> --------------------------------------------
> From [EMAIL PROTECTED]  Tue Aug 27 12:13:13 2002
> Date: Tue, 27 Aug 2002 18:12:59 +0200
> From: FBO <[EMAIL PROTECTED]>
> To: OpenAFS-info@openafs.org
> 
> Hello,
> 
> We (Solaris 8, Transarc 3.6 2.32 servers, 3.6 2.26 db servers) had an
> issue where a client with a certain firewall (ZoneAlarm and/or
> BlackICE) configuration (allowing AFS traffic out but no AFS traffic in,
> or more precisely, not allowing any _uninitiated_ inbound AFS traffic,
> e.g. a fileserver callback) caused the fileserver (a couple actually) to
> slow to a crawl (reads/writes taking 10 minutes or more to complete) and
> become virtually unusable.  We ended up having to block this firewalled
> client machine to get the fileservers back to normal.  During the
> "outage", FileLog would repeat the following message sequence every minute:
> 
> Wed Jul 10 16:22:55 2002 BreakDelayedCallbacks FAILED for host 894f2528
> which IS UP.  Possible network or routing failure.
> Wed Jul 10 16:22:55 2002 MultiProbe failed to find new address for
> host 894f2528.7001
> Wed Jul 10 16:23:51 2002 CB: Call back connect back failed (in break
> delayed) for 894f2528.7001
> 
> We have not been able to duplicate the problem but we've experienced it
> 2 to 3 times within about 3 months.
> 
> Below is the explanation I got from Transarc. They've informed us that a
> fix is en route.  Has anybody ever experienced this in openafs (or
> anywhere)?
> 
