On 1/23/2014 11:53 AM, Andrew Deason wrote:
> On Thu, 23 Jan 2014 11:43:50 +0000
> Simon Wilkinson <simonxwilkin...@gmail.com> wrote:
>
>> The real question here is how widely we should be applying the abort
>> threshold - should it apply to all aborts sent to a particular
>> client, or should we be more selective? There are a lot of competing
>> views here, as it depends on what you believe the abort threshold is
>> actually protecting us against.
Agreed. Personally I'm not in favor of abort thresholds, but they have
prevented a large number of file servers from drowning under the load
applied by very broken clients in the wild.

Back in the 1.2.x days there were clients that were broken in some
extreme ways. One example was a client that produced a new connection
for every RPC, triggering a new TMAY and tying up a thread for a while.
Another was a problem with users who obtained tokens in the morning and
left a file browser open on their home directory. When the tokens
expired, the file browser would read status for every file in the
directory tree it knew about, fail, and then repeat in rapid
succession. Take an org with a few hundred such desktops triggering the
same behavior ten hours after the work day starts, and the file servers
would fall over each night.

Lest you say that EEXIST and ENOTEMPTY are different: we have seen
cache corruption that makes the cache manager think a file does not
exist when it does. This has triggered applications to retry creation
of the file over and over in a loop. The primary purpose of the abort
threshold is to protect the file server from an abusive client, whether
or not the client intends to be abusive.

> Well, the issue also goes away if we do either of two other things
> that are advantageous for other reasons:
>
>  - Don't issue RPCs we know will fail (mkdir EEXIST, rmdir ENOTEMPTY,
>    maybe others). Even without the abort threshold, this causes an
>    unnecessary delay waiting for the fileserver to respond. This is
>    really noticeable with git in AFS.

I am firmly in the camp that says the cache manager should avoid
sending any RPC that it can assume, with good justification based upon
known state and file server rules, will fail. EEXIST and ENOTEMPTY
certainly fall into that category. So do operations like file creation,
directory creation, and unlink when it is known that the permissions
are wrong, and writes to a file when it is known that quota has been
exceeded.
The Windows cache manager even takes things a step further by
maintaining a negative cache of EACCES errors keyed on {FID, user}.
This has avoided hitting the abort threshold limits triggered by
Windows, which assumes that if it can list a directory it must be able
to read the status of every object within it.

> - Don't raise aborts for every single kind of error (like
>   InlineBulkStatus). Aborts aren't secure, for one thing.

InlineBulkStatus can throw aborts; it just doesn't do so when an
individual FetchStatus on an included FID fails. The problem with this
approach is that OpenAFS cannot turn off the existing RPC support, so
it will always need abort thresholds to protect the file servers. At
least until it becomes possible for a file server to scale with the
load so that a pool of misbehaving clients cannot deny service to other
clients.

Jeffrey Altman