On 4/17/2010 2:25 AM, shruti jain wrote:
> Here is what I know about the cache manager and its file server
> interactions.
> The Cache Manager resides on the client side in openAFS environment
> and communicates with AFS file server on behalf of the application
> programs running on the client. When an AFS file is needed by any
> application program running on a client machine, the request is sent to
> the Cache Manager which in turn issues RPC calls to the file server
> storing the requested file.

This is true for any object (file, directory, mount point, symlink, ...)

AFS supports readonly replicas.  The CM is permitted to request copies
of the data from any of the replicas although at present, the CM only
reads from a single replica at a time.

> When the Cache Manager receives the
> requested data from the file Server, it stores it in the cache and also
> delivers it to the application program which had initially requested for
> the data. In order to maintain cache consistency, server issues a
> callback along with the data. A callback is a promise by a File Server
> to a Cache Manager to inform any change in the data delivered by the
> File Server to the Cache Manager. If any other client on the network
> modifies the file then the file server breaks this callback and thus
> gives an indication to the Cache manager that its locally cached copy of
> the file is obsolete and needs to be updated. The callback mechanism
> ensures that the Cache Manager always requests the most up-to-date
> version of a file. In this way, cache manager also performs the
> responsibility of maintaining the cache.

You have the general idea.  Let me provide a few additional details.  In
the original (and currently deployed) implementation of callbacks, a
callback is a promise that the FS will notify the CM of a change for up
to S seconds.  S is typically measured in minutes for read/write data
and in hours for read-only data.  The number of callback promises (or
registrations) that a FS can maintain is finite.  Callback registrations
can therefore be canceled prematurely without there being a change.
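
As a rough illustration of why a finite table leads to premature
cancellation, here is a toy model in Python.  It is not OpenAFS code:
the class name, the method names and the oldest-first eviction policy
are all assumptions made for the sketch.

```python
from collections import OrderedDict


class CallbackTable:
    """Toy model of a file server's finite callback registration table."""

    def __init__(self, capacity):
        self.capacity = capacity
        # (client, fid) -> absolute expiry time in seconds, oldest first
        self.entries = OrderedDict()

    def register(self, client, fid, now, lifetime):
        """Record a promise; return the registrations canceled to make room."""
        canceled = []
        key = (client, fid)
        self.entries.pop(key, None)  # a re-registration replaces the old entry
        while len(self.entries) >= self.capacity:
            old_key, _ = self.entries.popitem(last=False)  # assumed policy
            canceled.append(old_key)
        self.entries[key] = now + lifetime
        return canceled

    def is_valid(self, client, fid, now):
        """True while the promise exists and its S seconds have not elapsed."""
        expiry = self.entries.get((client, fid))
        return expiry is not None and now < expiry
```

Each key returned by register() corresponds to a client that would be
sent a callback break even though nothing about the object changed.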

The callback notification (or invalidation) is delivered via an
unauthenticated RPC channel.  As a result, the notification cannot be
trusted by the CM and must be treated as meaning "a change might have
occurred, please verify if it matters".

The existing callback notification does not provide any hint as to the
type of change that might have occurred.  Callback notifications are
issued for many reasons including:

 . the data changed
 . the access control list changed
 . other metadata changed
 . the locking state changed
 . the volume in which the data is located is being replicated
   (aka released)
 . the object has been deleted
 . the FS ran out of room in the registration table

Once a notification is issued, the registration is broken and the
CM will receive no further notifications until it requests updated
status for the object in question.

The CM determines what has changed by issuing a FetchStatus RPC to
the FS and comparing the prior and current status fields.
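
The comparison step can be sketched as follows.  This is a minimal
Python illustration, not the CM's code: the Status record and its four
field names are assumed for the example and are only a subset of what a
real status reply carries.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Status:
    """Illustrative subset of the status fields returned for an object."""
    data_version: int
    length: int
    owner: int
    mode: int


def changed_fields(prior, current):
    """Return the names of the status fields that differ between two replies."""
    return [
        f.name
        for f in fields(Status)
        if getattr(prior, f.name) != getattr(current, f.name)
    ]
```

An empty result after a callback break would mean the notification was
issued for a reason that does not affect the cached object.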

Matt Benjamin has developed and implemented (but it's not shipping yet)
an extended version of callback notifications that provides the CM with
additional details regarding the change.  When combined with an
authenticated callback channel this becomes a very powerful combination.

It is also important to discuss how the FS and CM track object data.
Each time a change to the data (not the metadata) occurs, a data version
(DV) number for the object is incremented.  When the CM issues a
StoreData RPC, the reply carries updated status info.  If the DV was
incremented by one, then the CM knows that there was no race with
another CM and all of the data in the cache for that file is still
current.  If the DV increment was greater than one, then the CM knows
that the data it just wrote is current, but all other data is suspect.
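
The DV rule above can be written down in a few lines.  This Python
sketch is only an illustration; the function name and the returned
labels are made up for the example.

```python
def after_store(dv_before, dv_after):
    """Classify the cache state after a StoreData reply, using the DV delta."""
    delta = dv_after - dv_before
    if delta == 1:
        # Only our own write happened: every cached chunk is still current.
        return "cache_current"
    if delta > 1:
        # Another writer raced with us: trust only the range just written.
        return "only_written_range_current"
    raise ValueError("DV must increase after a successful StoreData")
```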

When using the Extended Callback mechanism, the FS can issue a
notification that a StoreData occurred affecting {FileID, offset,
length} and the current DV is N without canceling the callback
registration.  This permits the CM to maintain the cache coherency at a
lower cost of network traffic when an object is actively being used.
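
One way a CM might act on such a notification is sketched below in
Python.  This is an assumption about client behavior, not a description
of the Extended Callback implementation: the function name, the
fixed-size chunk model and the rule "drop everything if the DV skipped"
are all hypothetical.

```python
def apply_store_notification(cached, cached_dv, offset, length, new_dv,
                             chunk_size):
    """Return (chunk indices still trusted, DV to record) after a notification.

    cached is the set of chunk indices currently held for the file.
    """
    if new_dv != cached_dv + 1:
        # At least one change was missed, so nothing cached can be trusted.
        return set(), new_dv
    if length <= 0:
        return set(cached), new_dv
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    stale = set(range(first, last + 1))  # chunks overlapping the stored range
    return set(cached) - stale, new_dv
```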

However, when a CM starts or when an object has been idle for more than
a few minutes, there will be no callback registration.  In that
situation, a change could have occurred to the file data and the CM will
be forced to discard all of the cached data if a change did occur.
Unfortunately, there is no mechanism at present for the CM to ask the FS
"I need the chunk of data represented by {FileID, offset, length} but I
currently have data in that range with the following hash value.  Could
you confirm that my data is current or send me the correct data?"

I have been considering a proposal to implement such an RPC,
RXAFS_FetchDataWithHash(FID, offset, length, hash).  With such an RPC in
place, the CM can verify the contents of the cache and avoid large
amounts of unnecessary traffic.
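
To make the idea concrete, here is a sketch of the server side of such
a call in Python.  The RPC does not exist; the function name, the
in-memory "file", the reply format and the choice of SHA-256 are all
assumptions for illustration.

```python
import hashlib


def fetch_data_with_hash(server_file, offset, length, client_hash):
    """Confirm the client's cached range, or return the correct bytes.

    server_file is a bytes object standing in for the file on the FS.
    """
    current = server_file[offset:offset + length]
    if hashlib.sha256(current).hexdigest() == client_hash:
        # The client's copy matches: no data needs to cross the network.
        return ("current", b"")
    return ("data", current)
```

When the hashes match, only the request and a short confirmation are
exchanged instead of the full range.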

I am raising this idea here because I believe it is very applicable to
your project.  The trust model in AFS is between the CM and the FS.
There is no trust between CMs.  As a result, if a CM obtains data from
another CM, it needs a low cost mechanism to validate it against the FS.

> So in this project, we need to modify the cache manager to enable
> interactions with other clients as well.
> In the first part of the project, where the cache manager contacts a
> fixed set of remote clients, it retrieves the file from any of these
> clients if their callback of the file is not broken. Since the callback
> is not broken, it is an indication that the file present on this remote
> client is most recent. In case no client has most recent copy of the
> file, we can contact the file server to retrieve the data.

That is one approach but not the one I would take.  If the cost of
reading the data from a local CM is so much cheaper than reading it from
the FS, the CM can read the data from the other CM (or at least get its
hash) and then verify it with the file server.

In most file operations, the entire file is not re-written.  Just
portions of it are and in the case of "append only files" such as log
files, the data never changes after it is written.  Re-fetching this
data from the FS every time the DV changes is extremely wasteful.  It is
much better to obtain it by the cheapest mechanism possible and then
verify it via a trusted means.
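
The "fetch cheaply, verify via the FS" read path might look like the
following Python sketch.  Everything here is hypothetical: the three
callables stand in for a peer fetch, a hash check answered by the FS,
and an ordinary fetch from the FS, and SHA-256 is an arbitrary choice.

```python
import hashlib


def read_via_peer(peer_fetch, fs_verify, fs_fetch, fid, offset, length):
    """Try an untrusted peer first; accept its data only if the FS agrees."""
    data = peer_fetch(fid, offset, length)  # may return None on a miss
    if data is not None:
        digest = hashlib.sha256(data).hexdigest()
        if fs_verify(fid, offset, length, digest):
            return data, "peer"
    # Peer miss or failed verification: fall back to the trusted source.
    return fs_fetch(fid, offset, length), "fs"
```

The peer is never trusted on its own; a wrong or stale answer costs one
extra round trip and is then replaced by data from the FS.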

> In the second part of the project, we can allow discovery of peer
> clients for collaboration. This can be done by modifying the file server
> to keep access logs of the clients and if a client requests for any data
> then its corresponding clients in the logs would be returned to the
> requesting clients. In order to maintain cache consistency, the
> requesting client also establishes a callback guarantee from the file
> server so that it knows of the modifications in the file irrespective of
> where it has got the file from.

I would leave the FS out of the peer collaboration and instead permit
CMs that wish to offer data to do so via Bonjour.

> 
> I have seen the files afs_callback.c, cbqueue.c, dcache.c and server.c
> and think that these are some of the programs used in cache manager and
> server-cache manager interactions. Please correct me if I am wrong.

As for how I would like to see this project structured: before any
collaboration is implemented I would like to see a generic mechanism
added to the CM to permit use of a second level cache.  Then once that
mechanism is in place, a plug-in to that framework can be implemented
that supports obtaining data from a second level cache which happens
to be peer CMs.
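
A minimal sketch of that layering, in Python for brevity.  None of
these names exist in OpenAFS: SecondLevelCache, InMemoryBackend and the
CacheManager class below are hypothetical, and the validate callable
stands in for whatever check against the FS is eventually chosen.

```python
from abc import ABC, abstractmethod


class SecondLevelCache(ABC):
    """Hypothetical plug-in interface consulted after a first-level miss."""

    @abstractmethod
    def lookup(self, key):
        """Return the bytes stored for key, or None on a miss."""


class InMemoryBackend(SecondLevelCache):
    """Stand-in backend; a peer-CM backend would implement the same method."""

    def __init__(self, contents):
        self.contents = dict(contents)

    def lookup(self, key):
        return self.contents.get(key)


class CacheManager:
    def __init__(self, fetch_from_fs, validate, backends=()):
        self.local = {}                     # first-level cache
        self.fetch_from_fs = fetch_from_fs  # trusted source
        self.validate = validate            # check of untrusted data
        self.backends = list(backends)      # second-level plug-ins

    def read(self, key):
        """Return (data, source), trying local, then plug-ins, then the FS."""
        if key in self.local:
            return self.local[key], "local"
        for backend in self.backends:
            data = backend.lookup(key)
            if data is not None and self.validate(key, data):
                self.local[key] = data
                return data, "second-level"
        data = self.fetch_from_fs(key)
        self.local[key] = data
        return data, "fs"
```

The CacheManager only depends on the lookup() interface, so a peer to
peer backend could be swapped in later without touching the read path.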

The benefit of this approach is that the framework for the second level
cache can be implemented and incorporated into a future openafs release
without committing us to a particular implementation of the peer to peer
protocols.  Future research in peer to peer cache sharing can then take
place at a much lower cost.

Jeffrey Altman


