Dear all,

I am a computer science student at IIIT Hyderabad, India. I am interested in
contributing to OpenAFS and have applied to GSoC 2010 with OpenAFS. I think
this would be a good starting point for me to work with the community. I
participated in GSoC 2009 with the Globus Alliance as my mentoring
organization. I am also working on a research project at my university to
improve read access and execution performance for DFS.

I am interested in the Collaborative Caching project listed on the ideas page.
The project proposal I have submitted is as follows:

The project aims to develop a system that uses collaborative caching
techniques to improve read access performance in OpenAFS. It is based on two
observations.

First, in a cluster environment, a large number of clients need the same
datasets to work on, i.e. the data that one client node operates on is the
same as for many other nodes on the network. Currently, each client contacts
the server individually to fetch the data. This increases the load on the
server unnecessarily. If the file is very large, the problem is magnified
further.

The second observation is that local network bandwidth is usually high, often
in the Gbps range. In a cluster, many clients are in the same location and
thus have fast interconnects between them, whereas the server might be
connected through a slow network link. In this situation, accessing data from
another client would be much faster than accessing it from the server itself.

Instead of each client contacting the server individually, a collaborative
caching technique can be employed. Once a client has fetched some data from
the server, subsequent requests for that data can be forwarded to this
client. This reduces the load on the server and also improves bandwidth usage
at the server side. It also leads to faster data access if the link between
the requesting client and the server is slower than its links to other
clients.
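
The idea can be illustrated with a small toy model in Python. This is only a
sketch under simplifying assumptions: the Server and Client classes, their
methods, and the in-memory "cache" are hypothetical names made up for
illustration and are not part of OpenAFS. It only shows that a second client
can be served from a peer's cache without a second fetch from the server.

```python
class Server:
    """Toy file server that counts how often it is asked for data."""

    def __init__(self, files):
        self.files = dict(files)
        self.fetch_count = 0

    def fetch(self, filename):
        self.fetch_count += 1
        return self.files[filename]


class Client:
    """Toy client with a local cache and a list of peers (hypothetical)."""

    def __init__(self, name, server, peers=()):
        self.name = name
        self.server = server
        self.peers = list(peers)
        self.cache = {}

    def read(self, filename):
        # 1. Local cache hit: no network traffic at all.
        if filename in self.cache:
            return self.cache[filename]
        # 2. Ask peers on the (assumed fast) local network first.
        for peer in self.peers:
            data = peer.cache.get(filename)
            if data is not None:
                self.cache[filename] = data
                return data
        # 3. Fall back to the (assumed slower) link to the server.
        data = self.server.fetch(filename)
        self.cache[filename] = data
        return data


server = Server({"dataset": b"shared input"})
first = Client("node1", server)
second = Client("node2", server, peers=[first])
first.read("dataset")       # fetched from the server
second.read("dataset")      # served from node1's cache
print(server.fetch_count)   # prints 1
```

The model ignores cache consistency and authorisation entirely; those are
discussed next.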

Initially, we can start with a fixed list of peers at the client. The client
would collaborate only with the clients on this list.

Next, we would add functionality to discover peers. This can be done using
the fileserver. The fileserver can be modified to keep access logs of the
clients, and when a client requests some data, the clients recorded in these
logs for that data can be returned to the requesting client. Access control
also needs to be addressed here: how would a fileserver authorise a client to
fetch data from another client?

Finally, in OpenAFS the server notifies the client through a callback if the
file it is using has been modified. We have to consider the case where a
client is fetching data from another client and that other client receives a
callback in the midst of the transfer. To handle this, we could make the call
that the client uses to get the hash from the fileserver also establish a
callback guarantee, so that all of the clients are notified by the
fileserver regardless of where they got their data from.
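
A rough Python sketch of the discovery and callback idea follows. All names
here (FileServer, lookup, store, break_callback, and so on) are hypothetical
and are not the OpenAFS RPC interface. A plain version counter stands in for
the hash mentioned above, and authorisation is left out. The point is only
that the same lookup call returns candidate peers and registers the caller
for callbacks, so every client is notified no matter where its data came
from.

```python
from collections import defaultdict


class FileServer:
    """Toy fileserver with an access log and a callback registry."""

    def __init__(self):
        self.versions = defaultdict(int)      # filename -> version counter
        self.access_log = defaultdict(list)   # filename -> clients seen
        self.callbacks = defaultdict(set)     # filename -> clients to notify

    def lookup(self, client, filename):
        """Return (version, candidate peers) and register a callback."""
        peers = [c for c in self.access_log[filename] if c is not client]
        if client not in self.access_log[filename]:
            # Simplification: the client is logged as soon as it asks.
            self.access_log[filename].append(client)
        self.callbacks[filename].add(client)
        return self.versions[filename], peers

    def store(self, filename):
        """A write: bump the version and notify every registered client."""
        self.versions[filename] += 1
        holders = self.callbacks.pop(filename, set())
        self.access_log.pop(filename, None)
        for client in holders:
            client.break_callback(filename)


class Client:
    def __init__(self, name):
        self.name = name
        self.valid = set()   # files whose cached copy is still valid

    def open(self, fileserver, filename):
        version, peers = fileserver.lookup(self, filename)
        # The data itself could now come from a peer or from the server.
        self.valid.add(filename)
        return peers

    def break_callback(self, filename):
        self.valid.discard(filename)
```

For example, if clients a and b both open the same file, b is told about a as
a candidate peer, and a later store() on that file invalidates the cached
copy on both of them.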

I have received a reply from Mr. Jeffrey Altman asking me to contact the
community to refine the proposal. He has suggested that it would be useful
to discuss the internal workings of the AFS cache manager and the CM-FS
interactions so that I can refine my proposal. Also, please suggest a
project that I can carry out over the next few days to demonstrate my
abilities and get selected for OpenAFS.

Link to project proposal on GSoC portal:
http://socghop.appspot.com/gsoc/student_proposal/private/google/gsoc2010/shrutijain/t127083915309

Thank you.

Best Regards,

Shruti
