The file data would be located based on its GFID, so before the *first*
lookup/stat for a file, there is no way to know its GFID.
NOTE: Instead of a name hash the GFID hash is used, to get immunity
against renames and the like, as a name hash could change the location
information for the file (among other reasons).
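
To make the rename point concrete, here is a rough sketch, not the actual
placement code. NUM_DS, bucket(), ds_by_gfid() and ds_by_name() are
hypothetical names, and SHA-256 is only a stand-in for whatever hash the
real design would use. It shows that a location derived from the GFID
stays put across a rename, while one derived from the name is recomputed
from the new name.

```python
import hashlib
import uuid

NUM_DS = 4  # hypothetical number of data servers


def bucket(key: bytes, n: int = NUM_DS) -> int:
    """Map a key to one of n servers (stand-in hash, not the real one)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n


def ds_by_gfid(gfid: uuid.UUID) -> int:
    # Location depends only on the GFID, which never changes for a file.
    return bucket(gfid.bytes)


def ds_by_name(name: str) -> int:
    # Location depends on the name, so a rename may move the file.
    return bucket(name.encode("utf-8"))


def rename(entry: dict, new_name: str) -> dict:
    # A rename changes the name but keeps the GFID.
    return {"name": new_name, "gfid": entry["gfid"]}


before = {"name": "a.txt", "gfid": uuid.UUID("12345678-1234-5678-1234-567812345678")}
after = rename(before, "b.txt")

# The GFID-based location is identical before and after the rename.
gfid_location_stable = ds_by_gfid(before["gfid"]) == ds_by_gfid(after["gfid"])
```

With a name hash, ds_by_name(before["name"]) and ds_by_name(after["name"])
are computed from different inputs, so nothing guarantees they agree.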

Another way of achieving the same when the GFID of the file is known (from a readdir) is to wind the lookup and read of size to the respective MDS and DS. The lookup is answered as soon as the MDS responds, and the DS response is cached for the subsequent open+read case. So on the wire we would have a fan-out of 2 FOPs, but still satisfy the quick-read requirements.
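
A minimal sketch of that fan-out, assuming the GFID is already known. This
is illustrative Python, not xlator code: QuickReadClient, FakeMDS, FakeDS
and PREFETCH_SIZE are all made-up names, and a thread pool stands in for
winding the two FOPs in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

PREFETCH_SIZE = 64 * 1024  # hypothetical quick-read size limit


class FakeMDS:
    """Stand-in metadata server."""

    def lookup(self, gfid):
        return {"gfid": gfid, "size": 11}


class FakeDS:
    """Stand-in data server that counts how many reads it served."""

    def __init__(self):
        self.reads = 0

    def read(self, gfid, offset, size):
        self.reads += 1
        return b"hello world"[offset:offset + size]


class QuickReadClient:
    def __init__(self, mds, ds):
        self.mds = mds
        self.ds = ds
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.pending = {}  # gfid -> future holding the prefetched data

    def lookup(self, gfid):
        # Fan out: one FOP to the MDS, one to the DS.
        meta = self.pool.submit(self.mds.lookup, gfid)
        self.pending[gfid] = self.pool.submit(self.ds.read, gfid, 0, PREFETCH_SIZE)
        # Answer as soon as the MDS responds; do not wait for the DS.
        return meta.result()

    def open_and_read(self, gfid):
        # Serve from the cached DS response if there is one.
        cached = self.pending.pop(gfid, None)
        if cached is not None:
            return cached.result()
        return self.ds.read(gfid, 0, PREFETCH_SIZE)


ds = FakeDS()
client = QuickReadClient(FakeMDS(), ds)
meta = client.lookup("gfid-1")
data = client.open_and_read("gfid-1")
```

The open+read is served from the response cached at lookup time, so the DS
sees only one read. A real implementation would also need to invalidate or
bound that cache, which this sketch leaves out.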

A tar-like workload doesn't have this problem, because we know the GFID after readdirp.


I would assume the above resolves the problem posted. Are there cases where we do not know the GFID of the file, i.e. no readdir was performed and the client already knows the name of the file it wants to operate on? Do we have traces of the webserver workload to see whether it generates names on the fly or does a readdir first?

The problem is with workloads that know which files need to be read without a readdir, like hyperlinks (webserver), swift objects, etc. These are the two I know of that will have this problem, and they can't be improved because we don't have metadata and data co-located. I have been trying to think of a solution for the past few days. Nothing good is coming up :-/

Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel