[EMAIL PROTECTED] wrote on Fri, 23 Feb 2007 14:38 -0600:
> The current implementation (a) tracks what is free and (b) what is
> recently used, (c) lets the server choose the handle to return, and (d)
> keeps a global handle space (a handle is unique across all servers).

No disputes with these properties and most of what they buy us.

> walt's idea seems to allow us to map a collection of objects (a
> "segment") to a given server, then a client could pick values in that
> segment. my feeling is that this hamstrings our ability to move objects
> around, because we would then need to move entire segments around, as at
> the very least it could take a very long time to reach a consistent
> state again (think of many large objects needing to be moved; how do
> clients know where to contact?). this idea is a generalization of pete's
> idea to have a server id be part of the object handle; pete's approach
> makes it impossible to migrate without changing file metadata. more on
> this below.

Actually we may do something like this for the OSD implementation.
Consider that the bottom 64 bits in the new 128-bit handle ID are
server-assigned, or random, whatever. The top 64 bits are chosen by the
config manager; call it the segment if you like.

At file system create time, servers A, B, C are assigned ranges 1.1-9,
2.1-9, 3.1-9, respectively. Each has its own 64-bit space underneath a
particular server ID (where "9" is the biggest number that fits in 64
bits). Now add a new server D and rebalance by moving objects off the
existing servers and onto D, trying to keep the handle ID space
contiguous. The new handle map might be:

    1.1-6  A
    1.7-9  D
    2.1-6  B
    2.7-9  D
    3.1-6  C
    3.7-9  D
    4.1-9  A
    5.1-9  B
    6.1-9  C
    7.1-9  D

To create a new datafile in A, the client specifies handle range 4.1-9,
i.e., the entire 64-bit space with upper bits 4. The old "1" space is
closed to new additions.

You only move the part of the space you want to for performance reasons.
Similarly, to consolidate you break up an existing space on the server
that is going away and assign contiguous parts to the remaining servers.

One issue is that collisions may happen during consolidation in the lower
64-bit space, which the OSD will not like, at which point you need to do
a metafile remapping.

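For concreteness, here is a rough sketch of the handle -> server lookup against the example map above, in Python. It is illustrative only and not PVFS code: the tuple layout and the names `HANDLE_MAP`, `lookup_server`, and `open_ranges` are hypothetical, "9" stands in for the largest 64-bit value as in the example, and treating segments 2 and 3 as closed in the same way as segment 1 is an assumption.

```python
from bisect import bisect_right

# Toy handle map from the example: (segment, first_low, last_low, server).
# "9" stands in for the largest 64-bit value. Entries must stay sorted.
HANDLE_MAP = [
    (1, 1, 6, "A"), (1, 7, 9, "D"),
    (2, 1, 6, "B"), (2, 7, 9, "D"),
    (3, 1, 6, "C"), (3, 7, 9, "D"),
    (4, 1, 9, "A"), (5, 1, 9, "B"),
    (6, 1, 9, "C"), (7, 1, 9, "D"),
]

# Start key of every range, used for binary search.
_STARTS = [(seg, lo) for seg, lo, _hi, _srv in HANDLE_MAP]


def lookup_server(segment, low):
    """Return the server owning handle segment.low, or None if unmapped."""
    i = bisect_right(_STARTS, (segment, low)) - 1
    if i < 0:
        return None
    seg, lo, hi, server = HANDLE_MAP[i]
    if seg == segment and lo <= low <= hi:
        return server
    return None


def open_ranges(server, closed_segments=frozenset({1, 2, 3})):
    """Ranges in which a client may ask `server` to create new objects.

    Assumption: the pre-rebalance segments are closed to new additions.
    """
    return [(seg, lo, hi) for seg, lo, hi, srv in HANDLE_MAP
            if srv == server and seg not in closed_segments]


if __name__ == "__main__":
    print(lookup_server(1, 7))   # handle 1.7 was moved during rebalance
    print(open_ranges("A"))      # where new datafiles on A would go
```

Keeping the ranges sorted lets a client binary-search the map instead of scanning it linearly; every split during rebalancing or consolidation still adds entries to the list.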
Not sure if this collision issue kills the entire algorithm from a PVFS
point of view. The size of the handle map grows at about the same rate
as our current fixed-range map, as servers are added and removed.

> i don't understand why it is difficult to get a value in a particular
> range in the OSD work. can you clarify this pete? can't you just "guess"
> a value in the range until you get one?

Certainly one can guess, and retry until it works out. We'll likely not
be able to do the recently-used list on OSDs, but could do lazy
algorithms to hide old handles on the OSD until a cleaner comes through
and wipes them based on timestamp.

> one thing that we could discuss is the relative merit of migration using
> this sort of approach. maybe in fact this idea that i have that we want
> to keep a FS-wide object handle space is flawed, that changing file
> metadata can be addressed in a reasonable way that simplifies the
> overall system, allows for migration, and doesn't have a negative impact
> on our caching of metadata.

This may in fact be something we will be forced to give up as things get
too large. The size of the handle map is fairly small when you start
with N ranges, one for each server in your new FS. If someone actually
adds/removes servers with some regularity, and moves existing handles to
balance load, either in the size or the utilization sense, that map will
fragment until it becomes too painful for each client to search to do
the handle -> server lookup. The only way out is to rewrite metafile
entries. I'm not worried in the short term though.

> overall i think that changing how we reference objects, with the
> exception of perhaps redoing how we keep up with free/recently-freed
> objects, is something that should perhaps wait until we have
> server-to-server working. we're likely to want to make some changes at
> that point anyway, once the system has more control over the
> construction of files and directories.
> maybe we can discuss how we'd like things to work in that context and
> concentrate on getting there, rather than torquing things now and then
> perhaps messing with things again?

Of course, I want to make sure everything works without server-to-server
communication, as that 1) is not possible with OSDs, and 2) limits
server scalability if they always have to talk to each other. Don't
even start with server-collective algorithms. :) Nor will I be
motivated to make big changes to the handle map without some motivating
problem to fix.

		-- Pete
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers