Jonathan Briggs writes: > On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote: > > On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said: > > > > > >Cycle may consists of more graph nodes than fits into memory. > > > > > > > There are pathname length restrictions already in the kernel that should > > > prevent that, yes? > > > > The problem is that although a *single* pathname can't be longer than some > > length, you can still create a cycle. Consider for instance a pathname > > restriction > > of 1024 chars. Filenames A, B, and C are all 400 characters long. A > > points at B, > > B points at C - and C points back to A. > > > > Also, although the set of inodes *in the cycle* fits in memory, the set of > > inodes *in the entire graph* that has to be searched to verify the > > presence of > > a cycle may not (in general, you have to be ready to examine *all* the > > inodes > > unless you can do some pruning (unallocated, provably un-cycleable, and so > > on)). THis is the sort of thing that you can afford to do in userspace > > during > > an fsck, but certainly can't do in the kernel on every syscall that might > > create a cycle... > > You can avoid cycles by redefining the problem. > > Every file or "data object" has one single True Name which is their > inode or OID. Each data object then has one or more "names" as > properties. Names are either single strings with slash separators for > directories, or each directory element is a unique object in an object > list. Directories then become queries that return the set of objects > holding that directory name. The query results are of course cached and > updated whenever a name property changes. > > Now there are no cycles, although a naive Unix "find" program could get > stuck in a loop.
Huh? Cycles are still here. Query D0 returns D1, query D1 returns D2, ... query DN returns D0. The problem is not in the mechanism used to encode tree/graph structure. The problem is in the limitations imposed by required semantics: (R) every object except some selected root is Reachable. (No leaks.) (G) unused objects are sooner or later discarded. (Garbage collection.) Neither requirement is compatible with cycles in the directory structure: - from (R) it follows that object can be discarded only if it empty (as a directory). All nodes in a cycle are not empty (because each of them contains at least a reference to the next one), and hence none of them can be ever removed; - if garbage collection is implemented through the reference counting (which is the only known way tractable for a file system), then cycles are never collected. Unless you are talking about a two-level naming scheme, where "One True Names" are visible to the user. In that case reachability problem evaporates, because manipulations with normal directory structure never make node unreachable---it is always accessible through its True Name. But the garbage collection problem is still there. You are more than welcome to solve it by implementing generation mark-and-sweep GC on file system scale. :-) Nikita.