Jonathan Briggs writes:
 > On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
 > > On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
 > > 
 > > > >Cycle may consists of more graph nodes than fits into memory. 
 > > > >
 > > > There are pathname length restrictions already in the kernel that should
 > > > prevent that, yes?
 > > 
 > > The problem is that although a *single* pathname can't be longer than some
 > > length, you can still create a cycle.  Consider for instance a pathname 
 > > restriction
 > > of 1024 chars.  Filenames A, B, and C are all 400 characters long.  A 
 > > points at B,
 > > B points at C - and C points back to A.
 > > 
 > > Also, although the set of inodes *in the cycle* fits in memory, the set of
 > > inodes *in the entire graph* that has to be searched to verify the 
 > > presence of
 > > a cycle may not (in general, you have to be ready to examine *all* the 
 > > inodes
 > > unless you can do some pruning (unallocated, provably un-cycleable, and so
 > > on)).  THis is the sort of thing that you can afford to do in userspace 
 > > during
 > > an fsck, but certainly can't do in the kernel on every syscall that might
 > > create a cycle...
 > 
 > You can avoid cycles by redefining the problem.
 > 
 > Every file or "data object" has one single True Name which is their
 > inode or OID.  Each data object then has one or more "names" as
 > properties.  Names are either single strings with slash separators for
 > directories, or each directory element is a unique object in an object
 > list.  Directories then become queries that return the set of objects
 > holding that directory name.  The query results are of course cached and
 > updated whenever a name property changes.
 > 
 > Now there are no cycles, although a naive Unix "find" program could get
 > stuck in a loop.

Huh? Cycles are still here.

Query D0 returns D1, query D1 returns D2, ... query DN returns D0. The
problem is not in the mechanism used to encode tree/graph structure. The
problem is in the limitations imposed by required semantics:

   (R) every object except some selected root is Reachable. (No leaks.)

   (G) unused objects are sooner or later discarded. (Garbage
   collection.)

Neither requirement is compatible with cycles in the directory
structure:

 - from (R) it follows that object can be discarded only if it empty (as
 a directory). All nodes in a cycle are not empty (because each of them
 contains at least a reference to the next one), and hence none of them
 can be ever removed;

 - if garbage collection is implemented through the reference counting
 (which is the only known way tractable for a file system), then cycles
 are never collected.

Unless you are talking about a two-level naming scheme, where "One True
Names" are visible to the user. In that case reachability problem
evaporates, because manipulations with normal directory structure never
make node unreachable---it is always accessible through its True
Name.

But the garbage collection problem is still there. You are more than
welcome to solve it by implementing generation mark-and-sweep GC on file
system scale. :-)

Nikita.

Reply via email to