Re: File as a directory - VFS Changes

2005-06-06 Thread Hans Reiser
Nikita Danilov wrote:

Hans Reiser writes:
  Nikita Danilov wrote:
  
  
  But cycles are solvable in current file systems too: they simply do
  not exist there.

  
  Yes, but Nikita, cycles represent semantic functionality that has value
  because being able to embody more expressions means more power of

If you mean that multiple parents have some value, I agree. Problem is
that solutions proposed so far have severe limitations:

 - they add support for cycle detection that is necessary to support
 multiple parents, but that support is only efficient for small
 datasets: when total number of objects is not very big, and average
 object has only one parent.

 - even when there are no multiple parents, system is not efficient for
 large number of files.
  

Can you say this in more detail?

  expression.  If some way can be found to allow them, then functionality
  is increased. Separating links that increase reference count from links
  that merely point (ala hard vs. sym links) is one approach.  If there
  were cycle detection effective enough for real-world use, that would be
  better.

It seems to me that in the domains where proposed designs are
applicable, symlinks already provide viable solution.
  

I have been thinking that disabling hard links for filedirectories might
be an acceptable solution for reiser4 if cycles are a deeper problem
than I currently appreciate.  We can then allow people to turn off one
of either filedirectories or hard links.  I would prefer solving the
cycles problem though

Nikita.


  




Re: File as a directory - VFS Changes

2005-06-04 Thread Nikita Danilov
Hans Reiser writes:
  Nikita Danilov wrote:
  
  
  But cycles are solvable in current file systems too: they simply do
  not exist there.

  
  Yes, but Nikita, cycles represent semantic functionality that has value
  because being able to embody more expressions means more power of

If you mean that multiple parents have some value, I agree. Problem is
that solutions proposed so far have severe limitations:

 - they add support for cycle detection that is necessary to support
 multiple parents, but that support is only efficient for small
 datasets: when total number of objects is not very big, and average
 object has only one parent.

 - even when there are no multiple parents, system is not efficient for
 large number of files.

  expression.  If some way can be found to allow them, then functionality
  is increased. Separating links that increase reference count from links
  that merely point (ala hard vs. sym links) is one approach.  If there
  were cycle detection effective enough for real-world use, that would be
  better.

It seems to me that in the domains where proposed designs are
applicable, symlinks already provide viable solution.

Nikita.


Re: File as a directory - VFS Changes

2005-06-03 Thread Faraz Ahmed
Hi;
  Why is this discussion revolving around relational
databases? The attributes of the files, and the files themselves, would
really s**k if they were modelled for querying in a relational database.  The
attribute info is neither structured nor unstructured; it is
SEMI-STRUCTURED. Executing Structured Query Language (SQL) over semi-structured
data would result in
- Harder modelling (almost a waste of effort),
- Complex querying (an elegant system is of no use because of the number of joins
that querying would require, if you somehow model semi-structured data in
some structured data model).
The best option to start with would be the best COT. I feel
we should look at Loreal, a Stanford project, for hints about modelling our
whatever.

Regards
Faraz :)


- Original Message - 
From: Nikita Danilov [EMAIL PROTECTED]
To: Jonathan Briggs [EMAIL PROTECTED]
Cc: Hans Reiser [EMAIL PROTECTED]; [EMAIL PROTECTED];
Alexander G. M. Smith [EMAIL PROTECTED]; [EMAIL PROTECTED];
reiserfs-list@namesys.com; [EMAIL PROTECTED]; Nate Diller
[EMAIL PROTECTED]
Sent: Thursday, June 02, 2005 4:54 PM
Subject: Re: File as a directory - VFS Changes


 Jonathan Briggs writes:
   On Thu, 2005-06-02 at 14:38 +0400, Nikita Danilov wrote:
Jonathan Briggs writes:
  On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
  [snip]
   Frankly speaking, I suspect that name-as-attribute is going to limit
   usability of file system significantly.
  
   Usability as in features?  Or usability as in performance?

 Usability as in ease of use.

 [...]

  
   An index is an arrangement of information about the indexed items.  The
   index contents *belong* to the items.  An index by name?  That name
   belongs to the item.  An index by date?  Those dates are properties of

 In the flat world of relational databases, maybe. But almost nowhere else
 improper name is an attribute of its signified: variable is not an
 attribute of object it points to, URL is not an attribute of the web
 page, block number is not an attribute of data stored in that block on
 the disk, etc.

 [...]

  
   In the same way that you can descend a directory tree and copy the names
   found into each item, you can check each item and copy the names found
   into a directory tree.

 Except that as was already discussed resulting directory tree is _bound_
 to be inconsistent with real names.

  
   
Indices cannot be reduced to real names (as rename is impossible to
implement efficiently), but real names can very well be reduced to
indices as exemplified by each and every UNIX file system out there.
   
So, the question is: what real names buy one, that indices do not?
  
   By storing the names in the items, cycles become solvable because you
   can always look at the current directory's name(s) to see where you
   really are.  Every name becomes absolutely connected to the top of the
   namespace instead of depending on a parent pointer that may not ever
   connect to the top.

 But cycles are solvable in current file systems too: they simply do
 not exist there.

  
   If speeding up rename was very important, you can replace every pathname
   component with an indirect reference instead of using simple strings.
   Changing directory levels is still difficult.

 It is not only speed that will be extremely hard to achieve in that
 design; atomicity (in the face of possible crash during rename), and
 concurrency control look problematic too.

  
   -- 
   Jonathan Briggs [EMAIL PROTECTED]
   eSoft, Inc.

 Nikita.




Re: File as a directory - VFS Changes

2005-06-03 Thread Hans Reiser
Nikita Danilov wrote:


But cycles are solvable in current file systems too: they simply do
not exist there.
  

Yes, but Nikita, cycles represent semantic functionality that has value
because being able to embody more expressions means more power of
expression.  If some way can be found to allow them, then functionality
is increased. Separating links that increase reference count from links
that merely point (ala hard vs. sym links) is one approach.  If there
were cycle detection effective enough for real-world use, that would be
better.

  
  If speeding up rename was very important, you can replace every pathname
  component with an indirect reference instead of using simple strings.
  Changing directory levels is still difficult.

It is not only speed that will be extremely hard to achieve in that
design; atomicity (in the face of possible crash during rename), and
concurrency control look problematic too.

  
  -- 
  Jonathan Briggs [EMAIL PROTECTED]
  eSoft, Inc.

Nikita.


  




Re: File as a directory - VFS Changes

2005-06-02 Thread Hans Reiser
Alexander G. M. Smith wrote:

Hans Reiser wrote on Tue, 31 May 2005 11:32:04 -0700:
  

What about if we have it that only the first name a directory is created
with counts towards its reference count, and that if the directory is
moved from its first name, the new name becomes the one
that counts towards the reference count?   A bit of a hack, but would work.



Sounds a lot like what I did earlier.  Files got really deleted when the
true name was the only name for a file (only one parent in other words).
But I also had a large cycle finding pause when any file movement happened.
I'm not sure if it would still be needed.

Nikita Danilov wrote:
  

- if garbage collection is implemented through the reference counting
(which is the only known way tractable for a file system), then cycles
are never collected.
[...]
But the garbage collection problem is still there. You are more than
welcome to solve it by implementing generation mark-and-sweep GC on file
system scale. :-)
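
The reference-counting point above can be illustrated with a small sketch. This is a toy model in Python, not reiser4 code: the `Node` class, the `link`/`unlink` helpers and the three-object namespace are all made up for illustration. It only assumes that every directory entry adds one to the target's link count and that an object is freed when the count reaches zero.

```python
class Node:
    """A toy file-system object with a link (reference) count."""
    def __init__(self, name):
        self.name = name
        self.nlink = 0        # number of directory entries pointing here
        self.children = {}    # entry name -> Node

def link(parent, entry, child):
    """Add a directory entry; every entry counts as one reference."""
    parent.children[entry] = child
    child.nlink += 1

def unlink(parent, entry, freed):
    """Remove a directory entry; free the target when its count drops to zero."""
    child = parent.children.pop(entry)
    child.nlink -= 1
    if child.nlink == 0:
        freed.append(child.name)
        for e in list(child.children):
            unlink(child, e, freed)

def reachable(root):
    """Names of all objects that can still be reached from the root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen.add(node.name)
            stack.extend(node.children.values())
    return seen

root, a, b = Node("/"), Node("a"), Node("b")
link(root, "a", a)   # /a
link(a, "b", b)      # /a/b
link(b, "a", a)      # /a/b/a points back at a, closing the cycle

freed = []
unlink(root, "a", freed)   # remove the only entry that leads in from the root
```

After the last call nothing was freed: `a` and `b` still hold one reference each (from each other), although neither can be reached from the root any more. Some tracing pass starting from the root is needed to notice that.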



There are at least two choices:

Bite the bullet and have a file system that is occasionally slow due to
cycle checking, but only when the user somehow makes a huge cycle.  Keep
in mind that this only happens when you use the new functionality, if you
only create files with one parent, it should be as fast as regular file
systems.  I see its features being useful for desktop use, not servers,
so the occasional speed hit is less annoyance than the lack of features
(the ability to file your files in several places).
  

I prefer the above to the below.

Another way is to not delete the files when they get unlinked.  Similar
to some other allocation management systems, have a background thread
doing the garbage collection and cycle tracing.  The drawback is that
you might run out of disc space if you're creating files faster than
the collector is cleaning up.

I wonder if you can combine a wandering journal (or whatever it is called,
where the journalled data blocks become the file's current contents) with
the copy type garbage collection (is that the same as a 2 generation mark
and sweep?).  Copy type collection copies all known reachable objects to
an empty half of the disk.  When that's done, the original half is marked
empty and the next pass copies in the other direction.  Could work nicely
if you have two disk drives.  Yet another PhD topic on garbage collection
for someone to research :-)

There are lots of other garbage collection schemes that might be
applicable to file systems with cycles.  It could work, maybe with
decent speed too!
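
The copy-type collection described above can be sketched roughly as follows. This is a minimal in-memory model in Python, assuming the "disk halves" are plain dictionaries mapping an object id to the ids it links to; `copy_collect` and the sample objects are hypothetical names chosen for illustration.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    `from_space` maps object id -> list of child ids. Cycles are harmless,
    because each object is copied at most once.
    """
    to_space = {}
    pending = list(roots)
    while pending:
        oid = pending.pop()
        if oid in to_space:
            continue                          # already copied
        to_space[oid] = list(from_space[oid])
        pending.extend(from_space[oid])
    return to_space

half_a = {
    "root": ["docs"],
    "docs": ["photo"],
    "photo": [],
    "x": ["y"],   # x and y reference each other, but nothing reachable
    "y": ["x"],   # from the root points at them
}
half_b = copy_collect(half_a, ["root"])
half_a = {}       # the original half is now marked empty
```

Only `root`, `docs` and `photo` arrive in `half_b`; the unreachable cycle `x`/`y` is simply never copied, so it disappears when the old half is marked empty. The cost is proportional to the live data, and half of the space is always idle, which matches the two-disk-drive remark.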

- Alex


  




Re: File as a directory - VFS Changes

2005-06-02 Thread Nikita Danilov
Hans Reiser writes:
  What about if we have it that only the first name a directory is created
  with counts towards its reference count, and that if the directory is
  moved from its first name, the new name becomes the one
  that counts towards the reference count?   A bit of a hack, but would work.

This means that list of names has to be kept together with every object
(to find out where true reference has to be moved). And this makes
rename of directory problematic, as lists of names of all directory
children have to be updated.

  
  Hans

Nikita.


Re: File as a directory - VFS Changes

2005-06-02 Thread Nikita Danilov
Alexander G. M. Smith writes:

[...]

  
  The typical worst case operation will be deleting a link to your photo
  from a directory you decided didn't classify it properly.  The photo may
  be in several directories, such as Cottage, Aunt and Bottles if it is
  a picture of a champagne bottle you polished off at your aunt's cottage.
  You decide that it shouldn't really be in the Aunt folder, so you delete
  it (or rather the link) from there.

This is typical operation for a desktop usage, I agree. But desktop is
not interesting. It doesn't pose technical difficulty to implement
whatever indexing structure when your dataset is but a few dozen
thousand objects [1]. What _is_ interesting, is to make file system
scalable. Solution that fails to move directory simply because sub-tree
rooted at it is large is not scalable.

  
  The traversal starts with recursively finding all the children of the
  deleted object, which will include the photo and all attributish
  subobjects (thumbnail, description, ...).  Not too bad, maybe a
  dozen objects.  Then reconnect those children to objects which have
  a known good path to the root, reached through whatever parents remain.

And at that moment user hits ^C...

That is, how will atomicity guarantees of rename be preserved? Note that
many applications, like some mail servers, crucially depend on rename
atomicity to implement their transaction mini-engines.

And concurrency issues also don't look bright: what if while

mv /d0/d1/d2/d2 /b0/b1/b2

is performed and thread is in the middle of scanning descendants of
/d0/d1/d2/d2 recursively, another thread does

mv /d0/d1 /c0/c1/c2

? Obviously scanning cannot take locks on individual files as it sees
them (because, namespace being an arbitrary graph, this will
deadlock). The only remaining solution is to take whole-fs-lock during
every rename/link/unlink operation. Which is another nail in the
scalability coffin.

[...]

  
  Now if you move the directory containing millions of files, then it's
  going to take a while.  And if it has a hard link down to another
  directory, that gets traversed too.  But that won't happen too often,
  only around spring time when you're reorganizing your mail archives.

It happens all the time on my workstation, when I move Linux source
trees around.

  
  - Alex

Nikita.

Footnotes: 
[1]  Implementing things like Spotlight does not require
any innovation at the file system layer (and not coincidentally,
Spotlight is based on almost 20 years old BSDLite kernel code).



Re: File as a directory - VFS Changes

2005-06-02 Thread Nikita Danilov
Jonathan Briggs writes:
  On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
  [snip]
   Frankly speaking, I suspect that name-as-attribute is going to limit
   usability of file system significantly.
   
   Note, that in the real world, only names from quite limited class are
   attributes of objects, viz. /proper names/ like France, or Jonathan
   Briggs. Communication wouldn't get very far if only proper names were
   allowed.
   
   Nikita.
  
  Bringing up /proper names/ from the real world agrees with my idea
  though! :-)

I don't understand why if you are at liberty to design new namespace model
from scratch (it seems POSIX semantics are not binding in our case), you
are going to faithfully replicate deficiencies of natural languages.

It is common trait in both science and engineering that when two flavors
of the same functionality (real names vs. indices) arise, an attempt is
made to reduce one of them to another, simplifying the system as a
result.

In our case, motivation to reduce one type of names to another is even
more pressing, as these types are incompatible: in the presence of
cycles or dynamic queries, namespace visible through the directory
hierarchy is different from the namespace of real names.

Indices cannot be reduced to real names (as rename is impossible to
implement efficiently), but real names can very well be reduced to
indices as exemplified by each and every UNIX file system out there.

So, the question is: what real names buy one, that indices do not?

[...]

  -- 
  Jonathan Briggs [EMAIL PROTECTED]
  eSoft, Inc.

Nikita.


RE: File as a directory - VFS Changes

2005-06-02 Thread Faraz Ahmed
 Hi Nikita;


 The problem of files not fitting the query of the smart folder is a
 serious one. We implemented this same thing for our semantic
filesystem.

For example, if we create an MP3 file in a JPEG folder, it won't ever get
listed.
This will fundamentally change the way users see your filesystem: users
expect to see the files in the folder where they created them. This itself
should be a default search criterion.
We almost solved this by having the parentdirectory as an attribute of the
file. All the smart folders have their query transparently modified to
"where type=jpg Or parentdirectory=thisdirectory". This makes the virtual
folder stuff work as an EXTENSION to the standard file/directory relationship
rather than as a REPLACEMENT.
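
A minimal sketch of the modified query, in Python. The `FileRec` record, the `smart_folder` helper and the sample paths are assumptions made for illustration; only the `type` and `parentdirectory` attributes and the OR condition come from the description above.

```python
from dataclasses import dataclass

@dataclass
class FileRec:
    name: str
    type: str
    parentdirectory: str   # directory the file was created in

def smart_folder(files, folder, wanted_type):
    """List files matching: type = wanted_type OR parentdirectory = folder."""
    return [f.name for f in files
            if f.type == wanted_type or f.parentdirectory == folder]

files = [
    FileRec("beach.jpg", "jpg", "/home/u/trip"),
    FileRec("song.mp3", "mp3", "/smart/jpegs"),   # created inside the smart folder
    FileRec("notes.txt", "txt", "/home/u"),
]
listing = smart_folder(files, "/smart/jpegs", "jpg")
```

The MP3 file still shows up in the JPEG folder because it was created there, while files elsewhere are matched by type as before.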

Personal experience says that users don't digest any change to the UNIX
filesystem model. Anything extra is OK, but replacements are BAD. Think of it:
you created a C file in a virtual folder for h files, and the file won't get
listed (although it will exist). THEN WHAT??? The user has to search for it. BAD:
your whole fancy virtual directory USE CASE itself is lost, and eventually we
end up solving nothing.



Other issues include this display name stuff etc. They are bad. What if
two files with the same display name get listed in the same virtual directory?
No point in creating a problem and then solving it. Good work, though; we don't
want to get bogged down once WinFS is released.
Regards
Faraz.




Re: File as a directory - VFS Changes

2005-06-02 Thread Jonathan Briggs
On Thu, 2005-06-02 at 14:38 +0400, Nikita Danilov wrote:
 Jonathan Briggs writes:
   On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
   [snip]
Frankly speaking, I suspect that name-as-attribute is going to limit
usability of file system significantly.

Usability as in features?  Or usability as in performance?


Note, that in the real world, only names from quite limited class are
attributes of objects, viz. /proper names/ like France, or Jonathan
Briggs. Communication wouldn't get very far if only proper names were
allowed.

Nikita.
   
   Bringing up /proper names/ from the real world agrees with my idea
   though! :-)
 
 I don't understand why if you are at liberty to design new namespace model
 from scratch (it seems POSIX semantics are not binding in our case), you
 are going to faithfully replicate deficiencies of natural languages.
 
 It is common trait in both science and engineering that when two flavors
 of the same functionality (real names vs. indices) arise, an attempt is
 made to reduce one of them to another, simplifying the system as a
 result.

An index is an arrangement of information about the indexed items.  The
index contents *belong* to the items.  An index by name?  That name
belongs to the item.  An index by date?  Those dates are properties of
the item.  Anything that can be indexed about an item can be described
as a property of the item.

Only for efficiency reasons are index data not included with the item
data.

 
 In our case, motivation to reduce one type of names to another is even
 more pressing, as these types are incompatible: in the presence of
 cycles or dynamic queries, namespace visible through the directory
 hierarchy is different from the namespace of real names.

Queries create indexes based on properties of the items.  This is no
different from directories, which are indexes based on names of the
items.

In the same way that you can descend a directory tree and copy the names
found into each item, you can check each item and copy the names found
into a directory tree.

 
 Indices cannot be reduced to real names (as rename is impossible to
 implement efficiently), but real names can very well be reduced to
 indices as exemplified by each and every UNIX file system out there.
 
 So, the question is: what real names buy one, that indices do not?

By storing the names in the items, cycles become solvable because you
can always look at the current directory's name(s) to see where you
really are.  Every name becomes absolutely connected to the top of the
namespace instead of depending on a parent pointer that may not ever
connect to the top.

If speeding up rename was very important, you can replace every pathname
component with an indirect reference instead of using simple strings.
Changing directory levels is still difficult.

-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-06-02 Thread Nikita Danilov
Jonathan Briggs writes:
  On Thu, 2005-06-02 at 14:38 +0400, Nikita Danilov wrote:
   Jonathan Briggs writes:
 On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
 [snip]
  Frankly speaking, I suspect that name-as-attribute is going to limit
  usability of file system significantly.
  
  Usability as in features?  Or usability as in performance?

Usability as in ease of use.

[...]

  
  An index is an arrangement of information about the indexed items.  The
  index contents *belong* to the items.  An index by name?  That name
  belongs to the item.  An index by date?  Those dates are properties of

In the flat world of relational databases, maybe. But almost nowhere else
improper name is an attribute of its signified: variable is not an
attribute of object it points to, URL is not an attribute of the web
page, block number is not an attribute of data stored in that block on
the disk, etc.

[...]

  
  In the same way that you can descend a directory tree and copy the names
  found into each item, you can check each item and copy the names found
  into a directory tree.

Except that as was already discussed resulting directory tree is _bound_
to be inconsistent with real names.

  
   
   Indices cannot be reduced to real names (as rename is impossible to
   implement efficiently), but real names can very well be reduced to
   indices as exemplified by each and every UNIX file system out there.
   
   So, the question is: what real names buy one, that indices do not?
  
  By storing the names in the items, cycles become solvable because you
  can always look at the current directory's name(s) to see where you
  really are.  Every name becomes absolutely connected to the top of the
  namespace instead of depending on a parent pointer that may not ever
  connect to the top.

But cycles are solvable in current file systems too: they simply do
not exist there.

  
  If speeding up rename was very important, you can replace every pathname
  component with an indirect reference instead of using simple strings.
  Changing directory levels is still difficult.

It is not only speed that will be extremely hard to achieve in that
design; atomicity (in the face of possible crash during rename), and
concurrency control look problematic too.

  
  -- 
  Jonathan Briggs [EMAIL PROTECTED]
  eSoft, Inc.

Nikita.


Re: File as a directory - VFS Changes

2005-06-01 Thread Nikita Danilov
Jonathan Briggs writes:
  On Wed, 2005-06-01 at 02:36 +0400, Nikita Danilov wrote:

[...]

   
   One problem with the above is that directory structure is inconsistent
   with lists of names associated with objects. For example, file1 is a
   child of /tmp/A/B/C/A, but Object 1001 doesn't list /tmp/A/B/C/A/file1
   among its names.
  
  file1 *appears* to be a child because it is actually returned as the
  query result for its name of /tmp/A/file1 because A is a query

I beg your pardon, but this is confusing. Objects have real names that
are strings attached to them. User, on the other hand, accesses objects
through paths in directory hierarchy which is just a way to execute
queries on real-names. But some paths do correspond to real-names and
some do not? I, personally, would be very wary to use such a behavior as
a fundamental model of file system.

Also, if directories are just queries, it is not clear why they have
real-names on their own. For example, what does it mean, for object O1
(a directory) to have a real-name /a/b, and to return (c - O2) as a
part of query result, where O2 has only one name, viz. /d/e?

Basically, without some extra restrictions, your model doesn't provide
consistency between user visible paths, and hidden real-names, which
makes it not very useful in the practice, I am afraid.

  for /tmp/A/.  If the shell was smart enough to normalize its path by
  asking the directory for its name, it would know that /tmp/A/B/C/A
  was /tmp/A.   

/tmp/A/B/C/A may have other names beyond /tmp/A, which one to choose?

But yes, a stupid program could be confused by the
  difference between names.

A _user_ will most definitely be confused, which is much more important.

[...]

  
  Moving an object with mv would change its name.  Moving a top-level
  directory like /usr would require visiting every object starting
  with /usr and doing an edit.  A compression scheme could be used where
  the most-used top-level directory names were replaced with lookup
  tables, then /usr could be renamed just once in the table.

Heh, you just invented good old directories, by the way.
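
The two renaming schemes in this exchange can be compared with a small sketch. It is a toy model in Python: the dictionaries, the `rename_prefix` and `path` helpers and the sample paths are hypothetical, chosen only to show how many records each scheme has to touch.

```python
# Scheme 1: every object stores its full path names (name-as-attribute).
objects = {
    1: ["/usr/bin/ls"],
    2: ["/usr/lib/libc.so"],
    3: ["/etc/passwd"],
}

def rename_prefix(objects, old, new):
    """Rename a directory by editing every stored name under it.

    Returns the number of objects that had to be rewritten.
    """
    touched = 0
    for oid, names in list(objects.items()):
        updated = []
        for n in names:
            if n == old or n.startswith(old + "/"):
                n = new + n[len(old):]
            updated.append(n)
        if updated != names:
            objects[oid] = updated
            touched += 1
    return touched

# Scheme 2: the top-level component is replaced by an index into a lookup table.
table = {0: "usr", 1: "etc"}
indirect = {1: (0, "bin/ls"), 2: (0, "lib/libc.so"), 3: (1, "passwd")}

def path(oid):
    """Rebuild the full path of an object from the lookup table."""
    top, rest = indirect[oid]
    return "/" + table[top] + "/" + rest

touched = rename_prefix(objects, "/usr", "/opt")   # rewrites objects 1 and 2
table[0] = "opt"                                   # one table entry changes
```

In the first scheme the work grows with the number of objects under the renamed directory; in the second a single table entry changes, and that table entry plays the role a directory entry plays in a conventional file system.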

[...]

  
  Yes. :-)  It is radical, and the idea is taken from databases.  I
  thought that seemed to be the direction Reiser filesystems were moving.
  In this scheme a name is just another bit of metadata and not
  first-class important information.  The name-query directories would be
  there for traditional filesystem users and Unix compatibility.  They
  would probably be virtual and dynamic, only being created when needed
  and only being persistent if assigned meta-data (extra names (links),
  non-default permission bits, etc) or for performance reasons (faster to
  load from cache than searching every file).

That latter bit, about making them persistent, is where the tr


Re: File as a directory - VFS Changes

2005-06-01 Thread Nikita Danilov
Nikita Danilov writes:

[...]

  

Yes. :-)  It is radical, and the idea is taken from databases.  I
thought that seemed to be the direction Reiser filesystems were moving.
In this scheme a name is just another bit of metadata and not
first-class important information.  The name-query directories would be
there for traditional filesystem users and Unix compatibility.  They
would probably be virtual and dynamic, only being created when needed
and only being persistent if assigned meta-data (extra names (links),
non-default permission bits, etc) or for performance reasons (faster to
load from cache than searching every file).
  
  That latter bit, about making them persistent, is where the tr
  

[Hmm... grue ate my message.]

That latter bit, about making them persistent, is where the trouble
begins: once queries acquire identity and a place in the file system
name-space, they logically become part of that very name-space they are
querying! This leads to various complications, and you are trying to work
around them by claiming that queries are not _always_ part of name-space
(file1 [only] **appears** to be a child...). This non-uniform behavior
is a big disadvantage.

Nikita.


Re: File as a directory - VFS Changes

2005-06-01 Thread Jonathan Briggs
On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote:
 Nikita Danilov writes:
 
 [...]
 
   
 
 Yes. :-)  It is radical, and the idea is taken from databases.  I
 thought that seemed to be the direction Reiser filesystems were moving.
 In this scheme a name is just another bit of metadata and not
 first-class important information.  The name-query directories would be
 there for traditional filesystem users and Unix compatibility.  They
 would probably be virtual and dynamic, only being created when needed
 and only being persistent if assigned meta-data (extra names (links),
 non-default permission bits, etc) or for performance reasons (faster to
 load from cache than searching every file).
   
   That latter bit, about making them persistent, is where the tr
   
 
 [Hmm... grue ate my message.]
 
 That latter bit, about making them persistent, is where the trouble
 begins: once queries acquire identity and a place in the file system
 name-space, they logically become part of that very name-space they are
 querying! This leads to various complications, and you are trying to work
 around them by claiming that queries are not _always_ part of name-space
 (file1 [only] **appears** to be a child...). This non-uniform behavior
 is a big disadvantage.

In this scheme, query objects were always part of the name-space.

None of the objects are really children of any of the others. They only
appear to be children when viewed through a set of name-query
directories.  In reality every object would be an equal in the true OID
name-space.  Only meta-data objects are children of their data objects.

You could also create a confusing query named /tmp/G that returned
results for /usr/lib/.  This is the same sort of abuse that creates
A-B-C-A loops: the query was deliberately set to have a misleading
name/name-query relationship.

The user is responsible for sensible naming.   Under normal use, a user
would hardly notice the difference between traditional directories and
this name-query system.  

With persistent disk cache of queries and lookup tables for common
names, it does start to look like regular directory structures, but it
is still coming at the problem from the opposite direction.  Traditional
directories store information about a file (its name) outside the file,
and this system would store everything about a file with the file
itself.
-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-06-01 Thread Nikita Danilov
Jonathan Briggs writes:
  On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote:
   Nikita Danilov writes:

[...]

   
   That latter bit, about making them persistent, is where the trouble
   begins: once queries acquire identity and a place in the file system
   name-space, they logically become part of that very name-space they are
   querying! This leads to various complications, and you are trying to work
   around them by claiming that queries are not _always_ part of name-space
   (file1 [only] **appears** to be a child...). This non-uniform behavior
   is a big disadvantage.
  
  In this scheme, query objects were always part of the name-space.

Then, paths visible through queries are inconsistent with names of
underlying objects. Your querying system returns fake results
(/tmp/A/B/C/A/file1) that are not present in the database queries are
run against. This is *wrong*. Nobody is going to tolerate DBMS that
sometimes returns extra rows in SELECT statement, right?
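
The inconsistency can be reproduced in a toy model. The sketch below is Python with made-up data: it assumes each object carries a list of real names, that a directory is a query returning the objects whose real name lies directly under the directory's own name, and that directory A has been given a second name under C (the A-B-C-A loop). Only object number 1001 and the paths come from the thread; the other object ids and all helper names are hypothetical.

```python
real_names = {
    1001: ["/tmp/A/file1"],
    1002: ["/tmp/A", "/tmp/A/B/C/A"],   # directory A, named a second time under C
    1003: ["/tmp/A/B"],
    1004: ["/tmp/A/B/C"],
}
# Query of each directory: objects having a real name directly under this prefix.
query_prefix = {1002: "/tmp/A", 1003: "/tmp/A/B", 1004: "/tmp/A/B/C"}

def children(dir_oid):
    """Run a directory's name-query and return {entry name: object id}."""
    prefix = query_prefix[dir_oid]
    found = {}
    for oid, names in real_names.items():
        for name in names:
            parent, _, last = name.rpartition("/")
            if parent == prefix:
                found[last] = oid
    return found

def walk(start_oid, components):
    """Follow a user-visible path one component at a time."""
    oid = start_oid
    for comp in components:
        oid = children(oid)[comp]
    return oid

visible_path = "/tmp/A/B/C/A/file1"
oid = walk(1002, ["B", "C", "A", "file1"])   # start at /tmp/A
```

The walk ends at object 1001, yet `visible_path` is not among `real_names[1001]`: the path exists for the user but not in the set of names the queries run against.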

[...]

  
  The user is responsible for sensible naming.   Under normal use, a user
  would hardly notice the difference between traditional directories and
  this name-query system.  

Heh, this assumes that users will continue to use new namespace as they
use old one. Which is not true. Usage is determined by features
provided. This is, by the way, one of driving forces behind reiserfs
support for small files and large directories.

If file system provides ability to create namespaces in the form of
arbitrary graphs, this will be used.

Nikita.


Re: File as a directory - VFS Changes

2005-06-01 Thread Jonathan Briggs
On Wed, 2005-06-01 at 18:42 +0400, Nikita Danilov wrote:
 Jonathan Briggs writes:
   On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote:
Nikita Danilov writes:
 
 [...]
 

That latter bit, about making them persistent, is where the trouble
begins: once queries acquire identity and a place in the file system
name-space, they logically become part of that very name-space they are
querying! This leads to various complications, and you are trying to work
around them by claiming that queries are not _always_ part of name-space
(file1 [only] **appears** to be a child...). This non-uniform behavior
is a big disadvantage.
   
   In this scheme, query objects were always part of the name-space.
 
 Then, paths visible through queries are inconsistent with names of
 underlying objects. Your querying system returns fake results
 (/tmp/A/B/C/A/file1) that are not present in the database queries are
 run against. This is *wrong*. Nobody is going to tolerate DBMS that
 sometimes returns extra rows in SELECT statement, right?

If you wished to enforce name-query directories always having a single
name and their query always being identical to their name, then that
wouldn't happen.

However, query directories (or smart folders) will have this namespace
problem in every case and there is no avoiding it.  If the query is for
every file modified in the past day, the file path through the query
directory is not going to match any given name of the file.  Same for
keyword queries, ownership queries, or whatever.

In the traditional directory system, a file doesn't have an official
name, just links to it from directory entries.  Perhaps if you think of
the proposed name meta-data as a preferred name the idea would work
better for you?
-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-06-01 Thread Nikita Danilov
Jonathan Briggs writes:

[...]

  
  However, query directories (or smart folders) will have this namespace
  problem in every case and there is no avoiding it.  If the query is for
  every file modified in the past day, the file path through the query
  directory is not going to match any given name of the file.  Same for
  keyword queries, ownership queries, or whatever.

Which I think exactly points to one fundamental problem with the idea
that names are attributes of objects: this idea is incompatible with the
notion of dynamically created views that in effect add new paths
through which objects are reachable. These paths _are_ names as far as
the user is concerned (after all, names exist to reach objects), but they
are not in the name-as-attribute model.

  
  In the traditional directory system, a file doesn't have an official
  name, just links to it from directory entries.  Perhaps if you think of
  the proposed name meta-data as a preferred name the idea would work
  better for you?

Frankly speaking, I suspect that name-as-attribute is going to limit
the usability of the file system significantly.

Note that in the real world, only names from a quite limited class are
attributes of objects, viz. /proper names/ like France, or Jonathan
Briggs. Communication wouldn't get very far if only proper names were
allowed.

Nikita.


Re: File as a directory - VFS Changes

2005-06-01 Thread Jonathan Briggs
On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
[snip]
 Frankly speaking, I suspect that name-as-attribute is going to limit
 the usability of the file system significantly.
 
 Note that in the real world, only names from a quite limited class are
 attributes of objects, viz. /proper names/ like France, or Jonathan
 Briggs. Communication wouldn't get very far if only proper names were
 allowed.
 
 Nikita.

Bringing up /proper names/ from the real world agrees with my idea
though! :-)

As a person, you have a list of proper names that you answer to and
that you prefer.  However, in some cases you will also answer to "Hey,
you over there!" or "Someone who left a white Honda in the parking lot,
please turn your lights off."

So a file could have a list of proper names, but it can also be referred
to in any other way and by any other name.  Proper names would be
preferred, though.
-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-06-01 Thread Alexander G. M. Smith
Hans Reiser wrote on Tue, 31 May 2005 11:32:04 -0700:
 What about if we have it that only the first name a directory is created
 with counts towards its reference count, and that if the directory is
 moved from its first name, the new name becomes the one that counts
 towards the reference count?   A bit of a hack, but would work.

Sounds a lot like what I did earlier.  Files got really deleted when the
true name was the only name for a file (only one parent in other words).
But I also had a large cycle finding pause when any file movement happened.
I'm not sure if it would still be needed.

Nikita Danilov wrote:
 - if garbage collection is implemented through the reference counting
 (which is the only known way tractable for a file system), then cycles
 are never collected.
 [...]
 But the garbage collection problem is still there. You are more than
 welcome to solve it by implementing generation mark-and-sweep GC on file
 system scale. :-)

There are at least two choices:

Bite the bullet and have a file system that is occasionally slow due to
cycle checking, but only when the user somehow makes a huge cycle.  Keep
in mind that this only happens when you use the new functionality; if you
only create files with one parent, it should be as fast as regular file
systems.  I see its features being useful for desktop use, not servers,
so the occasional speed hit is less of an annoyance than the lack of
features (the ability to file your files in several places).

Another way is to not delete the files when they get unlinked.  Similar
to some other allocation management systems, have a background thread
doing the garbage collection and cycle tracing.  The drawback is that
you might run out of disc space if you're creating files faster than
the collector is cleaning up.
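
A minimal sketch of what such a background pass could look like, assuming
a small in-memory object table where array indices stand in for OIDs.
The names struct object, mark(), collect() and MAX_LINKS are made up for
illustration and are not taken from any existing file system.  Everything
reachable from the root is marked, and whatever stays unmarked is freed,
including detached cycles that reference counting alone would keep.

```c
#include <stddef.h>

#define MAX_LINKS 4

/* Hypothetical object table entry; indices stand in for OIDs. */
struct object {
    int in_use;              /* slot holds a live object */
    int marked;              /* reached from the root in this pass */
    size_t nlinks;
    size_t link[MAX_LINKS];  /* indices of objects this one references */
};

/* Mark phase: flag every object reachable from index i. */
void mark(struct object *tab, size_t i)
{
    size_t k;

    if (!tab[i].in_use || tab[i].marked)
        return;
    tab[i].marked = 1;
    for (k = 0; k < tab[i].nlinks; k++)
        mark(tab, tab[i].link[k]);
}

/* Sweep phase: free unmarked objects and reset marks.
 * Returns the number of objects freed. */
size_t collect(struct object *tab, size_t n, size_t root)
{
    size_t i, freed = 0;

    for (i = 0; i < n; i++)
        tab[i].marked = 0;
    mark(tab, root);
    for (i = 0; i < n; i++) {
        if (tab[i].in_use && !tab[i].marked) {
            tab[i].in_use = 0;
            freed++;
        }
        tab[i].marked = 0;
    }
    return freed;
}
```

On disk the mark phase would have to read every reachable object, which
is why it is a candidate for a background thread rather than for the
unlink path itself.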

I wonder if you can combine a wandering journal (or whatever it is called,
where the journalled data blocks become the file's current contents) with
the copy type garbage collection (is that the same as a 2 generation mark
and sweep?).  Copy type collection copies all known reachable objects to
an empty half of the disk.  When that's done, the original half is marked
empty and the next pass copies in the other direction.  Could work nicely
if you have two disk drives.  Yet another PhD topic on garbage collection
for someone to research :-)

There are lots of other garbage collection schemes that might be
applicable to file systems with cycles.  It could work, maybe with
decent speed too!

- Alex


Re: File as a directory - VFS Changes

2005-06-01 Thread Alexander G. M. Smith
Nikita Danilov wrote on Wed, 1 Jun 2005 14:58:47 +0400:
 For example: mv /d0 /d1
 
 To check that this doesn't introduce a cycle one has to load each child
 of /d0 (which may be millions) and recursively check that from none of
 them /d1 is reachable. This has to be done on each rename. I believe
 this is unacceptable overhead.

That's where we differ.  I think it is an acceptable overhead.  It also
only happens on rename and delete operations for objects with multiple
parents or descendants.  If you just move or delete an ordinary file
that's got just one parent directory and no children, the cost is
ordinary too.
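
The check being argued about boils down to a reachability search: moving
/d0 under /d1 closes a cycle exactly when /d1 can already be reached from
/d0.  Here is a rough in-memory sketch, assuming a hypothetical struct
node with a fixed-size child array (none of these names come from the
VFS).  A real implementation would have to read children from disk, and
that is the cost in dispute.

```c
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical in-memory namespace node; not a kernel structure. */
struct node {
    struct node *child[MAX_CHILDREN];
    size_t nchildren;
    int visited;   /* set during a search; caller clears between searches */
};

/* Return 1 if target can be reached from n by following child links. */
int reachable(struct node *n, const struct node *target)
{
    size_t i;

    if (n == target)
        return 1;
    if (n->visited)
        return 0;
    n->visited = 1;
    for (i = 0; i < n->nchildren; i++)
        if (reachable(n->child[i], target))
            return 1;
    return 0;
}

/* Linking src under dst closes a cycle exactly when dst is already
 * reachable from src. */
int rename_would_create_cycle(struct node *src, struct node *dst)
{
    return reachable(src, dst);
}
```

The work is proportional to the number of descendants of the object
being moved, so an object with a dozen children costs a dozen visits.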

If it's a fildirute object with a dozen attribute type things as
children, then it will need to traverse those dozen children.  Not
a big deal.  Consider this example:

The typical worst case operation will be deleting a link to your photo
from a directory you decided didn't classify it properly.  The photo may
be in several directories, such as Cottage, Aunt and Bottles if it is
a picture of a champagne bottle you polished off at your aunt's cottage.
You decide that it shouldn't really be in the Aunt folder, so you delete
it (or rather the link) from there.

The traversal starts with recursively finding all the children of the
deleted object, which will include the photo and all attributish
subobjects (thumbnail, description, ...).  Not too bad, maybe a
dozen objects.  Then reconnect those children to objects which have
a known good path to the root, reached through whatever parents remain.
That path through the new link becomes their true path name.  The photo
goes first, finding one of the alternative parent directories, say
Cottage as its new main parent.  Then the other children find the Photo
as their main parent.

In other words, the cycle checker has to find all the children of the
deleted object(s).  In most cases there aren't very many of them.

Now if you move the directory containing millions of files, then it's
going to take a while.  And if it has a hard link down to another
directory, that gets traversed too.  But that won't happen too often,
only around spring time when you're reorganizing your mail archives.

- Alex


Re: File as a directory - VFS Changes

2005-05-31 Thread Nikita Danilov
Alexander G. M. Smith writes:
  Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
   Nothing in VFS prevents files from supporting both read(2) and
   readdir(3). The problem is with link(2): VFS assumes that directories
   form _tree_, that is, every directory has well-defined parent.
  
  At least that's one problem that's solvable.  Just define one of
  the parents as the master parent directory, with a guaranteed path
  up to the root, and have the others as auxiliary parents.  That
  also gives you a good path name to each and every file-thing.
  
  The VFS or the file system (depending on where the designers want
  to split the work) will still have to handle cycles in the graph
  to recompute the new master parents, when an old one gets deleted
  or moved.

A cycle may consist of more graph nodes than fit into memory. Cycle
detection is crucial for rename semantics, and if a
cycle-just-about-to-be-formed doesn't fit into memory it's not clear how
to detect it, because the tree has to be locked while checked for cycles,
and one definitely doesn't want to keep such a lock over IO.

  
  - Alex

Nikita.


Re: File as a directory - VFS Changes

2005-05-31 Thread Hans Reiser
Nikita Danilov wrote:

Alexander G. M. Smith writes:
  Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
   Nothing in VFS prevents files from supporting both read(2) and
   readdir(3). The problem is with link(2): VFS assumes that directories
   form _tree_, that is, every directory has well-defined parent.
  
  At least that's one problem that's solvable.  Just define one of
  the parents as the master parent directory, with a guaranteed path
  up to the root, and have the others as auxiliary parents.  That
  also gives you a good path name to each and every file-thing.
  
  The VFS or the file system (depending on where the designers want
  to split the work) will still have to handle cycles in the graph
  to recompute the new master parents, when an old one gets deleted
  or moved.

A cycle may consist of more graph nodes than fit into memory.

There are pathname length restrictions already in the kernel that should
prevent that, yes?

Cycle
detection is crucial for rename semantics, and if
cycle-just-about-to-be-formed doesn't fit into memory it's not clear how
to detect it, because tree has to be locked while checked for cycles, and
one definitely doesn't want to keep such a lock over IO.

  
  - Alex

Nikita.


  




Re: File as a directory - VFS Changes

2005-05-31 Thread Nikita Danilov
Hello Hans,

Hans Reiser writes:
  Nikita Danilov wrote:
  
  Alexander G. M. Smith writes:
Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
 Nothing in VFS prevents files from supporting both read(2) and
 readdir(3). The problem is with link(2): VFS assumes that directories
 form _tree_, that is, every directory has well-defined parent.

At least that's one problem that's solvable.  Just define one of
the parents as the master parent directory, with a guaranteed path
up to the root, and have the others as auxiliary parents.  That
also gives you a good path name to each and every file-thing.

The VFS or the file system (depending on where the designers want
to split the work) will still have to handle cycles in the graph
to recompute the new master parents, when an old one gets deleted
or moved.
  
  A cycle may consist of more graph nodes than fit into memory.
  
  There are pathname length restrictions already in the kernel that should
  prevent that, yes?

UNIX namespaces are not _that_ retarded. :-)

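#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build an arbitrarily deep chain foo/foo/foo/... until interrupted. */
int main(int argc, char **argv)
{
	int i;

	for (i = 0; ; ++i) {
		mkdir("foo", 0777);
		chdir("foo");
		if ((i % 1000) == 0)
			printf("%i\n", i);
	}
	return 0;
}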

run it for a while, interrupt, and do

$ find foo
$ rm -frv foo

  
  Cycle
  detection is crucial for rename semantics, and if
  cycle-just-about-to-be-formed doesn't fit into memory it's not clear how
  to detect it, because tree has to be locked while checked for cycles, and
  one definitely doesn't want to keep such a lock over IO.
  

- Alex
  

Nikita.

  
  

  


Re: File as a directory - VFS Changes

2005-05-31 Thread Valdis . Kletnieks
On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:

 A cycle may consist of more graph nodes than fit into memory.
 
 There are pathname length restrictions already in the kernel that should
 prevent that, yes?

The problem is that although a *single* pathname can't be longer than some
length, you can still create a cycle.  Consider for instance a pathname
restriction of 1024 chars.  Filenames A, B, and C are all 400 characters
long.  A points at B, B points at C - and C points back to A.

Also, although the set of inodes *in the cycle* fits in memory, the set of
inodes *in the entire graph* that has to be searched to verify the presence of
a cycle may not (in general, you have to be ready to examine *all* the inodes
unless you can do some pruning (unallocated, provably un-cycleable, and so
on)).  This is the sort of thing that you can afford to do in userspace during
an fsck, but certainly can't do in the kernel on every syscall that might
create a cycle...





Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
 On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
 
  A cycle may consist of more graph nodes than fit into memory.
  
  There are pathname length restrictions already in the kernel that should
  prevent that, yes?
 
 The problem is that although a *single* pathname can't be longer than some
 length, you can still create a cycle.  Consider for instance a pathname
 restriction of 1024 chars.  Filenames A, B, and C are all 400 characters
 long.  A points at B, B points at C - and C points back to A.
 
 Also, although the set of inodes *in the cycle* fits in memory, the set of
 inodes *in the entire graph* that has to be searched to verify the presence of
 a cycle may not (in general, you have to be ready to examine *all* the inodes
 unless you can do some pruning (unallocated, provably un-cycleable, and so
 on)).  This is the sort of thing that you can afford to do in userspace during
 an fsck, but certainly can't do in the kernel on every syscall that might
 create a cycle...

You can avoid cycles by redefining the problem.

Every file or data object has one single True Name, which is its
inode or OID.  Each data object then has one or more names as
properties.  Names are either single strings with slash separators for
directories, or each directory element is a unique object in an object
list.  Directories then become queries that return the set of objects
holding that directory name.  The query results are of course cached and
updated whenever a name property changes.

Now there are no cycles, although a naive Unix find program could get
stuck in a loop.
-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-05-31 Thread Hans Reiser
What happens when you unlink the True Name?

Hans

Jonathan Briggs wrote:


You can avoid cycles by redefining the problem.

Every file or data object has one single True Name which is their
inode or OID.  Each data object then has one or more names as
properties.  Names are either single strings with slash separators for
directories, or each directory element is a unique object in an object
list.  Directories then become queries that return the set of objects
holding that directory name.  The query results are of course cached and
updated whenever a name property changes.

Now there are no cycles, although a naive Unix find program could get
stuck in a loop.
  




Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
Either that isn't allowed, or it immediately vanishes from all
directories.

If deleting by OID isn't allowed, then every name property must be
removed in order to delete the file.

Personally, I would allow deleting the OID.  It would be a convenient
way to be sure every instance of a file was deleted.

On Tue, 2005-05-31 at 09:59 -0700, Hans Reiser wrote:
 What happens when you unlink the True Name?
 
 Hans
 
 Jonathan Briggs wrote:
 
 
 You can avoid cycles by redefining the problem.
 
 Every file or data object has one single True Name which is their
 inode or OID.  Each data object then has one or more names as
 properties.  Names are either single strings with slash separators for
 directories, or each directory element is a unique object in an object
 list.  Directories then become queries that return the set of objects
 holding that directory name.  The query results are of course cached and
 updated whenever a name property changes.
 
 Now there are no cycles, although a naive Unix find program could get
 stuck in a loop.
   
 
 
-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-05-31 Thread Nikita Danilov
Jonathan Briggs writes:
  On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
   On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
   
A cycle may consist of more graph nodes than fit into memory.

There are pathname length restrictions already in the kernel that should
prevent that, yes?
   
   The problem is that although a *single* pathname can't be longer than some
   length, you can still create a cycle.  Consider for instance a pathname
   restriction of 1024 chars.  Filenames A, B, and C are all 400 characters
   long.  A points at B, B points at C - and C points back to A.
   
   Also, although the set of inodes *in the cycle* fits in memory, the set of
   inodes *in the entire graph* that has to be searched to verify the
   presence of a cycle may not (in general, you have to be ready to examine
   *all* the inodes unless you can do some pruning (unallocated, provably
   un-cycleable, and so on)).  This is the sort of thing that you can afford
   to do in userspace during an fsck, but certainly can't do in the kernel
   on every syscall that might create a cycle...
  
  You can avoid cycles by redefining the problem.
  
  Every file or data object has one single True Name which is their
  inode or OID.  Each data object then has one or more names as
  properties.  Names are either single strings with slash separators for
  directories, or each directory element is a unique object in an object
  list.  Directories then become queries that return the set of objects
  holding that directory name.  The query results are of course cached and
  updated whenever a name property changes.
  
  Now there are no cycles, although a naive Unix find program could get
  stuck in a loop.

Huh? Cycles are still here.

Query D0 returns D1, query D1 returns D2, ... query DN returns D0. The
problem is not in the mechanism used to encode tree/graph structure. The
problem is in the limitations imposed by required semantics:

   (R) every object except some selected root is Reachable. (No leaks.)

   (G) unused objects are sooner or later discarded. (Garbage
   collection.)

Neither requirement is compatible with cycles in the directory
structure:

 - from (R) it follows that an object can be discarded only if it is empty
 (as a directory). All nodes in a cycle are non-empty (because each of
 them contains at least a reference to the next one), and hence none of
 them can ever be removed;

 - if garbage collection is implemented through the reference counting
 (which is the only known way tractable for a file system), then cycles
 are never collected.
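
The reference-counting point can be made concrete with a toy model.
This is only a sketch, assuming a hypothetical struct dir that holds a
single outgoing link (dir_link() and dir_unlink() are made-up names).
A plain chain is reclaimed when its last name goes away, while a
two-node cycle keeps non-zero counts after the only outside name is
removed.

```c
#include <stddef.h>

/* Hypothetical directory with a single outgoing link and a link count. */
struct dir {
    int refcount;        /* number of names pointing at this directory */
    int freed;           /* set once the directory has been reclaimed */
    struct dir *next;    /* the one entry this directory contains */
};

/* Add a name for d. */
void dir_link(struct dir *d)
{
    d->refcount++;
}

/* Remove one name; at zero, reclaim d and drop the reference it holds. */
void dir_unlink(struct dir *d)
{
    if (d->freed || --d->refcount > 0)
        return;
    d->freed = 1;
    if (d->next)
        dir_unlink(d->next);
}
```

With c and d pointing at each other and one extra name on c, a single
dir_unlink(c) leaves both counts at 1: nothing outside refers to the
pair any more, yet neither is ever reclaimed.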

Unless you are talking about a two-level naming scheme, where One True
Names are visible to the user. In that case the reachability problem
evaporates, because manipulations with the normal directory structure never
make a node unreachable---it is always accessible through its True
Name.

But the garbage collection problem is still there. You are more than
welcome to solve it by implementing generation mark-and-sweep GC on file
system scale. :-)

Nikita.


Re: File as a directory - VFS Changes

2005-05-31 Thread Hans Reiser
Well, if you allow multiple true names, then you start to resemble
something I suggested a few years ago, in which I outlined a taxonomy of
links, and suggested that some links would count towards the reference
count and some would not.

Of course, that does nothing for the cycle problem.

How are cycles handled for symlinks currently?

Hans

Jonathan Briggs wrote:

Either that isn't allowed, or it immediately vanishes from all
directories.

If deleting by OID isn't allowed, then every name property must be
removed in order to delete the file.

Personally, I would allow deleting the OID.  It would be a convenient
way to be sure every instance of a file was deleted.

On Tue, 2005-05-31 at 09:59 -0700, Hans Reiser wrote:
  

What happens when you unlink the True Name?

Hans

Jonathan Briggs wrote:



You can avoid cycles by redefining the problem.

Every file or data object has one single True Name which is their
inode or OID.  Each data object then has one or more names as
properties.  Names are either single strings with slash separators for
directories, or each directory element is a unique object in an object
list.  Directories then become queries that return the set of objects
holding that directory name.  The query results are of course cached and
updated whenever a name property changes.

Now there are no cycles, although a naive Unix find program could get
stuck in a loop.
 

  




Re: File as a directory - VFS Changes

2005-05-31 Thread Hans Reiser
What about if we have it that only the first name a directory is created
with counts towards its reference count, and that if the directory is
moved from its first name, the new name becomes the one that counts
towards the reference count?   A bit of a hack, but would work.

Hans

Nikita Danilov wrote:

Jonathan Briggs writes:
  On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
   On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
   
A cycle may consist of more graph nodes than fit into memory.

There are pathname length restrictions already in the kernel that should
prevent that, yes?
   
   The problem is that although a *single* pathname can't be longer than some
   length, you can still create a cycle.  Consider for instance a pathname
   restriction of 1024 chars.  Filenames A, B, and C are all 400 characters
   long.  A points at B, B points at C - and C points back to A.
   
   Also, although the set of inodes *in the cycle* fits in memory, the set of
   inodes *in the entire graph* that has to be searched to verify the
   presence of a cycle may not (in general, you have to be ready to examine
   *all* the inodes unless you can do some pruning (unallocated, provably
   un-cycleable, and so on)).  This is the sort of thing that you can afford
   to do in userspace during an fsck, but certainly can't do in the kernel
   on every syscall that might create a cycle...
  
  You can avoid cycles by redefining the problem.
  
  Every file or data object has one single True Name which is their
  inode or OID.  Each data object then has one or more names as
  properties.  Names are either single strings with slash separators for
  directories, or each directory element is a unique object in an object
  list.  Directories then become queries that return the set of objects
  holding that directory name.  The query results are of course cached and
  updated whenever a name property changes.
  
  Now there are no cycles, although a naive Unix find program could get
  stuck in a loop.

Huh? Cycles are still here.

Query D0 returns D1, query D1 returns D2, ... query DN returns D0. The
problem is not in the mechanism used to encode tree/graph structure. The
problem is in the limitations imposed by required semantics:

   (R) every object except some selected root is Reachable. (No leaks.)

   (G) unused objects are sooner or later discarded. (Garbage
   collection.)

Neither requirement is compatible with cycles in the directory
structure:

 - from (R) it follows that an object can be discarded only if it is empty
 (as a directory). All nodes in a cycle are non-empty (because each of
 them contains at least a reference to the next one), and hence none of
 them can ever be removed;

 - if garbage collection is implemented through the reference counting
 (which is the only known way tractable for a file system), then cycles
 are never collected.

Unless you are talking about a two-level naming scheme, where One True
Names are visible to the user. In that case the reachability problem
evaporates, because manipulations with the normal directory structure never
make a node unreachable---it is always accessible through its True
Name.

But the garbage collection problem is still there. You are more than
welcome to solve it by implementing generation mark-and-sweep GC on file
system scale. :-)

Nikita.


  




Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
 I should create an example.
 
 Wherever I used True Name previously, use OID instead.  True Name was
 simply another term for a unique object identifier.
 
 Three files with OIDs of 1001, 1002, and 1003.
 Object 1001:
 name: /tmp/A/file1
 name: /tmp/A/B/file1
 name: /tmp/A/B/C/file1
 
 Object 1002:
 name: /tmp/A/file2
 
 Object 1003:
 name: /tmp/A/B/file3
 
 Three query objects (directories) with OIDs of 1, 2, and 3.
 Object 1:
 name: /tmp/A
 name: /tmp/A/B/C/A
 query: name begins with /tmp/A/
 query result cache: B-2, file1-1001, file2-1002
 
 Object 2:
 name: /tmp/A/B
 query: name begins with /tmp/A/B/
 query result cache: C-3, file1-1001, file3-1003
 
 Object 3:
 name: /tmp/A/B/C
 query: name begins with /tmp/A/B/C/
 query result cache: A-1, file1-1001
 
 Now there is a A - B - C - A directory loop.  But removing
 name: /tmp/A/B/C/A from Object 1 fixes the loop.  Deleting Object 1 also
 fixes the loop.  Deleting any of Object 1, 2 or 3 does not affect any
 other object, because in this scheme, directory objects do not need to
 actually exist: they are just queries that return objects with certain
 names.

I forgot to address Nikita's point about reclaiming lost cycles.  In
this case, let me create Object 4 for /tmp
Object 4:
name: /tmp
query: name begins with /tmp/
query result cache: A-1

Now, if we delete Object 4, are Objects 1,2,3 lost?  I would say not
because they still have names.  When the shell calls chdir("/tmp") a new
query object (directory) must be created dynamically, and Objects
1001,1002,1003 still have their names that start with /tmp and so they
immediately appear again.  Their names still start with /, so the top
level query will still find them and /tmp as well.

Therefore, the cycle is never detached and lost.
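
To show the idea in code, here is a small sketch of a directory as a
query over name attributes.  It assumes a hypothetical struct
named_object and a linear scan in query_dir(); both are illustration
only, and a real system would use an index and a result cache as
described above.

```c
#include <string.h>
#include <stddef.h>

#define MAX_NAMES 4

/* Hypothetical object: an OID plus the names it carries as attributes. */
struct named_object {
    int oid;
    const char *name[MAX_NAMES];
    size_t nnames;
};

/* "Directory" lookup: collect OIDs of objects having a name that sits
 * directly inside dir (dir must end with '/').  Returns the match count;
 * at most max_out OIDs are written to out. */
size_t query_dir(const struct named_object *objs, size_t n, const char *dir,
                 int *out, size_t max_out)
{
    size_t i, k, found = 0, len = strlen(dir);

    for (i = 0; i < n; i++) {
        for (k = 0; k < objs[i].nnames; k++) {
            const char *nm = objs[i].name[k];

            if (strncmp(nm, dir, len) == 0 && nm[len] != '\0' &&
                strchr(nm + len, '/') == NULL) {
                if (found < max_out)
                    out[found] = objs[i].oid;
                found++;
                break;   /* list each object once per directory */
            }
        }
    }
    return found;
}
```

Running the query for /tmp/A/B/ finds Objects 1001 and 1003 from their
own names alone, whether or not a directory object for /tmp/A/B or /tmp
exists at that moment.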
-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-05-31 Thread Nikita Danilov
Jonathan Briggs writes:
  On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
   I should create an example.
   
   Wherever I used True Name previously, use OID instead.  True Name was
   simply another term for a unique object identifier.
   
   Three files with OIDs of 1001, 1002, and 1003.
   Object 1001:
   name: /tmp/A/file1
   name: /tmp/A/B/file1
   name: /tmp/A/B/C/file1
   
   Object 1002:
   name: /tmp/A/file2
   
   Object 1003:
   name: /tmp/A/B/file3
   
   Three query objects (directories) with OIDs of 1, 2, and 3.
   Object 1:
   name: /tmp/A
   name: /tmp/A/B/C/A
   query: name begins with /tmp/A/
   query result cache: B-2, file1-1001, file2-1002
   
   Object 2:
   name: /tmp/A/B
   query: name begins with /tmp/A/B/
   query result cache: C-3, file1-1001, file3-1003
   
   Object 3:
   name: /tmp/A/B/C
   query: name begins with /tmp/A/B/C/
   query result cache: A-1, file1-1001
   
   Now there is a A - B - C - A directory loop.  But removing
   name: /tmp/A/B/C/A from Object 1 fixes the loop.  Deleting Object 1 also
   fixes the loop.  Deleting any of Object 1, 2 or 3 does not affect any
   other object, because in this scheme, directory objects do not need to
   actually exist: they are just queries that return objects with certain
   names.

One problem with the above is that directory structure is inconsistent
with lists of names associated with objects. For example, file1 is a
child of /tmp/A/B/C/A, but Object 1001 doesn't list /tmp/A/B/C/A/file1
among its names.

  
  I forgot to address Nikita's point about reclaiming lost cycles.  In
  this case, let me create Object 4 for /tmp
  Object 4:
  name: /tmp
  query: name begins with /tmp/
  query result cache: A-1
  
  Now, if we delete Object 4, are Objects 1,2,3 lost?  I would say not
  because they still have names.  When the shell calls chdir(/tmp) a new
  query object (directory) must be created dynamically, and Objects
  1001,1002,1003 still have their names that start with /tmp and so they
  immediately appear again.  Their names still start with /, so the top
  level query will still find them and /tmp as well.

Object 4 is /tmp. Once it was removed what does it _mean_ for, say,
Object 1003 to have a name /tmp/A/B/file3? What is the /tmp bit there?
Just a string? If so, and your directories are but queries, what does it
mean for a directory to be removed? How is mv /tmp/A /tmp/A1 implemented?
By scanning the whole file system and updating leaf name-lists?

It seems that what you are proposing is a radical departure from the file
system namespace as we know it. :-) In your scheme all structural
information is encoded in leaves _only_, and directories just do some
kind of pattern matching. This is closer to a relational database than
to the current file-systems where directories are the only source of
the structural inform


Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
On Wed, 2005-06-01 at 02:36 +0400, Nikita Danilov wrote:
 Jonathan Briggs writes:
   On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
I should create an example.

Wherever I used True Name previously, use OID instead.  True Name was
simply another term for a unique object identifier.

Three files with OIDs of 1001, 1002, and 1003.
Object 1001:
name: /tmp/A/file1
name: /tmp/A/B/file1
name: /tmp/A/B/C/file1

Object 1002:
name: /tmp/A/file2

Object 1003:
name: /tmp/A/B/file3

Three query objects (directories) with OIDs of 1, 2, and 3.
Object 1:
name: /tmp/A
name: /tmp/A/B/C/A
query: name begins with /tmp/A/
query result cache: B-2, file1-1001, file2-1002

Object 2:
name: /tmp/A/B
query: name begins with /tmp/A/B/
query result cache: C-3, file1-1001, file3-1003

Object 3:
name: /tmp/A/B/C
query: name begins with /tmp/A/B/C/
query result cache: A-1, file1-1001

Now there is a A - B - C - A directory loop.  But removing
name: /tmp/A/B/C/A from Object 1 fixes the loop.  Deleting Object 1 also
fixes the loop.  Deleting any of Object 1, 2 or 3 does not affect any
other object, because in this scheme, directory objects do not need to
actually exist: they are just queries that return objects with certain
names.
 
 One problem with the above is that directory structure is inconsistent
 with lists of names associated with objects. For example, file1 is a
 child of /tmp/A/B/C/A, but Object 1001 doesn't list /tmp/A/B/C/A/file1
 among its names.

file1 *appears* to be a child because it is actually returned as the
query result for its name of /tmp/A/file1 because A is a query
for /tmp/A/.  If the shell was smart enough to normalize its path by
asking the directory for its name, it would know that /tmp/A/B/C/A
was /tmp/A.   But yes, a stupid program could be confused by the
difference between names.

 
   
   I forgot to address Nikita's point about reclaiming lost cycles.  In
   this case, let me create Object 4 for /tmp
   Object 4:
   name: /tmp
   query: name begins with /tmp/
   query result cache: A-1
   
   Now, if we delete Object 4, are Objects 1,2,3 lost?  I would say not
   because they still have names.  When the shell calls chdir(/tmp) a new
   query object (directory) must be created dynamically, and Objects
   1001,1002,1003 still have their names that start with /tmp and so they
   immediately appear again.  Their names still start with /, so the top
   level query will still find them and /tmp as well.
 
 Object 4 is /tmp. Once it was removed what does it _mean_ for, say,
 Object 1003 to have a name /tmp/A/B/file3? What is /tmp bit there?
 Just a string? If so, and your directories are but queries, what does it
 mean for directory to be removed? How mv /tmp/A /tmp/A1 is implemented?
 By scanning whole file system and updating leaf name-lists?

Well, the name doesn't mean anything. :-)  It is just a convenient
piece of metadata for describing where to find the file in a hierarchy,
and for Unix compatibility.

If a directory was removed by a standard rm -rf, it would work as
expected because it would descend the tree removing names (unlink) from
each object it found.

Moving an object with mv would change its name.  Moving a top-level
directory like /usr would require visiting every object starting
with /usr and doing an edit.  A compression scheme could be used where
the most-used top-level directory names were replaced with lookup
tables, then /usr could be renamed just once in the table.
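
A minimal sketch of that lookup-table idea, assuming each name is stored as a (prefix id, remainder) pair. The `prefix_table`, `objects`, `full_names` and `rename_prefix` names and the sample paths are invented for illustration; nothing here is an existing reiser4 structure.

```python
# Hypothetical compressed naming: a name is (prefix_id, remainder) and the
# most-used top-level prefixes live in one shared lookup table.
prefix_table = {0: "/usr", 1: "/tmp"}

objects = {
    2001: [(0, "/bin/ls")],
    2002: [(0, "/lib/libc.so"), (1, "/libc-copy")],
}


def full_names(oid):
    """Expand an object's compressed names into ordinary path strings."""
    return [prefix_table[pid] + rest for pid, rest in objects[oid]]


def rename_prefix(old, new):
    """Rename a top-level directory by editing the table, not the objects."""
    for pid, prefix in prefix_table.items():
        if prefix == old:
            prefix_table[pid] = new  # existing key, so the dict size is unchanged
            return
    raise KeyError(old)
```

After `rename_prefix("/usr", "/usr1")` every object that referred to prefix 0 expands to a `/usr1/...` name, although no per-object name list was edited. Directories that are not in the table would still need the full scan.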

 It seems that what you are proposing is a radical departure from file
 system namespace as we know it. :-) In your scheme all structural
 information is encoded in leaves _only_, and directories just do some
 kind of pattern matching. This is closer to a relational database than
 to the current file-systems where directories are the only source of
 the structural information.

Yes. :-)  It is radical, and the idea is taken from databases.  I
thought that seemed to be the direction Reiser filesystems were moving.
In this scheme a name is just another bit of metadata and not
first-class important information.  The name-query directories would be
there for traditional filesystem users and Unix compatibility.  They
would probably be virtual and dynamic, only being created when needed
and only being persistent if assigned meta-data (extra names (links),
non-default permission bits, etc) or for performance reasons (faster to
load from cache than searching every file).
-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.




Re: File as a directory - VFS Changes

2005-05-31 Thread Alexander G. M. Smith
Nikita Danilov wrote on Tue, 31 May 2005 13:34:55 +0400:
 A cycle may consist of more graph nodes than fit into memory. Cycle
 detection is crucial for rename semantics, and if a
 cycle-just-about-to-be-formed doesn't fit into memory it's not clear how
 to detect it, because the tree has to be locked while it is checked for
 cycles, and one definitely doesn't want to keep such a lock over IO.

Sometimes you'll just have to return an error code if the rename operation
is too complex to be done.  The user will have to then delete individual
leaf files to make the situation simpler.  I hope this won't happen very
often.
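
As a rough sketch of the "give up with an error" policy, assuming the link graph is available as an in-memory mapping: `would_create_cycle`, `TooComplex` and the `max_nodes` budget are hypothetical names, and the code ignores locking and IO entirely.

```python
class TooComplex(Exception):
    """Raised when the check would need more nodes than the budget allows."""


def would_create_cycle(children, moved, new_parent, max_nodes=1000):
    """Check whether linking `moved` under `new_parent` closes a cycle.

    `children` maps each node to the nodes it links to. A cycle appears
    exactly when `new_parent` is already reachable from `moved`.
    """
    seen = set()
    stack = [moved]
    while stack:
        node = stack.pop()
        if node == new_parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        if len(seen) > max_nodes:
            # Too many nodes to hold at once: refuse rather than guess.
            raise TooComplex("give up and return an error to the caller")
        stack.extend(children.get(node, ()))
    return False
```

The caller would translate `TooComplex` into an error code for the rename, which is the case where the user has to simplify the structure by hand.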

On the plus side, the detection of all the files that may be affected
means you can now delete a directory directly, contents and all, if all
the related inodes fit into memory.

- Alex


Re: File as a directory - VFS Changes

2005-05-30 Thread Hans Reiser
I think what Alex is suggesting below is reasonable and something
resembling it should be done, though I will not go into details on it
until we have some working code

Hans

Alexander G. M. Smith wrote:

[EMAIL PROTECTED] wrote on Sat, 28 May 2005 15:42:35 -0400:
  

I'm not Hans, but I *will* ask "How much of this is *rationally* doable
without some help from the VFS?"  At the very least, some of this stuff
will require the FS to tell the VFS to suspend its disbelief (for starters,
doing this without confusing the VFS's concepts of dentries/inodes/reference
counts is going to be interesting... :)



Good point.  One way would be to cram it into the existing VFS (the
operating system's interface to file systems) as directories representing
the objects, containing a specially named file for the raw data, mixed in
with child items and symbolic links to parent objects.  Some inodes would
be fake ones, generated as needed to represent the old style view of the
file / directory / attribute thing (such as the parent symbolic links).

But what would I (Hans likely has other views) like to see in a new VFS
to support files / directories / attributes all being the same kind of
object?  I'll talk about the user level API view of the VFS, rather than
the flip side for file systems or the gritty VFS internals, since it
doesn't need to be Linux specific.

For one, it would be almost the same as the existing VFS.  But when you
open a fildirute-thing, you can use the same file handle to read and
write its data and to list its children.

Thus open() and opendir() are combined into plain open().  It takes a
conventional hierarchical path (or later some of Hans Reiser's more
sophisticated namespaces?).  Returns a file handle.

The resulting file handle can be used with read(), write(), seek(),
readdir(), rewinddir() and the rest of the usual directory and file
basic operations.  And of course, close() it when you're done.

Stat() would disappear.  All the miscellaneous stat data would be
stored as sub-files, things like the date last modified, access
permissions and so on.  There would be a standard filename and file
type for those metadata subfiles to distinguish them from user created
subfiles (such as file/.meta.last_modified).  That also makes it
easier to add new kinds of metadata.

And that's about it for the basics.

Standard utilities, like ls would have to be changed to use the new
object structure - listing the contents of a thing and avoiding
recursion down paths that lead to parent objects (just like ls
currently avoids listing .. recursively).  That may involve more
work than the kernel changes!

I'd add a multi-read function to replace stat().  Give it a list of
sub-file names to read and it returns their names and contents in a
packed list (like a dirent structure).  That way bulk reading date
stamps, permissions and other attributish small metadata as subfiles
won't have as much overhead as opening them individually.  Particularly
if under the hood they are stored as fields in the file's inode rather
than as totally separate files (this is what BeOS's BFS does for small
attributes).  Though conceptually you treat them as separate subfiles.

I'd also like to add indexing.  That could be done by creating a magic
directory with an associated file type to index.  Then whenever a file
with that file type is changed, the index is updated using the file's
contents as the key, and a link to the file as the value.  The file
type also implies the interpretation of the values for sorting
purposes - as strings, binary numbers, etc.  Unlike BeOS, I'd expose
the indices directly (appearing as a directory full of hard links)
and have query languages implemented in userland libraries that make
use of the indices, rather than as part of the file system.  Now should
indices be system wide and maintained by the VFS, or per-volume and
maintained by the file system?  How about indices for things on network
drives?  Things on public web sites for a web-view file system?

I'd also like to add change notification.  If a file system object's
child list changes, then a notification message gets sent to interested
listeners.  Similarly for an object's data content change.  BeOS had
useful notifications for live changes to a query - I'd punt this to
the userland query library and have it build on the change notifications
from an index directory.  The VFS and other parts of the OS would need
to support change notification (BeOS used inter-process message queues).

Can a file-as-directory system fit into Linux, or some other OS?
I expect that it will only happen if the new system also exposes a
backwards compatible view for old software, using the old APIs.
After that's done, the first big user program that needs to be
updated is the desktop file browser.  Once there's a good GUI for
browsing file-as-directory file systems, the general public might
become more aware of their advantages (easily drilling down inside
files to 

Re: File as a directory - VFS Changes

2005-05-30 Thread Nikita Danilov
Alexander G. M. Smith writes:
  [EMAIL PROTECTED] wrote on Sat, 28 May 2005 15:42:35 -0400:
   I'm not Hans, but I *will* ask "How much of this is *rationally* doable
   without some help from the VFS?"  At the very least, some of this stuff
   will require the FS to tell the VFS to suspend its disbelief (for starters,
   doing this without confusing the VFS's concepts of
   dentries/inodes/reference counts is going to be interesting... :)
  
  Good point.  One way would be to cram it into the existing VFS (the
  operating system's interface to file systems) as directories representing
  the objects, containing a specially named file for the raw data, mixed in
  with child items and symbolic links to parent objects.  Some inodes would
  be fake ones, generated as needed to represent the old style view of the
  file / directory / attribute thing (such as the parent symbolic links).
  
  But what would I (Hans likely has other views) like to see in a new VFS
  to support files / directories / attributes all being the same kind of
  object?  I'll talk about the user level API view of the VFS, rather than
  the flip side for file systems or the gritty VFS internals, since it
  doesn't need to be Linux specific.
  
  For one, it would be almost the same as the existing VFS.  But when you
  open a fildirute-thing, you can use the same file handle to read and
  write its data and to list its children.

This is doable with the current VFS.

  
  Thus open() and opendir() are combined into plain open().  It takes a
  conventional hierarchical path (or later some of Hans Reiser's more
  sophisticated namespaces?).  Returns a file handle.

opendir(3) is a user-level function. It calls the open(2) system
call. telldir(3) and seekdir(3) are also functions that call lseek(2)
under the hood.
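
A small illustration of that point from user space, assuming Linux or a similar Unix where `os.O_DIRECTORY` is defined and `os.listdir` accepts a file descriptor. The helper name `list_via_open` is made up for this sketch.

```python
import os
import tempfile


def list_via_open(path):
    """List a directory through a descriptor obtained from plain open(2)."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # os.listdir() accepts a descriptor on platforms that support it.
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "file1"), "w").close()
    print(list_via_open(d))
```

The directory is opened with the same call that opens a regular file; only the operations applied to the descriptor afterwards differ.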

  
  The resulting file handle can be used with read(), write(), seek(),
  readdir(), rewinddir() and the rest of the usual directory and file
  basic operations.  And of course, close() it when you're done.

Nothing in the VFS prevents files from supporting both read(2) and
readdir(3). The problem is with link(2): the VFS assumes that directories
form a _tree_, that is, every directory has a well-defined parent.

  
  Stat() would disappear.  All the miscellaneous stat data would be
  stored as sub-files, things like the date last modified, access
  permissions and so on.  There would be a standard filename and file
  type for those metadata subfiles to distinguish them from user created
  subfiles (such as file/.meta.last_modified).  That also makes it
  easier to add new kinds of metadata.
  
  And that's about it for the basics.

The problem with that is that in /etc/passwd/..foo-meta-thing,
/etc/passwd is both a regular file (possibly with multiple names) and a
directory at the same time, which is a problem for the VFS, see above. Read
Documentation/filesystems/directory-locking and imagine the following:

$ touch a
$ ln a b
$ mv a/..uid b/..uid

(and yes, rename has to lock parent directories _before_ ever calling
into file system back-end, so reiser4 code cannot somehow magically hint
VFS that a and b are to be treated in a special way).
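
The first two commands can be reproduced on any ordinary Unix file system to see why the third one is awkward. The sketch below only shows that `a` and `b` end up naming one inode; it does not exercise reiser4 or the `..uid` pseudo-files, and `same_object` is a made-up helper name.

```python
import os
import tempfile


def same_object(dirpath):
    """Create `a`, hard-link it as `b`, and compare the two inodes."""
    a = os.path.join(dirpath, "a")
    b = os.path.join(dirpath, "b")
    open(a, "w").close()          # touch a
    os.link(a, b)                 # ln a b
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino), sa.st_nlink
```

If `a` and `b` could also act as directories, the source and target "parent directories" of the rename would be the same object reached through two names, which is the case the tree assumption rules out.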

Nikita.


Re: File as a directory - VFS Changes

2005-05-30 Thread Alexander G. M. Smith
Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
 Nothing in the VFS prevents files from supporting both read(2) and
 readdir(3). The problem is with link(2): the VFS assumes that directories
 form a _tree_, that is, every directory has a well-defined parent.

At least that's one problem that's solvable.  Just define one of
the parents as the master parent directory, with a guaranteed path
up to the root, and have the others as auxiliary parents.  That
also gives you a good path name to each and every file-thing.

The VFS or the file system (depending on where the designers want
to split the work) will still have to handle cycles in the graph
to recompute the new master parents, when an old one gets deleted
or moved.
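
One way the master/auxiliary split might look, as a sketch only: the `Node` class and the `reaches_root` and `drop_master` helpers are hypothetical, and everything is held in memory with no locking.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.master = None      # the one parent with a guaranteed path to root
        self.aux = []           # any additional (auxiliary) parents

    def path(self):
        """Canonical path, built by following master parents only."""
        parts = []
        node = self
        while node.master is not None:
            parts.append(node.name)
            node = node.master
        return "/" + "/".join(reversed(parts))


def reaches_root(node, root):
    """Follow master links upward; True if we arrive at `root`."""
    seen = set()
    while node is not None and id(node) not in seen:
        if node is root:
            return True
        seen.add(id(node))
        node = node.master
    return False


def drop_master(node, root):
    """Remove the master link and promote an auxiliary parent that still
    reaches the root; return False if the node has become unreachable."""
    node.master = None
    for candidate in list(node.aux):
        if reaches_root(candidate, root):
            node.aux.remove(candidate)
            node.master = candidate
            return True
    return False
```

An auxiliary parent that is itself a descendant of the node fails `reaches_root`, because its master chain ends at the node that just lost its master. That is the cycle case: the node and its descendants are then reachable only from each other.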

- Alex


Re: File as a directory - VFS Changes

2005-05-29 Thread Alexander G. M. Smith
[EMAIL PROTECTED] wrote on Sat, 28 May 2005 15:42:35 -0400:
 I'm not Hans, but I *will* ask "How much of this is *rationally* doable
 without some help from the VFS?"  At the very least, some of this stuff
 will require the FS to tell the VFS to suspend its disbelief (for starters,
 doing this without confusing the VFS's concepts of dentries/inodes/reference
 counts is going to be interesting... :)

Good point.  One way would be to cram it into the existing VFS (the
operating system's interface to file systems) as directories representing
the objects, containing a specially named file for the raw data, mixed in
with child items and symbolic links to parent objects.  Some inodes would
be fake ones, generated as needed to represent the old style view of the
file / directory / attribute thing (such as the parent symbolic links).

But what would I (Hans likely has other views) like to see in a new VFS
to support files / directories / attributes all being the same kind of
object?  I'll talk about the user level API view of the VFS, rather than
the flip side for file systems or the gritty VFS internals, since it
doesn't need to be Linux specific.

For one, it would be almost the same as the existing VFS.  But when you
open a fildirute-thing, you can use the same file handle to read and
write its data and to list its children.

Thus open() and opendir() are combined into plain open().  It takes a
conventional hierarchical path (or later some of Hans Reiser's more
sophisticated namespaces?).  Returns a file handle.

The resulting file handle can be used with read(), write(), seek(),
readdir(), rewinddir() and the rest of the usual directory and file
basic operations.  And of course, close() it when you're done.

Stat() would disappear.  All the miscellaneous stat data would be
stored as sub-files, things like the date last modified, access
permissions and so on.  There would be a standard filename and file
type for those metadata subfiles to distinguish them from user created
subfiles (such as file/.meta.last_modified).  That also makes it
easier to add new kinds of metadata.

And that's about it for the basics.
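
To make the basics concrete, here is a toy in-memory model of such an object. It is a sketch under assumptions: the `Thing` class and its methods are invented, and only the `.meta.last_modified` name comes from the text above.

```python
import time


class Thing:
    """Hypothetical file/directory/attribute object behind one handle."""

    def __init__(self, data=b"", with_meta=True):
        self.data = data
        self.children = {}
        if with_meta:
            # Metadata is an ordinary child with a reserved name prefix.
            stamp = str(int(time.time())).encode()
            self.children[".meta.last_modified"] = Thing(stamp, with_meta=False)

    def read(self):
        return self.data

    def write(self, data):
        self.data = data
        meta = self.children.get(".meta.last_modified")
        if meta is not None:
            meta.data = str(int(time.time())).encode()

    def readdir(self, include_meta=False):
        """List children; metadata sub-files are hidden unless requested."""
        return sorted(
            name for name in self.children
            if include_meta or not name.startswith(".meta.")
        )
```

The same object answers both `read()` and `readdir()`, and what stat() would have returned is read as the data of a `.meta.*` child instead.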

Standard utilities, like ls would have to be changed to use the new
object structure - listing the contents of a thing and avoiding
recursion down paths that lead to parent objects (just like ls
currently avoids listing .. recursively).  That may involve more
work than the kernel changes!

I'd add a multi-read function to replace stat().  Give it a list of
sub-file names to read and it returns their names and contents in a
packed list (like a dirent structure).  That way bulk reading date
stamps, permissions and other attributish small metadata as subfiles
won't have as much overhead as opening them individually.  Particularly
if under the hood they are stored as fields in the file's inode rather
than as totally separate files (this is what BeOS's BFS does for small
attributes).  Though conceptually you treat them as separate subfiles.
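
A possible shape for that multi-read call, sketched in user space. The record layout (2-byte name length, 4-byte data length, name, data) and the names `multi_read` and `unpack` are assumptions for illustration, not a proposed ABI.

```python
import struct

HEADER = "<HI"  # little-endian: name length (2 bytes), data length (4 bytes)


def multi_read(subfiles, wanted):
    """Pack the requested sub-files into one buffer, dirent-style."""
    out = bytearray()
    for name in wanted:
        if name not in subfiles:
            continue  # missing sub-files are simply skipped in this sketch
        raw_name = name.encode()
        data = subfiles[name]
        out += struct.pack(HEADER, len(raw_name), len(data)) + raw_name + data
    return bytes(out)


def unpack(buf):
    """Inverse of multi_read: return a list of (name, data) pairs."""
    pos = 0
    result = []
    while pos < len(buf):
        nlen, dlen = struct.unpack_from(HEADER, buf, pos)
        pos += struct.calcsize(HEADER)
        name = buf[pos:pos + nlen].decode()
        pos += nlen
        result.append((name, buf[pos:pos + dlen]))
        pos += dlen
    return result
```

One call returns several small attributes at once, so listing a directory with dates and permissions does not need an open/read/close cycle per sub-file.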

I'd also like to add indexing.  That could be done by creating a magic
directory with an associated file type to index.  Then whenever a file
with that file type is changed, the index is updated using the file's
contents as the key, and a link to the file as the value.  The file
type also implies the interpretation of the values for sorting
purposes - as strings, binary numbers, etc.  Unlike BeOS, I'd expose
the indices directly (appearing as a directory full of hard links)
and have query languages implemented in userland libraries that make
use of the indices, rather than as part of the file system.  Now should
indices be system wide and maintained by the VFS, or per-volume and
maintained by the file system?  How about indices for things on network
drives?  Things on public web sites for a web-view file system?
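
The index idea could be prototyped roughly as follows. This is a sketch with invented names (`Index`, `file_changed`, `listing`); a real implementation would keep hard links in a directory rather than paths in a Python list.

```python
import bisect


class Index:
    """Hypothetical index directory for one file type."""

    def __init__(self, file_type, key_func):
        self.file_type = file_type
        self.key_func = key_func      # how to interpret contents for sorting
        self.entries = []             # sorted list of (key, path)
        self.by_path = {}

    def file_changed(self, path, file_type, contents):
        """Called whenever a file is written; ignores other file types."""
        if file_type != self.file_type:
            return
        old = self.by_path.pop(path, None)
        if old is not None:
            self.entries.remove((old, path))
        key = self.key_func(contents)
        bisect.insort(self.entries, (key, path))
        self.by_path[path] = key

    def listing(self):
        """The index 'directory': links to files, ordered by key."""
        return [path for _key, path in self.entries]
```

Passing `int` or `str` as `key_func` plays the role of the file type deciding how values sort: the contents "200" and "1999" come out in a different order under each.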

I'd also like to add change notification.  If a file system object's
child list changes, then a notification message gets sent to interested
listeners.  Similarly for an object's data content change.  BeOS had
useful notifications for live changes to a query - I'd punt this to
the userland query library and have it build on the change notifications
from an index directory.  The VFS and other parts of the OS would need
to support change notification (BeOS used inter-process message queues).
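
A minimal sketch of the notification plumbing, using plain callbacks in one process instead of the inter-process message queues mentioned above. `Notifier`, `Directory` and the event name "children" are all hypothetical.

```python
from collections import defaultdict


class Notifier:
    """Hypothetical per-object change notification registry."""

    def __init__(self):
        self.listeners = defaultdict(list)   # (path, event) -> callbacks

    def watch(self, path, event, callback):
        self.listeners[(path, event)].append(callback)

    def emit(self, path, event, detail):
        # Copy the list so a callback may register further listeners safely.
        for callback in list(self.listeners.get((path, event), ())):
            callback(path, event, detail)


class Directory:
    """An object whose child list sends a message whenever it changes."""

    def __init__(self, path, notifier):
        self.path = path
        self.notifier = notifier
        self.children = set()

    def add_child(self, name):
        self.children.add(name)
        self.notifier.emit(self.path, "children", ("added", name))
```

A userland query library could watch an index directory this way and turn child-list changes into live query updates.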

Can a file-as-directory system fit into Linux, or some other OS?
I expect that it will only happen if the new system also exposes a
backwards compatible view for old software, using the old APIs.
After that's done, the first big user program that needs to be
updated is the desktop file browser.  Once there's a good GUI for
browsing file-as-directory file systems, the general public might
become more aware of their advantages (easily drilling down inside
files to attach a description subfile or add a bunch of MP3 tags,
magic query directories and indexing to find things quickly, multiple
parents to put the same file in multiple folders without the
breakability of symbolic links