Re: File as a directory - VFS Changes
Nikita Danilov wrote: Hans Reiser writes: Nikita Danilov wrote: But cycles are solvable in current file systems too: they simply do not exist there. Yes, but Nikita, cycles represent semantic functionality that has value because being able to embody more expressions means more power of If you mean that multiple parents have some value, I agree. Problem is that solutions proposed so far have severe limitation: - they add support for cycle detection that is necessary to support multiple parents, but that support is only efficient for small datasets: when total number of objects is not very big, and average object has only one parent. - even when there are no multiple parents, system is not efficient for large number of files. Can you say this in more detail? expression. If some way can be found to allow them, then functionality is increased. Separating links that increase reference count from links that merely point (ala hard vs. sym links) is one approach. If there was effective enough for real world use cycle detection, that would be better. It seems to me that in the domains where proposed designs are applicable, symlinks already provide viable solution. I have been thinking that disabling hard links for filedirectories might be an acceptable solution for reiser4 if cycles are a deeper problem than I currently appreciate. We can then allow people to turn off one of either filedirectories or hard links. I would prefer solving the cycles problem though Nikita.
Re: File as a directory - VFS Changes
Hans Reiser writes: Nikita Danilov wrote: But cycles are solvable in current file systems too: they simply do not exist there. Yes, but Nikita, cycles represent semantic functionality that has value because being able to embody more expressions means more power of If you mean that multiple parents have some value, I agree. Problem is that solutions proposed so far have severe limitation: - they add support for cycle detection that is necessary to support multiple parents, but that support is only efficient for small datasets: when total number of objects is not very big, and average object has only one parent. - even when there are no multiple parents, system is not efficient for large number of files. expression. If some way can be found to allow them, then functionality is increased. Separating links that increase reference count from links that merely point (ala hard vs. sym links) is one approach. If there was effective enough for real world use cycle detection, that would be better. It seems to me that in the domains where proposed designs are applicable, symlinks already provide viable solution. Nikita.
Re: File as a directory - VFS Changes
Hi,

Why is this discussion revolving around relational databases? The attributes of files, and the files themselves, would really s**k if they were modelled for querying in a relational database. The attribute info is neither structured nor unstructured; it is SEMI-STRUCTURED. Executing Structured Query Language (SQL) over semi-structured data would result in:

- harder modelling (almost a waste of effort),
- complex querying (an elegant system is of no use because of the number of joins that querying would need, if you somehow model semi-structured data in a structured data model).

The best option to start with would be the best COT. I feel we should look at Lorel, from Stanford's Lore project, for hints about modelling our whatever.

Regards,
Faraz :)

- Original Message -
From: Nikita Danilov [EMAIL PROTECTED]
To: Jonathan Briggs [EMAIL PROTECTED]
Cc: Hans Reiser [EMAIL PROTECTED]; [EMAIL PROTECTED]; Alexander G. M. Smith [EMAIL PROTECTED]; [EMAIL PROTECTED]; reiserfs-list@namesys.com; [EMAIL PROTECTED]; Nate Diller [EMAIL PROTECTED]
Sent: Thursday, June 02, 2005 4:54 PM
Subject: Re: File as a directory - VFS Changes

Jonathan Briggs writes: On Thu, 2005-06-02 at 14:38 +0400, Nikita Danilov wrote: Jonathan Briggs writes: On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote: [snip] Frankly speaking, I suspect that name-as-attribute is going to limit usability of file system significantly. Usability as in features? Or usability as in performance? Usability as in ease of use. [...] A index is an arrangement of information about the indexed items. The index contents *belong* to the items. An index by name? That name belongs to the item. An index by date? Those dates are properties of In the flat world of relation databases, maybe. But almost nowhere else improper name is an attribute of its signified: variable is not an attribute of object it points to, URL is not an attribute of the web page, block number is not an attribute of data stored in that block on the disk, etc. [...]
In the same way that you can descend a directory tree and copy the names found into each item, you can check each item and copy the names found into a directory tree. Except that as was already discussed resulting directory tree is _bound_ to be inconsistent with real names. Indices cannot be reduced to real names (as rename is impossible to implement efficiently), but real names can very well be reduced to indices as exemplified by each and every UNIX file system out there. So, the question is: what real names buy one, that indices do not? By storing the names in the items, cycles become solvable because you can always look at the current directory's name(s) to see where you really are. Every name becomes absolutely connected to the top of the namespace instead of depending on a parent pointer that may not ever connect to the top. But cycles are solvable in current file systems too: they simply do not exist there. If speeding up rename was very important, you can replace every pathname component with a indirect reference instead of using simple strings. Changing directory levels is still difficult. It is not only speed that will be extremely hard to achieve in that design; atomicity (in the face of possible crash during rename), and concurrency control look problematic too. -- Jonathan Briggs [EMAIL PROTECTED] eSoft, Inc. Nikita.
Re: File as a directory - VFS Changes
Nikita Danilov wrote: But cycles are solvable in current file systems too: they simply do not exist there. Yes, but Nikita, cycles represent semantic functionality that has value because being able to embody more expressions means more power of expression. If some way can be found to allow them, then functionality is increased. Separating links that increase reference count from links that merely point (ala hard vs. sym links) is one approach. If there was effective enough for real world use cycle detection, that would be better. If speeding up rename was very important, you can replace every pathname component with a indirect reference instead of using simple strings. Changing directory levels is still difficult. It is not only speed that will be extremely hard to achieve in that design; atomicity (in the face of possible crash during rename), and concurrency control look problematic too. -- Jonathan Briggs [EMAIL PROTECTED] eSoft, Inc. Nikita.
Re: File as a directory - VFS Changes
Alexander G. M. Smith wrote: Hans Reiser wrote on Tue, 31 May 2005 11:32:04 -0700: What about if we have it that only the first name a directory is created with counts towards its reference count, and that if the directory is moved if it is moved from its first name, the new name becomes the one that counts towards the reference count? A bit of a hack, but would work. Sounds a lot like what I did earlier. Files got really deleted when the true name was the only name for a file (only one parent in other words). But I also had a large cycle finding pause when any file movement happened. I'm not sure if it would still be needed. Nikita Danilov wrote: - if garbage collection is implemented through the reference counting (which is the only known way tractable for a file system), then cycles are never collected. [...] But the garbage collection problem is still there. You are more than welcome to solve it by implementing generation mark-and-sweep GC on file system scale. :-) There are at least two choices: Bite the bullet and have a file system that is occasionally slow due to cycle checking, but only when the user somehow makes a huge cycle. Keep in mind that this only happens when you use the new functionality, if you only create files with one parent, it should be as fast as regular file systems. I see its features being useful for desktop use, not servers, so the occasional speed hit is less annoyance than the lack of features (the ability to file your files in several places). I prefer the above to the below. Another way is to not delete the files when they get unlinked. Similar to some other allocation management systems, have a background thread doing the garbage collection and cycle tracing. The drawback is that you might run out of disc space if you're creating files faster than the collector is cleaning up. 
I wonder if you can combine a wandering journal (or whatever it is called, where the journalled data blocks become the file's current contents) with the copy type garbage collection (is that the same as a 2 generation mark and sweep?). Copy type collection copies all known reachable objects to an empty half of the disk. When that's done, the original half is marked empty and the next pass copies in the other direction. Could work nicely if you have two disk drives. Yet another PhD topic on garbage collection for someone to research :-) There are lots of other garbage collection schemes that might be applicable to file systems with cycles. It could work, maybe with decent speed too! - Alex
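The copy-type collection Alex describes can be sketched roughly as below. This is only a toy in-memory model, not anything from reiser4: the two "halves of the disk" are plain dictionaries, and the names `copy_collect`, `half_a` and `half_b` are made up for illustration.

```python
from collections import deque

def copy_collect(from_space, roots):
    """Copy every object reachable from `roots` into a fresh to-space.

    from_space: dict mapping object id -> list of child object ids
                (a stand-in for one half of the disk).
    Whatever is not copied was unreachable, including unreachable cycles.
    """
    to_space = {}
    queue = deque(r for r in roots if r in from_space)
    while queue:
        oid = queue.popleft()
        if oid in to_space:      # already copied: shared child or a cycle
            continue
        children = from_space[oid]
        to_space[oid] = list(children)
        queue.extend(c for c in children if c not in to_space)
    return to_space

half_a = {
    "root": ["photos"],
    "photos": ["pic"],
    "pic": ["photos"],   # cycle that is still reachable from root: kept
    "x": ["y"],          # x and y only reference each other: garbage
    "y": ["x"],
}
half_b = copy_collect(half_a, ["root"])
print(sorted(half_b))    # ['photos', 'pic', 'root']
```

After the pass, `half_a` would be marked empty and the next collection would copy from `half_b` back into it. The cost is proportional to the live data, which is why the scheme needs the spare half (or a second drive, as suggested above).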
Re: File as a directory - VFS Changes
Hans Reiser writes: What about if we have it that only the first name a directory is created with counts towards its reference count, and that if the directory is moved if it is moved from its first name, the new name becomes the one that counts towards the reference count? A bit of a hack, but would work. This means that list of names has to be kept together with every object (to find out where true reference has to be moved). And this makes rename of directory problematic, as lists of names of all directory children have to be updated. Hans Nikita.
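A minimal sketch of the "only the first name counts" idea, and of the per-object name list that Nikita says it implies. The class and method names are hypothetical, and the behaviour when the counted name is removed while other names remain (they are simply left dangling here) is an assumption, since the thread does not specify it.

```python
class Directory:
    """Toy object that has to remember its own names."""

    def __init__(self, first_name):
        self.true_name = first_name   # the only name that counts as a reference
        self.extra_names = set()      # additional links, not counted

    def refcount(self):
        return 1 if self.true_name is not None else 0

    def link(self, name):
        self.extra_names.add(name)

    def move(self, old, new):
        if old == self.true_name:
            self.true_name = new      # the counted reference follows the move
        elif old in self.extra_names:
            self.extra_names.remove(old)
            self.extra_names.add(new)
        else:
            raise KeyError(old)

    def unlink(self, name):
        if name == self.true_name:
            self.true_name = None     # assumption: extra names now dangle
        else:
            self.extra_names.discard(name)
        return self.refcount() == 0   # True means the object can be freed


d = Directory("/home/a/music")
d.link("/home/b/shared")              # an extra link cannot keep d alive
d.move("/home/a/music", "/srv/music")
print(d.true_name, d.refcount())      # /srv/music 1
print(d.unlink("/home/b/shared"))     # False
print(d.unlink("/srv/music"))         # True
```

Because only one name per object is counted, a cycle built from extra links cannot keep anything alive. The price is visible in `move`: the object must know which of its names is the counted one, which is the bookkeeping Nikita objects to.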
Re: File as a directory - VFS Changes
Alexander G. M. Smith writes:

[...] The typical worst case operation will be deleting a link to your photo from a directory you decided didn't classify it properly. The photo may be in several directories, such as Cottage, Aunt and Bottles if it is a picture of a champagne bottle you polished off at your aunt's cottage. You decide that it shouldn't really be in the Aunt folder, so you delete it (or rather the link) from there.

This is typical operation for a desktop usage, I agree. But desktop is not interesting. It doesn't pose technical difficulty to implement whatever indexing structure when your dataset is but a few dozen thousand objects [1]. What _is_ interesting, is to make file system scalable. Solution that fails to move directory simply because sub-tree rooted at it is large is not scalable.

The traversal starts with recursively finding all the children of the deleted object, which will include the photo and all attributish subobjects (thumbnail, description, ...). Not too bad, maybe a dozen objects. Then reconnect those children to objects which have a known good path to the root, reached through whatever parents remain.

And at that moment user hits ^C... That is, how atomicity guarantees of rename will be preserved? Note that many applications, like some mail servers, crucially depend on rename atomicity to implement their transaction mini-engines.

And concurrency issues also don't look bright: what if while mv /d0/d1/d2/d2 /b0/b1/b2 is performed and thread is in the middle of scanning descendants of /d0/d1/d2/d2 recursively, another thread does mv /d0/d1 /c0/c1/c2 ? Obviously scanning cannot take locks on individual files as it sees them (because, namespace being an arbitrary graph, this will deadlock). The only remaining solution is to take whole-fs-lock during every rename/link/unlink operation. Which is another nail to the scalability coffin.

[...] Now if you move the directory containing millions of files, then it's going to take a while. And if it has a hard link down to another directory, that gets traversed too. But that won't happen too often, only around spring time when you're reorganizing your mail archives.

It happens all the time on my workstation, when I move Linux source trees around.

- Alex

Nikita.

Footnotes: [1] Implementing things like Spotlight does not require any innovation at the file system layer (and not coincidentally, Spotlight is based on almost 20 years old BSDLite kernel code).
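To make the remark about mail servers concrete, here is a small sketch of the kind of pattern that leans on rename being atomic: write the message under a temporary name, then rename it into place. The directory layout and the function name `deliver` are illustrative assumptions in the spirit of maildir-style delivery, not a description of any particular server.

```python
import os
import tempfile

def deliver(maildir, name, body):
    """Write a message so readers see either nothing or the complete file."""
    tmp_dir = os.path.join(maildir, "tmp")
    new_dir = os.path.join(maildir, "new")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)

    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "w") as f:
        f.write(body)
        f.flush()
        os.fsync(f.fileno())          # data reaches disk before the name appears

    final_path = os.path.join(new_dir, name)
    os.rename(tmp_path, final_path)   # the step that has to be atomic
    return final_path

root = tempfile.mkdtemp()
path = deliver(root, "msg-0001", "hello\n")
print(os.listdir(os.path.join(root, "new")))   # ['msg-0001']
print(os.listdir(os.path.join(root, "tmp")))   # []
```

If a rename could be interrupted halfway, for instance in the middle of a long reconnect traversal, a reader could observe the message in both places or in neither, and this style of mini-transaction would no longer be safe.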
Re: File as a directory - VFS Changes
Jonathan Briggs writes: On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote: [snip] Frankly speaking, I suspect that name-as-attribute is going to limit usability of file system significantly. Note, that in the real world, only names from quite limited class are attributes of objects, viz. /proper names/ like France, or Jonathan Briggs. Communication wouldn't get any far if only proper names were allowed. Nikita. Bringing up /proper names/ from the real world agrees with my idea though! :-) I don't understand why if you are liberty to design new namespace model from scratch (it seems POSIX semantics are not binding in our case), you are going to faithfully replicate deficiencies of natural languages. It is common trait in both science and engineering that when two flavors of the same functionality (real names vs. indices) arise, an attempt is made to reduce one of them to another, simplifying the system as a result. In our case, motivation to reduce one type of names to another is even more pressing, as these types are incompatible: in the presence of cycles or dynamic queries, namespace visible through the directory hierarchy is different from the namespace of real names. Indices cannot be reduced to real names (as rename is impossible to implement efficiently), but real names can very well be reduced to indices as exemplified by each and every UNIX file system out there. So, the question is: what real names buy one, that indices do not? [...] -- Jonathan Briggs [EMAIL PROTECTED] eSoft, Inc. Nikita.
RE: File as a directory - VFS Changes
Hi Nikita,

The problem of files not fitting the query of a smart folder is a serious one. We implemented this same thing for our semantic filesystem. For example, if we create an MP3 file in a JPEG folder, it won't ever get listed. This fundamentally changes the way users see your filesystem: users expect to see files in the folder where they created them. That in itself should be a default search criterion.

We almost solved this by making the parentdirectory an attribute of the file. All the smart folders have their query transparently modified to: where type=jpg OR parentdirectory=thisdirectory. This makes the virtual folder stuff work as an EXTENSION to the standard file/directory relationship rather than as a REPLACEMENT. Personal experience says that users don't digest any change to the UNIX filesystem model. Anything extra is OK, but replacements are BAD.

Think of it: you create a C file in a virtual folder for h files, and the file won't get listed (although it will exist). THEN WHAT??? The user has to search for it. BAD: your whole fancy virtual directory USE CASE is lost, and eventually we end up solving nothing.

Other issues include this display name stuff, etc. They are bad. What if two files with the same display name get listed in the same virtual directory? There is no point in creating a problem and then solving it.

Good work, though; we don't want to get bogged down once WinFS is released.

Regards,
Faraz.
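The query rewrite Faraz mentions can be sketched as a simple filter. The `File` record and the function `smart_folder_listing` are hypothetical names used only for this illustration; the one thing taken from the message is the predicate `type=jpg OR parentdirectory=thisdirectory`.

```python
from dataclasses import dataclass

@dataclass
class File:
    name: str
    type: str
    parentdirectory: str   # stored as an attribute of the file

def smart_folder_listing(files, folder, wanted_type):
    """Equivalent of: WHERE type = wanted_type OR parentdirectory = folder."""
    return [f.name for f in files
            if f.type == wanted_type or f.parentdirectory == folder]

files = [
    File("cat.jpg", "jpg", "/home/u/pictures"),
    File("song.mp3", "mp3", "/smart/jpegs"),    # created inside the smart folder
    File("notes.txt", "txt", "/home/u/docs"),
]
print(smart_folder_listing(files, "/smart/jpegs", "jpg"))
# ['cat.jpg', 'song.mp3']
```

The MP3 shows up because of the second half of the predicate, so a file never vanishes from the folder it was created in, while the first half still pulls in matching files from elsewhere.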
Re: File as a directory - VFS Changes
On Thu, 2005-06-02 at 14:38 +0400, Nikita Danilov wrote: Jonathan Briggs writes: On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote: [snip] Frankly speaking, I suspect that name-as-attribute is going to limit usability of file system significantly. Usability as in features? Or usability as in performance? Note, that in the real world, only names from quite limited class are attributes of objects, viz. /proper names/ like France, or Jonathan Briggs. Communication wouldn't get any far if only proper names were allowed. Nikita. Bringing up /proper names/ from the real world agrees with my idea though! :-) I don't understand why if you are liberty to design new namespace model from scratch (it seems POSIX semantics are not binding in our case), you are going to faithfully replicate deficiencies of natural languages. It is common trait in both science and engineering that when two flavors of the same functionality (real names vs. indices) arise, an attempt is made to reduce one of them to another, simplifying the system as a result. A index is an arrangement of information about the indexed items. The index contents *belong* to the items. An index by name? That name belongs to the item. An index by date? Those dates are properties of the item. Anything that can be indexed about an item can be described as a property of the item. Only for efficiency reasons are index data not included with the item data. In our case, motivation to reduce one type of names to another is even more pressing, as these types are incompatible: in the presence of cycles or dynamic queries, namespace visible through the directory hierarchy is different from the namespace of real names. Queries create indexes based on properties of the items. This is no different from directories, which are indexes based on names of the items. In the same way that you can descend a directory tree and copy the names found into each item, you can check each item and copy the names found into a directory tree. 
Indices cannot be reduced to real names (as rename is impossible to implement efficiently), but real names can very well be reduced to indices as exemplified by each and every UNIX file system out there. So, the question is: what real names buy one, that indices do not? By storing the names in the items, cycles become solvable because you can always look at the current directory's name(s) to see where you really are. Every name becomes absolutely connected to the top of the namespace instead of depending on a parent pointer that may not ever connect to the top. If speeding up rename was very important, you can replace every pathname component with a indirect reference instead of using simple strings. Changing directory levels is still difficult. -- Jonathan Briggs [EMAIL PROTECTED] eSoft, Inc.
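A rough sketch of the "indirect reference" idea from the last paragraph, under the assumption that each object stores its full name as a sequence of component ids and that the id-to-string table is shared. All identifiers (`components`, `objects`, `rename_component`, `move_subtree`) are invented for the example.

```python
components = {1: "usr", 2: "lib", 3: "libc.so", 4: "bin", 5: "ls"}

# Each object stores its own name as a tuple of component ids.
objects = {
    1001: (1, 2, 3),   # /usr/lib/libc.so
    1002: (1, 4, 5),   # /usr/bin/ls
}

def path_of(oid):
    return "/" + "/".join(components[c] for c in objects[oid])

def rename_component(cid, new_text):
    components[cid] = new_text   # one write, however many objects use it

def move_subtree(old_prefix, new_prefix):
    """Changing depth: every object under the prefix must be rewritten."""
    touched = 0
    n = len(old_prefix)
    for oid, name in objects.items():
        if name[:n] == old_prefix:
            objects[oid] = new_prefix + name[n:]
            touched += 1
    return touched

rename_component(1, "system")
print(path_of(1001))                 # /system/lib/libc.so

components[6] = "opt"
print(move_subtree((1,), (6, 1)))    # 2
print(path_of(1002))                 # /opt/system/bin/ls
```

Renaming in place is a single table update, but moving the subtree to a different level still touches every object beneath it, which is the "changing directory levels is still difficult" caveat.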
Re: File as a directory - VFS Changes
Jonathan Briggs writes: On Thu, 2005-06-02 at 14:38 +0400, Nikita Danilov wrote: Jonathan Briggs writes: On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote: [snip] Frankly speaking, I suspect that name-as-attribute is going to limit usability of file system significantly. Usability as in features? Or usability as in performance? Usability as in ease of use. [...] A index is an arrangement of information about the indexed items. The index contents *belong* to the items. An index by name? That name belongs to the item. An index by date? Those dates are properties of In the flat world of relation databases, maybe. But almost nowhere else improper name is an attribute of its signified: variable is not an attribute of object it points to, URL is not an attribute of the web page, block number is not an attribute of data stored in that block on the disk, etc. [...] In the same way that you can descend a directory tree and copy the names found into each item, you can check each item and copy the names found into a directory tree. Except that as was already discussed resulting directory tree is _bound_ to be inconsistent with real names. Indices cannot be reduced to real names (as rename is impossible to implement efficiently), but real names can very well be reduced to indices as exemplified by each and every UNIX file system out there. So, the question is: what real names buy one, that indices do not? By storing the names in the items, cycles become solvable because you can always look at the current directory's name(s) to see where you really are. Every name becomes absolutely connected to the top of the namespace instead of depending on a parent pointer that may not ever connect to the top. But cycles are solvable in current file systems too: they simply do not exist there. If speeding up rename was very important, you can replace every pathname component with a indirect reference instead of using simple strings. Changing directory levels is still difficult. 
It is not only speed that will be extremely hard to achieve in that design; atomicity (in the face of possible crash during rename), and concurrency control look problematic too. -- Jonathan Briggs [EMAIL PROTECTED] eSoft, Inc. Nikita.
Re: File as a directory - VFS Changes
Jonathan Briggs writes: On Wed, 2005-06-01 at 02:36 +0400, Nikita Danilov wrote: [...] One problem with the above is that directory structure is inconsistent with lists of names associated with objects. For example, file1 is a child of /tmp/A/B/C/A, but Object 1001 doesn't list /tmp/A/B/C/A/file1 among its names. file1 *appears* to be a child because it is actually returned as the query result for its name of /tmp/A/file1 because A is a query I beg your pardon, but this is confusing. Objects have real names that are stings attached to them. User, on the other hand, accesses objects through paths in directory hierarchy which is just a way to execute queries on real-names. But some paths do correspond to real-names and same do not? I, personally, would be very wary to use such a behavior as a fundamental model of file system. Also, if directories are just queries, it is not clear why they have real-names on their own. For example, what does it mean, for object O1 (a directory) to have a real-name /a/b, and to return (c - O2) as a part of query result, where O2 has only one name, viz. /d/e? Basically, without some extra restrictions, your model doesn't provide consistency between user visible paths, and hidden real-names, which makes it not very useful in the practice, I am afraid. for /tmp/A/. If the shell was smart enough to normalize its path by asking the directory for its name, it would know that /tmp/A/B/C/A was /tmp/A. /tmp/A/B/C/A may have other names beyond /tmp/A, which one to choose? But yes, a stupid program could be confused by the difference between names. A _user_ will most definitely be confused, which is much more important. [...] Moving an object with mv would change its name. Moving a top-level directory like /usr would require visiting every object starting with /usr and doing an edit. A compression scheme could be used where the most-used top-level directory names were replaced with lookup tables, then /usr could be renamed just once in the table. 
Heh, you just invented good old directories, by the way. [...] Yes. :-) It is radical, and the idea is taken from databases. I thought that seemed to be the direction Reiser filesystems were moving. In this scheme a name is just another bit of metadata and not first-class important information. The name-query directories would be there for traditional filesystem users and Unix compatibility. They would probably be virtual and dynamic, only being created when needed and only being persistent if assigned meta-data (extra names (links), non-default permission bits, etc) or for performance reasons (faster to load from cache than searching every file). That latter bit, about making them persistent, is where the tr
Re: File as a directory - VFS Changes
Nikita Danilov writes: [...] Yes. :-) It is radical, and the idea is taken from databases. I thought that seemed to be the direction Reiser filesystems were moving. In this scheme a name is just another bit of metadata and not first-class important information. The name-query directories would be there for traditional filesystem users and Unix compatibility. They would probably be virtual and dynamic, only being created when needed and only being persistent if assigned meta-data (extra names (links), non-default permission bits, etc) or for performance reasons (faster to load from cache than searching every file). That latter bit, about making them persistent, is where the tr [Hmm... grue ate my message.] That latter bit, about making them persistent, is where the trouble begins: once queries acquire identity and a place in the file system name-space, they logically become part of that very name-space they are querying! This leads to various complication, and you are trying to work around them by claiming that queries are not _always_ part of name-space (file1 [only] **appears** to be a child...). This non-uniform behavior is a big disadvantage. Nikita.
Re: File as a directory - VFS Changes
On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote: Nikita Danilov writes: [...] Yes. :-) It is radical, and the idea is taken from databases. I thought that seemed to be the direction Reiser filesystems were moving. In this scheme a name is just another bit of metadata and not first-class important information. The name-query directories would be there for traditional filesystem users and Unix compatibility. They would probably be virtual and dynamic, only being created when needed and only being persistent if assigned meta-data (extra names (links), non-default permission bits, etc) or for performance reasons (faster to load from cache than searching every file). That latter bit, about making them persistent, is where the tr [Hmm... grue ate my message.] That latter bit, about making them persistent, is where the trouble begins: once queries acquire identity and a place in the file system name-space, they logically become part of that very name-space they are querying! This leads to various complication, and you are trying to work around them by claiming that queries are not _always_ part of name-space (file1 [only] **appears** to be a child...). This non-uniform behavior is a big disadvantage. In this scheme, query objects were always part of the name-space. None of the objects are really children of any of the others. They only appear to be children when viewed through a set of name-query directories. In reality every object would be an equal in the true OID name-space. Only meta-data objects are children of their data objects. You could also create a confusing query named /tmp/G that returned results for /usr/lib/. This is the same sort of abuse that creates A-B-C-A loops: the query was deliberately set to have a misleading name/name-query relationship. The user is responsible for sensible naming. Under normal use, a user would hardly notice the difference between traditional directories and this name-query system. 
With persistent disk cache of queries and lookup tables for common names, it does start to look like regular directory structures, but it is still coming at the problem from the opposite direction. Traditional directories store information about a file (its name) outside the file, and this system would store everything about a file with the file itself. -- Jonathan Briggs [EMAIL PROTECTED] eSoft, Inc.
Re: File as a directory - VFS Changes
Jonathan Briggs writes: On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote: Nikita Danilov writes: [...] That latter bit, about making them persistent, is where the trouble begins: once queries acquire identity and a place in the file system name-space, they logically become part of that very name-space they are querying! This leads to various complication, and you are trying to work around them by claiming that queries are not _always_ part of name-space (file1 [only] **appears** to be a child...). This non-uniform behavior is a big disadvantage. In this scheme, query objects were always part of the name-space. Then, paths visible through queries are inconsistent with names of underlying objects. You querying system returns fake results (/tmp/A/B/C/A/file1) that are not present in the database queries are ran against. This is *wrong*. Nobody is going to tolerate DBMS that sometimes returns extra rows in SELECT statement, right? [...] The user is responsible for sensible naming. Under normal use, a user would hardly notice the difference between traditional directories and this name-query system. Heh, this assumes that users will continue to use new namespace as they use old one. Which is not true. Usage is determined by features provided. This is, by the way, one of driving forces behind reiserfs support for small files and large directories. If file system provides ability to create namespaces in the form of arbitrary graphs, this will be used. Nikita.
Re: File as a directory - VFS Changes
On Wed, 2005-06-01 at 18:42 +0400, Nikita Danilov wrote: Jonathan Briggs writes: On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote: Nikita Danilov writes: [...] That latter bit, about making them persistent, is where the trouble begins: once queries acquire identity and a place in the file system name-space, they logically become part of that very name-space they are querying! This leads to various complication, and you are trying to work around them by claiming that queries are not _always_ part of name-space (file1 [only] **appears** to be a child...). This non-uniform behavior is a big disadvantage. In this scheme, query objects were always part of the name-space. Then, paths visible through queries are inconsistent with names of underlying objects. You querying system returns fake results (/tmp/A/B/C/A/file1) that are not present in the database queries are ran against. This is *wrong*. Nobody is going to tolerate DBMS that sometimes returns extra rows in SELECT statement, right? If you wished to enforce name-query directories always having a single name and their query always being identical to their name, then that wouldn't happen. However, query directories (or smart folders) will have this namespace problem in every case and there is no avoiding it. If the query is for every file modified in the past day, the file path through the query directory is not going to match any given name of the file. Same for keyword queries, ownership queries, or whatever. In the traditional directory system, a file doesn't have an official name, just links to it from directory entries. Perhaps if you think of the proposed name meta-data as a preferred name the idea would work better for you? -- Jonathan Briggs [EMAIL PROTECTED] eSoft, Inc.
Re: File as a directory - VFS Changes
Jonathan Briggs writes: [...] However, query directories (or smart folders) will have this namespace problem in every case and there is no avoiding it. If the query is for every file modified in the past day, the file path through the query directory is not going to match any given name of the file. Same for keyword queries, ownership queries, or whatever. Which I think exactly points to one fundamental problem with the idea that names are attributes of object: this idea is incompatible with the notion of dynamically created views that in effect add new paths through which objects are reachable. These paths _are_ names as far as user is concerned (after all names exist to reach objects), but they are not in the name-as-attribute model. In the traditional directory system, a file doesn't have an official name, just links to it from directory entries. Perhaps if you think of the proposed name meta-data as a preferred name the idea would work better for you? Frankly speaking, I suspect that name-as-attribute is going to limit usability of file system significantly. Note, that in the real world, only names from quite limited class are attributes of objects, viz. /proper names/ like France, or Jonathan Briggs. Communication wouldn't get any far if only proper names were allowed. Nikita.
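The mismatch both sides describe can be shown with a toy model. It assumes objects carry a list of stored names plus a modification age, and that a query directory lists matches under its own mount point; `objects`, `query_dir` and the `/queries/today` path are all hypothetical.

```python
objects = {
    1001: {"names": ["/home/u/report.txt"], "mtime_days_ago": 0},
    1002: {"names": ["/home/u/old.txt"], "mtime_days_ago": 30},
}

def query_dir(mount_point, predicate):
    """Paths a user would see when listing a query directory."""
    listing = {}
    for oid, obj in objects.items():
        if predicate(obj):
            base = obj["names"][0].rsplit("/", 1)[1]
            listing[mount_point + "/" + base] = oid
    return listing

recent = query_dir("/queries/today", lambda o: o["mtime_days_ago"] < 1)
for path, oid in recent.items():
    print(path, path in objects[oid]["names"])
# /queries/today/report.txt False
```

The path works for the user, yet it is not among the names stored with the object. That is the gap between user-visible paths and name-as-attribute that this part of the thread is arguing about.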
Re: File as a directory - VFS Changes
On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
> [snip]
> Frankly speaking, I suspect that name-as-attribute is going to limit the usability of the file system significantly. Note that in the real world, only names from a quite limited class are attributes of objects, viz. /proper names/ like France, or Jonathan Briggs. Communication wouldn't get very far if only proper names were allowed.
>
> Nikita.

Bringing up /proper names/ from the real world agrees with my idea though! :-)

As a person, you have a list of proper names that you answer to and that you prefer. However, in some cases you will also answer to "Hey, you over there!" or "Someone who left a white Honda in the parking lot, please turn your lights off."

So a file could have a list of proper names, but it can also be referred to in any other way and by any other name. Proper names would be preferred, though.

--
Jonathan Briggs
eSoft, Inc.
Re: File as a directory - VFS Changes
Hans Reiser wrote on Tue, 31 May 2005 11:32:04 -0700:
> What about if we have it that only the first name a directory is created with counts towards its reference count, and that if the directory is moved from its first name, the new name becomes the one that counts towards the reference count? A bit of a hack, but would work.

Sounds a lot like what I did earlier. Files got really deleted when the true name was the only name for a file (only one parent, in other words). But I also had a large cycle-finding pause when any file movement happened. I'm not sure if it would still be needed.

Nikita Danilov wrote:
> - if garbage collection is implemented through the reference counting (which is the only known way tractable for a file system), then cycles are never collected.
> [...]
> But the garbage collection problem is still there. You are more than welcome to solve it by implementing generational mark-and-sweep GC on file system scale. :-)

There are at least two choices:

Bite the bullet and have a file system that is occasionally slow due to cycle checking, but only when the user somehow makes a huge cycle. Keep in mind that this only happens when you use the new functionality; if you only create files with one parent, it should be as fast as regular file systems. I see its features being useful for desktop use, not servers, so the occasional speed hit is less of an annoyance than the lack of features (the ability to file your files in several places).

Another way is to not delete the files when they get unlinked. Similar to some other allocation management systems, have a background thread doing the garbage collection and cycle tracing. The drawback is that you might run out of disc space if you're creating files faster than the collector is cleaning up.

I wonder if you can combine a wandering journal (or whatever it is called, where the journalled data blocks become the file's current contents) with the copy type garbage collection (is that the same as a 2 generation mark and sweep?). Copy type collection copies all known reachable objects to an empty half of the disk. When that's done, the original half is marked empty and the next pass copies in the other direction. Could work nicely if you have two disk drives.

Yet another PhD topic on garbage collection for someone to research :-) There are lots of other garbage collection schemes that might be applicable to file systems with cycles. It could work, maybe with decent speed too!

- Alex
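The background-collector option described above can be illustrated with a small in-memory mark-and-sweep pass. This is only a sketch under stated assumptions: the `struct object` layout, the fixed-size table, and the names `mark_from` and `collect` are invented for illustration and do not correspond to reiser4 or to any kernel interface. A real file system would have to walk on-disk structures and cope with concurrent updates.

```c
#include <stdbool.h>

#define MAX_OBJECTS 8
#define MAX_LINKS   4

/* Hypothetical in-memory stand-in for an on-disk object. */
struct object {
    bool in_use;            /* slot holds a live object */
    bool marked;            /* set during the mark phase */
    int  nlinks;            /* number of outgoing links */
    int  links[MAX_LINKS];  /* table indices of child objects */
};

/* Mark everything reachable from idx. Each object is pushed at most
 * once, so a stack of MAX_OBJECTS entries is sufficient. */
static void mark_from(struct object *objs, int idx)
{
    int stack[MAX_OBJECTS];
    int top = 0;

    objs[idx].marked = true;
    stack[top++] = idx;
    while (top > 0) {
        int cur = stack[--top];
        for (int i = 0; i < objs[cur].nlinks; i++) {
            int child = objs[cur].links[i];
            if (objs[child].in_use && !objs[child].marked) {
                objs[child].marked = true;
                stack[top++] = child;
            }
        }
    }
}

/* Free every in-use object that was not reached from root.
 * Returns the number of objects freed. */
static int collect(struct object *objs, int root)
{
    int freed = 0;

    for (int i = 0; i < MAX_OBJECTS; i++)
        objs[i].marked = false;
    mark_from(objs, root);
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (objs[i].in_use && !objs[i].marked) {
            objs[i].in_use = false;
            objs[i].nlinks = 0;
            freed++;
        }
    }
    return freed;
}
```

Unlike reference counting, the sweep frees a detached cycle (say, object 2 linking to object 3 and back) because neither member is reached from the root. The cost is that every pass touches the whole table, which is the scaling concern raised elsewhere in the thread.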
Re: File as a directory - VFS Changes
Nikita Danilov wrote on Wed, 1 Jun 2005 14:58:47 +0400:
> For example:
>
>     mv /d0 /d1
>
> To check that this doesn't introduce a cycle one has to load each child of /d0 (which may be millions) and recursively check that from none of them /d1 is reachable. This has to be done on each rename. I believe this is unacceptable overhead.

That's where we differ. I think it is an acceptable overhead. It also only happens on rename and delete operations for objects with multiple parents or descendants. If you just move or delete an ordinary file that's got just one parent directory and no children, the cost is ordinary too. If it's a fildirute object with a dozen attribute type things as children, then it will need to traverse those dozen children. Not a big deal.

Consider this example: The typical worst case operation will be deleting a link to your photo from a directory you decided didn't classify it properly. The photo may be in several directories, such as Cottage, Aunt and Bottles if it is a picture of a champagne bottle you polished off at your aunt's cottage. You decide that it shouldn't really be in the Aunt folder, so you delete it (or rather the link) from there.

The traversal starts with recursively finding all the children of the deleted object, which will include the photo and all attributish subobjects (thumbnail, description, ...). Not too bad, maybe a dozen objects. Then reconnect those children to objects which have a known good path to the root, reached through whatever parents remain. That path through the new link becomes their true path name. The photo goes first, finding one of the alternative parent directories, say Cottage, as its new main parent. Then the other children find the Photo as their main parent.

In other words, the cycle checker has to find all the children of the deleted object(s). In most cases there aren't very many of them.

Now if you move the directory containing millions of files, then it's going to take a while. And if it has a hard link down to another directory, that gets traversed too. But that won't happen too often, only around spring time when you're reorganizing your mail archives.

- Alex
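The check under discussion (does moving one directory under another close a loop?) reduces to a reachability test: the move creates a cycle exactly when the destination parent is already reachable from the object being moved. A minimal sketch, assuming a small in-memory adjacency table; the `struct dir_node` type and both function names are hypothetical and only illustrate the traversal cost being argued about.

```c
#include <stdbool.h>

#define MAX_NODES    16
#define MAX_CHILDREN 4

/* Hypothetical directory node: indices of its children in the table. */
struct dir_node {
    int nchildren;
    int children[MAX_CHILDREN];
};

/* Depth-first search with a visited set, so shared children and
 * existing hard links are each examined only once. */
static bool reachable(const struct dir_node *g, int from, int target)
{
    bool seen[MAX_NODES] = { false };
    int stack[MAX_NODES];
    int top = 0;

    seen[from] = true;
    stack[top++] = from;
    while (top > 0) {
        int cur = stack[--top];
        if (cur == target)
            return true;
        for (int i = 0; i < g[cur].nchildren; i++) {
            int c = g[cur].children[i];
            if (!seen[c]) {
                seen[c] = true;
                stack[top++] = c;
            }
        }
    }
    return false;
}

/* Linking src under new_parent closes a loop if new_parent is src
 * itself or lies somewhere below src. */
static bool rename_would_create_cycle(const struct dir_node *g,
                                      int src, int new_parent)
{
    return reachable(g, src, new_parent);
}
```

The work is proportional to the number of objects below `src`, which is small for a photo with a dozen sub-objects and very large for a directory holding millions of files. That is the trade-off the two positions in this exchange weigh differently.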
Re: File as a directory - VFS Changes
Alexander G. M. Smith writes:
> Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
> > Nothing in VFS prevents files from supporting both read(2) and readdir(3). The problem is with link(2): VFS assumes that directories form a _tree_, that is, every directory has a well-defined parent.
>
> At least that's one problem that's solvable. Just define one of the parents as the master parent directory, with a guaranteed path up to the root, and have the others as auxiliary parents. That also gives you a good path name to each and every file-thing. The VFS or the file system (depending on where the designers want to split the work) will still have to handle cycles in the graph to recompute the new master parents, when an old one gets deleted or moved.

A cycle may consist of more graph nodes than fit into memory. Cycle detection is crucial for rename semantics, and if a cycle-just-about-to-be-formed doesn't fit into memory it's not clear how to detect it, because the tree has to be locked while it is checked for cycles, and one definitely doesn't want to keep such a lock over IO.

> - Alex

Nikita.
Re: File as a directory - VFS Changes
Nikita Danilov wrote:
> Alexander G. M. Smith writes:
> > Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
> > > Nothing in VFS prevents files from supporting both read(2) and readdir(3). The problem is with link(2): VFS assumes that directories form a _tree_, that is, every directory has a well-defined parent.
> >
> > At least that's one problem that's solvable. Just define one of the parents as the master parent directory, with a guaranteed path up to the root, and have the others as auxiliary parents. That also gives you a good path name to each and every file-thing. The VFS or the file system (depending on where the designers want to split the work) will still have to handle cycles in the graph to recompute the new master parents, when an old one gets deleted or moved.
>
> A cycle may consist of more graph nodes than fit into memory.

There are pathname length restrictions already in the kernel that should prevent that, yes?

> Cycle detection is crucial for rename semantics, and if a cycle-just-about-to-be-formed doesn't fit into memory it's not clear how to detect it, because the tree has to be locked while it is checked for cycles, and one definitely doesn't want to keep such a lock over IO.
>
> > - Alex
>
> Nikita.
Re: File as a directory - VFS Changes
Hello Hans,

Hans Reiser writes:
> Nikita Danilov wrote:
> > Alexander G. M. Smith writes:
> > > Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
> > > > Nothing in VFS prevents files from supporting both read(2) and readdir(3). The problem is with link(2): VFS assumes that directories form a _tree_, that is, every directory has a well-defined parent.
> > >
> > > At least that's one problem that's solvable. Just define one of the parents as the master parent directory, with a guaranteed path up to the root, and have the others as auxiliary parents. That also gives you a good path name to each and every file-thing. The VFS or the file system (depending on where the designers want to split the work) will still have to handle cycles in the graph to recompute the new master parents, when an old one gets deleted or moved.
> >
> > A cycle may consist of more graph nodes than fit into memory.
>
> There are pathname length restrictions already in the kernel that should prevent that, yes?

UNIX namespaces are not _that_ retarded. :-)

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            int i;

            /* Nest directories without bound: the absolute path of the
             * current directory soon exceeds any pathname length limit. */
            for (i = 0; ; ++i) {
                    mkdir("foo", 0777);
                    chdir("foo");
                    if ((i % 1000) == 0)
                            printf("%i\n", i);
            }
            return 0;
    }

Run it for a while, interrupt, and do

    $ find foo
    $ rm -frv foo

> > Cycle detection is crucial for rename semantics, and if a cycle-just-about-to-be-formed doesn't fit into memory it's not clear how to detect it, because the tree has to be locked while it is checked for cycles, and one definitely doesn't want to keep such a lock over IO.
> >
> > > - Alex

Nikita.
Re: File as a directory - VFS Changes
On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
> > A cycle may consist of more graph nodes than fit into memory.
>
> There are pathname length restrictions already in the kernel that should prevent that, yes?

The problem is that although a *single* pathname can't be longer than some length, you can still create a cycle. Consider for instance a pathname restriction of 1024 chars. Filenames A, B, and C are all 400 characters long. A points at B, B points at C - and C points back to A.

Also, although the set of inodes *in the cycle* fits in memory, the set of inodes *in the entire graph* that has to be searched to verify the presence of a cycle may not (in general, you have to be ready to examine *all* the inodes unless you can do some pruning (unallocated, provably un-cycleable, and so on)).

This is the sort of thing that you can afford to do in userspace during an fsck, but certainly can't do in the kernel on every syscall that might create a cycle...
Re: File as a directory - VFS Changes
On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
> On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
> > > A cycle may consist of more graph nodes than fit into memory.
> >
> > There are pathname length restrictions already in the kernel that should prevent that, yes?
>
> The problem is that although a *single* pathname can't be longer than some length, you can still create a cycle. Consider for instance a pathname restriction of 1024 chars. Filenames A, B, and C are all 400 characters long. A points at B, B points at C - and C points back to A.
>
> Also, although the set of inodes *in the cycle* fits in memory, the set of inodes *in the entire graph* that has to be searched to verify the presence of a cycle may not (in general, you have to be ready to examine *all* the inodes unless you can do some pruning (unallocated, provably un-cycleable, and so on)).
>
> This is the sort of thing that you can afford to do in userspace during an fsck, but certainly can't do in the kernel on every syscall that might create a cycle...

You can avoid cycles by redefining the problem.

Every file or data object has one single True Name, which is its inode or OID. Each data object then has one or more names as properties. Names are either single strings with slash separators for directories, or each directory element is a unique object in an object list. Directories then become queries that return the set of objects holding that directory name. The query results are of course cached and updated whenever a name property changes.

Now there are no cycles, although a naive Unix find program could get stuck in a loop.

--
Jonathan Briggs
eSoft, Inc.
Re: File as a directory - VFS Changes
What happens when you unlink the True Name?

Hans

Jonathan Briggs wrote:
> You can avoid cycles by redefining the problem.
>
> Every file or data object has one single True Name which is their inode or OID. Each data object then has one or more names as properties. Names are either single strings with slash separators for directories, or each directory element is a unique object in an object list. Directories then become queries that return the set of objects holding that directory name. The query results are of course cached and updated whenever a name property changes.
>
> Now there are no cycles, although a naive Unix find program could get stuck in a loop.
Re: File as a directory - VFS Changes
Either that isn't allowed, or it immediately vanishes from all directories.

If deleting by OID isn't allowed, then every name property must be removed in order to delete the file.

Personally, I would allow deleting the OID. It would be a convenient way to be sure every instance of a file was deleted.

On Tue, 2005-05-31 at 09:59 -0700, Hans Reiser wrote:
> What happens when you unlink the True Name?
>
> Hans
>
> Jonathan Briggs wrote:
> > You can avoid cycles by redefining the problem.
> >
> > Every file or data object has one single True Name which is their inode or OID. Each data object then has one or more names as properties. Names are either single strings with slash separators for directories, or each directory element is a unique object in an object list. Directories then become queries that return the set of objects holding that directory name. The query results are of course cached and updated whenever a name property changes.
> >
> > Now there are no cycles, although a naive Unix find program could get stuck in a loop.

--
Jonathan Briggs
eSoft, Inc.
Re: File as a directory - VFS Changes
Jonathan Briggs writes:
> On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
> > On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
> > > > A cycle may consist of more graph nodes than fit into memory.
> > >
> > > There are pathname length restrictions already in the kernel that should prevent that, yes?
> >
> > The problem is that although a *single* pathname can't be longer than some length, you can still create a cycle. Consider for instance a pathname restriction of 1024 chars. Filenames A, B, and C are all 400 characters long. A points at B, B points at C - and C points back to A.
> >
> > Also, although the set of inodes *in the cycle* fits in memory, the set of inodes *in the entire graph* that has to be searched to verify the presence of a cycle may not (in general, you have to be ready to examine *all* the inodes unless you can do some pruning (unallocated, provably un-cycleable, and so on)).
> >
> > This is the sort of thing that you can afford to do in userspace during an fsck, but certainly can't do in the kernel on every syscall that might create a cycle...
>
> You can avoid cycles by redefining the problem.
>
> Every file or data object has one single True Name which is their inode or OID. Each data object then has one or more names as properties. Names are either single strings with slash separators for directories, or each directory element is a unique object in an object list. Directories then become queries that return the set of objects holding that directory name. The query results are of course cached and updated whenever a name property changes.
>
> Now there are no cycles, although a naive Unix find program could get stuck in a loop.

Huh? Cycles are still here. Query D0 returns D1, query D1 returns D2, ... query DN returns D0.

The problem is not in the mechanism used to encode the tree/graph structure. The problem is in the limitations imposed by the required semantics:

  (R) every object except some selected root is Reachable. (No leaks.)

  (G) unused objects are sooner or later discarded. (Garbage collection.)

Neither requirement is compatible with cycles in the directory structure:

  - from (R) it follows that an object can be discarded only if it is empty (as a directory). All nodes in a cycle are non-empty (because each of them contains at least a reference to the next one), and hence none of them can ever be removed;

  - if garbage collection is implemented through reference counting (which is the only known way tractable for a file system), then cycles are never collected.

Unless you are talking about a two-level naming scheme, where One True Names are visible to the user. In that case the reachability problem evaporates, because manipulations with the normal directory structure never make a node unreachable---it is always accessible through its True Name.

But the garbage collection problem is still there. You are more than welcome to solve it by implementing generational mark-and-sweep GC on file system scale. :-)

Nikita.
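The reference-counting point can be made concrete with a toy example. The sketch below assumes a hypothetical `struct rc_node` with a single outgoing link; it is not how any real file system stores link counts, it only shows why the counts of objects in a detached cycle never drop to zero.

```c
#include <stddef.h>

/* Hypothetical object with a link count and one outgoing link. */
struct rc_node {
    int refcount;            /* number of links pointing at this node */
    struct rc_node *target;  /* single outgoing link, or NULL */
};

/* Create a link from 'from' to 'to'; 'from' must not already hold one. */
static void add_link(struct rc_node *from, struct rc_node *to)
{
    from->target = to;
    to->refcount++;
}

/* Drop the outgoing link of 'from', if it has one. */
static void remove_link(struct rc_node *from)
{
    if (from->target != NULL) {
        from->target->refcount--;
        from->target = NULL;
    }
}
```

With links root -> a, a -> b and b -> a, removing the root's link leaves both `a.refcount` and `b.refcount` at 1. Neither object is reachable any more, yet a collector that frees only on a zero count never reclaims them.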
Re: File as a directory - VFS Changes
Well, if you allow multiple true names, then you start to resemble something I suggested a few years ago, in which I outlined a taxonomy of links, and suggested that some links would count towards the reference count and some would not.

Of course, that does nothing for the cycle problem. How are cycles handled for symlinks currently?

Hans

Jonathan Briggs wrote:
> Either that isn't allowed, or it immediately vanishes from all directories.
>
> If deleting by OID isn't allowed, then every name property must be removed in order to delete the file.
>
> Personally, I would allow deleting the OID. It would be a convenient way to be sure every instance of a file was deleted.
>
> On Tue, 2005-05-31 at 09:59 -0700, Hans Reiser wrote:
> > What happens when you unlink the True Name?
> >
> > Hans
> >
> > Jonathan Briggs wrote:
> > > You can avoid cycles by redefining the problem.
> > >
> > > Every file or data object has one single True Name which is their inode or OID. Each data object then has one or more names as properties. Names are either single strings with slash separators for directories, or each directory element is a unique object in an object list. Directories then become queries that return the set of objects holding that directory name. The query results are of course cached and updated whenever a name property changes.
> > >
> > > Now there are no cycles, although a naive Unix find program could get stuck in a loop.
Re: File as a directory - VFS Changes
What about if we have it that only the first name a directory is created with counts towards its reference count, and that if the directory is moved from its first name, the new name becomes the one that counts towards the reference count? A bit of a hack, but would work.

Hans

Nikita Danilov wrote:
> Jonathan Briggs writes:
> > On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
> > > On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
> > > > > A cycle may consist of more graph nodes than fit into memory.
> > > >
> > > > There are pathname length restrictions already in the kernel that should prevent that, yes?
> > >
> > > The problem is that although a *single* pathname can't be longer than some length, you can still create a cycle. Consider for instance a pathname restriction of 1024 chars. Filenames A, B, and C are all 400 characters long. A points at B, B points at C - and C points back to A.
> > >
> > > Also, although the set of inodes *in the cycle* fits in memory, the set of inodes *in the entire graph* that has to be searched to verify the presence of a cycle may not (in general, you have to be ready to examine *all* the inodes unless you can do some pruning (unallocated, provably un-cycleable, and so on)).
> > >
> > > This is the sort of thing that you can afford to do in userspace during an fsck, but certainly can't do in the kernel on every syscall that might create a cycle...
> >
> > You can avoid cycles by redefining the problem.
> >
> > Every file or data object has one single True Name which is their inode or OID. Each data object then has one or more names as properties. Names are either single strings with slash separators for directories, or each directory element is a unique object in an object list. Directories then become queries that return the set of objects holding that directory name. The query results are of course cached and updated whenever a name property changes.
> >
> > Now there are no cycles, although a naive Unix find program could get stuck in a loop.
>
> Huh? Cycles are still here. Query D0 returns D1, query D1 returns D2, ... query DN returns D0.
>
> The problem is not in the mechanism used to encode the tree/graph structure. The problem is in the limitations imposed by the required semantics:
>
>   (R) every object except some selected root is Reachable. (No leaks.)
>
>   (G) unused objects are sooner or later discarded. (Garbage collection.)
>
> Neither requirement is compatible with cycles in the directory structure:
>
>   - from (R) it follows that an object can be discarded only if it is empty (as a directory). All nodes in a cycle are non-empty (because each of them contains at least a reference to the next one), and hence none of them can ever be removed;
>
>   - if garbage collection is implemented through reference counting (which is the only known way tractable for a file system), then cycles are never collected.
>
> Unless you are talking about a two-level naming scheme, where One True Names are visible to the user. In that case the reachability problem evaporates, because manipulations with the normal directory structure never make a node unreachable---it is always accessible through its True Name.
>
> But the garbage collection problem is still there. You are more than welcome to solve it by implementing generational mark-and-sweep GC on file system scale. :-)
>
> Nikita.
Re: File as a directory - VFS Changes
On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
> I should create an example.
>
> Wherever I used True Name previously, use OID instead. True Name was simply another term for a unique object identifier.
>
> Three files with OIDs of 1001, 1002, and 1003.
>
> Object 1001:
>   name: /tmp/A/file1
>   name: /tmp/A/B/file1
>   name: /tmp/A/B/C/file1
> Object 1002:
>   name: /tmp/A/file2
> Object 1003:
>   name: /tmp/A/B/file3
>
> Three query objects (directories) with OIDs of 1, 2, and 3.
>
> Object 1:
>   name: /tmp/A
>   name: /tmp/A/B/C/A
>   query: name begins with /tmp/A/
>   query result cache: B -> 2, file1 -> 1001, file2 -> 1002
> Object 2:
>   name: /tmp/A/B
>   query: name begins with /tmp/A/B/
>   query result cache: C -> 3, file1 -> 1001, file3 -> 1003
> Object 3:
>   name: /tmp/A/B/C
>   query: name begins with /tmp/A/B/C/
>   query result cache: A -> 1, file1 -> 1001
>
> Now there is an A -> B -> C -> A directory loop. But removing name: /tmp/A/B/C/A from Object 1 fixes the loop. Deleting Object 1 also fixes the loop. Deleting any of Object 1, 2 or 3 does not affect any other object, because in this scheme, directory objects do not need to actually exist: they are just queries that return objects with certain names.

I forgot to address Nikita's point about reclaiming lost cycles.

In this case, let me create Object 4 for /tmp

Object 4:
  name: /tmp
  query: name begins with /tmp/
  query result cache: A -> 1

Now, if we delete Object 4, are Objects 1, 2, 3 lost? I would say not, because they still have names. When the shell calls chdir("/tmp") a new query object (directory) must be created dynamically, and Objects 1001, 1002, 1003 still have their names that start with /tmp and so they immediately appear again.

Their names still start with /, so the top level query will still find them and /tmp as well.

Therefore, the cycle is never detached and lost.

--
Jonathan Briggs
eSoft, Inc.
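The "directories are queries over name properties" idea in the example above can be sketched in a few lines. This is a minimal sketch, assuming objects held in an in-memory array; the `struct named_object` type, the `MAX_NAMES` limit and the function `query_direct_children` are hypothetical, and no caching of results is attempted.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 4

/* Hypothetical object: an OID plus a NULL-terminated list of names. */
struct named_object {
    int oid;
    const char *names[MAX_NAMES];
};

/* Evaluate the directory query for 'prefix' (which must end in '/'):
 * collect the OID of every object holding a name that is a direct
 * child of the prefix. Returns the number of OIDs stored in 'oids'. */
static int query_direct_children(const struct named_object *objs, int nobjs,
                                 const char *prefix, int *oids, int max)
{
    size_t plen = strlen(prefix);
    int found = 0;

    for (int i = 0; i < nobjs; i++) {
        for (int j = 0; j < MAX_NAMES && objs[i].names[j] != NULL; j++) {
            const char *name = objs[i].names[j];
            const char *rest;

            if (strncmp(name, prefix, plen) != 0)
                continue;
            rest = name + plen;
            if (*rest == '\0' || strchr(rest, '/') != NULL)
                continue;            /* not a direct child */
            if (found < max)
                oids[found++] = objs[i].oid;
            break;                   /* report each object once */
        }
    }
    return found;
}
```

Run against the six objects of the example, the query for `/tmp/A/` yields OIDs 1001, 1002 and 2, and the query for `/tmp/A/B/C/` yields 1001 and 1, so the second query leads back to the object that is also named `/tmp/A`. The loop exists only in the query results; no object stores a pointer to another. Note that every query here scans all objects, which is why the proposal relies on cached results.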
Re: File as a directory - VFS Changes
Jonathan Briggs writes:
> On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
> > I should create an example.
> >
> > Wherever I used True Name previously, use OID instead. True Name was simply another term for a unique object identifier.
> >
> > Three files with OIDs of 1001, 1002, and 1003.
> >
> > Object 1001:
> >   name: /tmp/A/file1
> >   name: /tmp/A/B/file1
> >   name: /tmp/A/B/C/file1
> > Object 1002:
> >   name: /tmp/A/file2
> > Object 1003:
> >   name: /tmp/A/B/file3
> >
> > Three query objects (directories) with OIDs of 1, 2, and 3.
> >
> > Object 1:
> >   name: /tmp/A
> >   name: /tmp/A/B/C/A
> >   query: name begins with /tmp/A/
> >   query result cache: B -> 2, file1 -> 1001, file2 -> 1002
> > Object 2:
> >   name: /tmp/A/B
> >   query: name begins with /tmp/A/B/
> >   query result cache: C -> 3, file1 -> 1001, file3 -> 1003
> > Object 3:
> >   name: /tmp/A/B/C
> >   query: name begins with /tmp/A/B/C/
> >   query result cache: A -> 1, file1 -> 1001
> >
> > Now there is an A -> B -> C -> A directory loop. But removing name: /tmp/A/B/C/A from Object 1 fixes the loop. Deleting Object 1 also fixes the loop. Deleting any of Object 1, 2 or 3 does not affect any other object, because in this scheme, directory objects do not need to actually exist: they are just queries that return objects with certain names.

One problem with the above is that the directory structure is inconsistent with the lists of names associated with objects. For example, file1 is a child of /tmp/A/B/C/A, but Object 1001 doesn't list /tmp/A/B/C/A/file1 among its names.

> I forgot to address Nikita's point about reclaiming lost cycles.
>
> In this case, let me create Object 4 for /tmp
>
> Object 4:
>   name: /tmp
>   query: name begins with /tmp/
>   query result cache: A -> 1
>
> Now, if we delete Object 4, are Objects 1, 2, 3 lost? I would say not, because they still have names. When the shell calls chdir("/tmp") a new query object (directory) must be created dynamically, and Objects 1001, 1002, 1003 still have their names that start with /tmp and so they immediately appear again.
>
> Their names still start with /, so the top level query will still find them and /tmp as well.

Object 4 is /tmp. Once it was removed, what does it _mean_ for, say, Object 1003 to have a name /tmp/A/B/file3? What is the /tmp bit there? Just a string? If so, and your directories are but queries, what does it mean for a directory to be removed? How is mv /tmp/A /tmp/A1 implemented? By scanning the whole file system and updating leaf name-lists?

It seems that what you are proposing is a radical departure from the file system namespace as we know it. :-) In your scheme all structural information is encoded in leaves _only_, and directories just do some kind of pattern matching. This is closer to a relational database than to the current file-systems where directories are the only source of the structural inform
Re: File as a directory - VFS Changes
On Wed, 2005-06-01 at 02:36 +0400, Nikita Danilov wrote:
> Jonathan Briggs writes:
> > On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
> > > I should create an example.
> > >
> > > Wherever I used True Name previously, use OID instead. True Name was simply another term for a unique object identifier.
> > >
> > > Three files with OIDs of 1001, 1002, and 1003.
> > >
> > > Object 1001:
> > >   name: /tmp/A/file1
> > >   name: /tmp/A/B/file1
> > >   name: /tmp/A/B/C/file1
> > > Object 1002:
> > >   name: /tmp/A/file2
> > > Object 1003:
> > >   name: /tmp/A/B/file3
> > >
> > > Three query objects (directories) with OIDs of 1, 2, and 3.
> > >
> > > Object 1:
> > >   name: /tmp/A
> > >   name: /tmp/A/B/C/A
> > >   query: name begins with /tmp/A/
> > >   query result cache: B -> 2, file1 -> 1001, file2 -> 1002
> > > Object 2:
> > >   name: /tmp/A/B
> > >   query: name begins with /tmp/A/B/
> > >   query result cache: C -> 3, file1 -> 1001, file3 -> 1003
> > > Object 3:
> > >   name: /tmp/A/B/C
> > >   query: name begins with /tmp/A/B/C/
> > >   query result cache: A -> 1, file1 -> 1001
> > >
> > > Now there is an A -> B -> C -> A directory loop. But removing name: /tmp/A/B/C/A from Object 1 fixes the loop. Deleting Object 1 also fixes the loop. Deleting any of Object 1, 2 or 3 does not affect any other object, because in this scheme, directory objects do not need to actually exist: they are just queries that return objects with certain names.
>
> One problem with the above is that the directory structure is inconsistent with the lists of names associated with objects. For example, file1 is a child of /tmp/A/B/C/A, but Object 1001 doesn't list /tmp/A/B/C/A/file1 among its names.

file1 *appears* to be a child because it is actually returned as the query result for its name of /tmp/A/file1, because A is a query for /tmp/A/. If the shell was smart enough to normalize its path by asking the directory for its name, it would know that /tmp/A/B/C/A was /tmp/A. But yes, a stupid program could be confused by the difference between names.

> > I forgot to address Nikita's point about reclaiming lost cycles.
> >
> > In this case, let me create Object 4 for /tmp
> >
> > Object 4:
> >   name: /tmp
> >   query: name begins with /tmp/
> >   query result cache: A -> 1
> >
> > Now, if we delete Object 4, are Objects 1, 2, 3 lost? I would say not, because they still have names. When the shell calls chdir("/tmp") a new query object (directory) must be created dynamically, and Objects 1001, 1002, 1003 still have their names that start with /tmp and so they immediately appear again.
> >
> > Their names still start with /, so the top level query will still find them and /tmp as well.
>
> Object 4 is /tmp. Once it was removed, what does it _mean_ for, say, Object 1003 to have a name /tmp/A/B/file3? What is the /tmp bit there? Just a string? If so, and your directories are but queries, what does it mean for a directory to be removed? How is mv /tmp/A /tmp/A1 implemented? By scanning the whole file system and updating leaf name-lists?

Well, the name doesn't mean anything. :-) It is just convenient metadata for describing where to find the file in a hierarchy, and for Unix compatibility.

If a directory was removed by a standard rm -rf, it would work as expected because it would descend the tree removing names (unlink) from each object it found.

Moving an object with mv would change its name. Moving a top-level directory like /usr would require visiting every object starting with /usr and doing an edit. A compression scheme could be used where the most-used top-level directory names were replaced with lookup tables; then /usr could be renamed just once in the table.

> It seems that what you are proposing is a radical departure from the file system namespace as we know it. :-) In your scheme all structural information is encoded in leaves _only_, and directories just do some kind of pattern matching. This is closer to a relational database than to the current file-systems where directories are the only source of the structural inform

Yes. :-) It is radical, and the idea is taken from databases. I thought that seemed to be the direction Reiser filesystems were moving.

In this scheme a name is just another bit of metadata and not first-class important information. The name-query directories would be there for traditional filesystem users and Unix compatibility. They would probably be virtual and dynamic, only being created when needed and only being persistent if assigned meta-data (extra names (links), non-default permission bits, etc.) or for performance reasons (faster to load from cache than searching every file).

--
Jonathan Briggs
eSoft, Inc.
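The cost of renaming a directory under the name-as-metadata scheme can be illustrated directly: without the lookup-table optimisation mentioned above, `mv /tmp/A /tmp/A1` means editing every stored name at or below `/tmp/A`. The following is a sketch under assumptions only; the fixed-size name buffers, the `NAME_BUF_LEN` limit and the function `rename_prefix` are hypothetical and stand in for whatever on-disk representation such a file system would really use.

```c
#include <stdio.h>
#include <string.h>

#define NAME_BUF_LEN 64

/* Rewrite every name equal to old_dir or lying below it so that it
 * starts with new_dir instead. Returns the number of names changed. */
static int rename_prefix(char names[][NAME_BUF_LEN], int n,
                         const char *old_dir, const char *new_dir)
{
    size_t olen = strlen(old_dir);
    int changed = 0;

    for (int i = 0; i < n; i++) {
        char tmp[NAME_BUF_LEN];
        int len;

        if (strncmp(names[i], old_dir, olen) != 0)
            continue;
        /* Require a component boundary: /tmp/AB is not under /tmp/A. */
        if (names[i][olen] != '\0' && names[i][olen] != '/')
            continue;
        len = snprintf(tmp, sizeof tmp, "%s%s", new_dir, names[i] + olen);
        if (len < 0 || (size_t)len >= sizeof tmp)
            continue;             /* new name would not fit; skip it */
        strcpy(names[i], tmp);
        changed++;
    }
    return changed;
}
```

The loop visits all `n` names regardless of how many match, which is the "scanning the whole file system" cost Nikita asks about. An index on name prefixes, or the table of shared top-level components that Jonathan suggests, would be needed to avoid it.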
Re: File as a directory - VFS Changes
Nikita Danilov wrote on Tue, 31 May 2005 13:34:55 +0400:
> A cycle may consist of more graph nodes than fit into memory. Cycle detection is crucial for rename semantics, and if a cycle-just-about-to-be-formed doesn't fit into memory it's not clear how to detect it, because the tree has to be locked while it is checked for cycles, and one definitely doesn't want to keep such a lock over IO.

Sometimes you'll just have to return an error code if the rename operation is too complex to be done. The user will then have to delete individual leaf files to make the situation simpler. I hope this won't happen very often.

On the plus side, the detection of all the files that may be affected means you can now delete a directory directly, contents and all, if all the related inodes fit into memory.

- Alex
Re: File as a directory - VFS Changes
I think what Alex is suggesting below is reasonable and something resembling it should be done, though I will not go into details on it until we have some working code.

Hans

Alexander G. M. Smith wrote:
> [EMAIL PROTECTED] wrote on Sat, 28 May 2005 15:42:35 -0400:
> > I'm not Hans, but I *will* ask "How much of this is *rationally* doable without some help from the VFS?". At the very least, some of this stuff will require the FS to tell the VFS to suspend its disbelief (for starters, doing this without confusing the VFS's concepts of dentries/inodes/reference counts is going to be interesting... :)
>
> Good point. One way would be to cram it into the existing VFS (the operating system's interface to file systems) as directories representing the objects, containing a specially named file for the raw data, mixed in with child items and symbolic links to parent objects. Some inodes would be fake ones, generated as needed to represent the old style view of the file / directory / attribute thing (such as the parent symbolic links).
>
> But what would I (Hans likely has other views) like to see in a new VFS to support files / directories / attributes all being the same kind of object? I'll talk about the user level API view of the VFS, rather than the flip side for file systems or the gritty VFS internals, since it doesn't need to be Linux specific.
>
> For one, it would be almost the same as the existing VFS. But when you open a fildirute-thing, you can use the same file handle to read and write its data and to list its children.
>
> Thus open() and opendir() are combined into plain open(). It takes a conventional hierarchical path (or later some of Hans Reiser's more sophisticated namespaces?). Returns a file handle.
>
> The resulting file handle can be used with read(), write(), seek(), readdir(), rewinddir() and the rest of the usual directory and file basic operations. And of course, close() it when you're done.
>
> Stat() would disappear. All the miscellaneous stat data would be stored as sub-files, things like the date last modified, access permissions and so on. There would be a standard filename and file type for those metadata subfiles to distinguish them from user created subfiles (such as file/.meta.last_modified). That also makes it easier to add new kinds of metadata.
>
> And that's about it for the basics. Standard utilities, like ls, would have to be changed to use the new object structure - listing the contents of a thing and avoiding recursion down paths that lead to parent objects (just like ls currently avoids listing .. recursively). That may involve more work than the kernel changes!
>
> I'd add a multi-read function to replace stat(). Give it a list of sub-file names to read and it returns their names and contents in a packed list (like a dirent structure). That way bulk reading date stamps, permissions and other attributish small metadata as subfiles won't have as much overhead as opening them individually. Particularly if under the hood they are stored as fields in the file's inode rather than as totally separate files (this is what BeOS's BFS does for small attributes). Though conceptually you treat them as separate subfiles.
>
> I'd also like to add indexing. That could be done by creating a magic directory with an associated file type to index. Then whenever a file with that file type is changed, the index is updated using the file's contents as the key, and a link to the file as the value. The file type also implies the interpretation of the values for sorting purposes - as strings, binary numbers, etc. Unlike BeOS, I'd expose the indices directly (appearing as a directory full of hard links) and have query languages implemented in userland libraries that make use of the indices, rather than as part of the file system. Now should indices be system wide and maintained by the VFS, or per-volume and maintained by the file system? How about indices for things on network drives? Things on public web sites for a web-view file system?
>
> I'd also like to add change notification. If a file system object's child list changes, then a notification message gets sent to interested listeners. Similarly for an object's data content change. BeOS had useful notifications for live changes to a query - I'd punt this to the userland query library and have it build on the change notifications from an index directory. The VFS and other parts of the OS would need to support change notification (BeOS used inter-process message queues).
>
> Can a file-as-directory system fit into Linux, or some other OS? I expect that it will only happen if the new system also exposes a backwards compatible view for old software, using the old APIs. After that's done, the first big user program that needs to be updated is the desktop file browser. Once there's a good GUI for browsing file-as-directory file systems, the general public might become more aware of their advantages (easily drilling down inside files to
Re: File as a directory - VFS Changes
Alexander G. M. Smith writes:
> [EMAIL PROTECTED] wrote on Sat, 28 May 2005 15:42:35 -0400:
> > I'm not Hans, but I *will* ask "How much of this is *rationally* doable without some help from the VFS?". At the very least, some of this stuff will require the FS to tell the VFS to suspend its disbelief (for starters, doing this without confusing the VFS's concepts of dentries/inodes/reference counts is going to be interesting... :)
>
> Good point. One way would be to cram it into the existing VFS (the operating system's interface to file systems) as directories representing the objects, containing a specially named file for the raw data, mixed in with child items and symbolic links to parent objects. Some inodes would be fake ones, generated as needed to represent the old style view of the file / directory / attribute thing (such as the parent symbolic links).
>
> But what would I (Hans likely has other views) like to see in a new VFS to support files / directories / attributes all being the same kind of object? I'll talk about the user level API view of the VFS, rather than the flip side for file systems or the gritty VFS internals, since it doesn't need to be Linux specific.
>
> For one, it would be almost the same as the existing VFS. But when you open a fildirute-thing, you can use the same file handle to read and write its data and to list its children.

This is doable with the current VFS.

> Thus open() and opendir() are combined into plain open(). It takes a conventional hierarchical path (or later some of Hans Reiser's more sophisticated namespaces?). Returns a file handle.

opendir(3) is a user-level function. It calls the open(2) system call. telldir(3) and seekdir(3) are also functions that call lseek(2) under the hood.

> The resulting file handle can be used with read(), write(), seek(), readdir(), rewinddir() and the rest of the usual directory and file basic operations. And of course, close() it when you're done.

Nothing in the VFS prevents files from supporting both read(2) and readdir(3). The problem is with link(2): the VFS assumes that directories form a _tree_, that is, every directory has a well-defined parent.

> Stat() would disappear. All the miscellaneous stat data would be stored as sub-files, things like the date last modified, access permissions and so on. There would be a standard filename and file type for those metadata subfiles to distinguish them from user created subfiles (such as file/.meta.last_modified). That also makes it easier to add new kinds of metadata.
>
> And that's about it for the basics.

The problem with that is that in /etc/passwd/..foo-meta-thing, /etc/passwd is both a regular file (possibly with multiple names) and a directory at the same time, which is a problem for the VFS, see above. Read Documentation/filesystems/directory-locking and imagine the following:

    $ touch a
    $ ln a b
    $ mv a/..uid b/..uid

(and yes, rename has to lock the parent directories _before_ ever calling into the file system back-end, so reiser4 code cannot somehow magically hint to the VFS that a and b are to be treated in a special way).

Nikita.
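
The starting point of the scenario above (touch a, then ln a b) can be reproduced from user space on an ordinary file system. The Python sketch below is an illustration only: the function name `hard_link_demo` is made up for this example, and it does not exercise reiser4 or any file-as-directory behaviour. It merely shows that after the link, both names resolve to one and the same object, so a sub-object such as a/..uid would have no single well-defined parent path.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file with two names and report what stat() says about them.

    Returns (same_object, link_count): whether both names refer to the
    same inode on the same device, and the inode's hard-link count.
    """
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a")
        b = os.path.join(tmp, "b")
        with open(a, "w"):
            pass                      # equivalent of: touch a
        os.link(a, b)                 # equivalent of: ln a b
        st_a, st_b = os.stat(a), os.stat(b)
        same_object = (st_a.st_ino, st_a.st_dev) == (st_b.st_ino, st_b.st_dev)
        return same_object, st_a.st_nlink
```

On a file system that supports hard links, the call is expected to report a single shared inode with a link count of 2. If that object were also a directory, the rename in the example would have to lock "the parent" of a/..uid and b/..uid, which are the same object reached under two different names.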
Re: File as a directory - VFS Changes
Nikita Danilov wrote on Mon, 30 May 2005 15:00:52 +0400:
> Nothing in the VFS prevents files from supporting both read(2) and readdir(3). The problem is with link(2): the VFS assumes that directories form a _tree_, that is, every directory has a well-defined parent.

At least that's one problem that's solvable. Just define one of the parents as the master parent directory, with a guaranteed path up to the root, and have the others as auxiliary parents. That also gives you a good path name to each and every file-thing.

The VFS or the file system (depending on where the designers want to split the work) will still have to handle cycles in the graph to recompute the new master parents when an old one gets deleted or moved.

- Alex
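
One way to picture the master/auxiliary parent idea is the Python sketch below. It is a toy model, not a proposal for the actual data structures: the function names `assign_master_parents` and `master_path` are invented here, the graph lives in a plain dict, and choosing the master parent by breadth-first search from the root (shortest path wins, ties broken alphabetically) is an assumption of this sketch rather than something specified in the thread.

```python
from collections import deque


def assign_master_parents(parents, root):
    """Pick one master parent per object so every object has a path to root.

    `parents` maps each object to the list of all its parents (master and
    auxiliary alike).  The result maps each object reachable from `root`
    to its chosen master parent; `root` itself maps to None.  Cycles among
    auxiliary links are harmless because each object is assigned once.
    """
    children = {}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, []).append(node)

    master = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in sorted(children.get(node, ())):
            if child not in master:
                master[child] = node
                queue.append(child)
    return master


def master_path(master, node):
    """Build the path name obtained by following master parents up to the root."""
    parts = []
    while node is not None:
        parts.append(node)
        node = master[node]
    return "/".join(reversed(parts))
```

For example, with `parents = {"etc": ["root"], "home": ["root", "notes"], "notes": ["home", "etc"]}` (note the cycle between home and notes), the master path of notes comes out as root/etc/notes. If the link from etc to notes is then removed and the assignment recomputed, notes gets home as its new master parent and its path becomes root/home/notes, which is the recomputation step mentioned above.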
Re: File as a directory - VFS Changes
[EMAIL PROTECTED] wrote on Sat, 28 May 2005 15:42:35 -0400:
> I'm not Hans, but I *will* ask "How much of this is *rationally* doable without some help from the VFS?". At the very least, some of this stuff will require the FS to tell the VFS to suspend its disbelief (for starters, doing this without confusing the VFS's concepts of dentries/inodes/reference counts is going to be interesting... :)

Good point. One way would be to cram it into the existing VFS (the operating system's interface to file systems) as directories representing the objects, containing a specially named file for the raw data, mixed in with child items and symbolic links to parent objects. Some inodes would be fake ones, generated as needed to represent the old style view of the file / directory / attribute thing (such as the parent symbolic links).

But what would I (Hans likely has other views) like to see in a new VFS to support files / directories / attributes all being the same kind of object? I'll talk about the user level API view of the VFS, rather than the flip side for file systems or the gritty VFS internals, since it doesn't need to be Linux specific.

For one, it would be almost the same as the existing VFS. But when you open a fildirute-thing, you can use the same file handle to read and write its data and to list its children.

Thus open() and opendir() are combined into plain open(). It takes a conventional hierarchical path (or later some of Hans Reiser's more sophisticated namespaces?). Returns a file handle.

The resulting file handle can be used with read(), write(), seek(), readdir(), rewinddir() and the rest of the usual directory and file basic operations. And of course, close() it when you're done.

Stat() would disappear. All the miscellaneous stat data would be stored as sub-files, things like the date last modified, access permissions and so on. There would be a standard filename and file type for those metadata subfiles to distinguish them from user created subfiles (such as file/.meta.last_modified). That also makes it easier to add new kinds of metadata.

And that's about it for the basics. Standard utilities, like ls, would have to be changed to use the new object structure - listing the contents of a thing and avoiding recursion down paths that lead to parent objects (just like ls currently avoids listing .. recursively). That may involve more work than the kernel changes!

I'd add a multi-read function to replace stat(). Give it a list of sub-file names to read and it returns their names and contents in a packed list (like a dirent structure). That way bulk reading date stamps, permissions and other attributish small metadata as subfiles won't have as much overhead as opening them individually. Particularly if under the hood they are stored as fields in the file's inode rather than as totally separate files (this is what BeOS's BFS does for small attributes). Though conceptually you treat them as separate subfiles.

I'd also like to add indexing. That could be done by creating a magic directory with an associated file type to index. Then whenever a file with that file type is changed, the index is updated using the file's contents as the key, and a link to the file as the value. The file type also implies the interpretation of the values for sorting purposes - as strings, binary numbers, etc. Unlike BeOS, I'd expose the indices directly (appearing as a directory full of hard links) and have query languages implemented in userland libraries that make use of the indices, rather than as part of the file system. Now should indices be system wide and maintained by the VFS, or per-volume and maintained by the file system? How about indices for things on network drives? Things on public web sites for a web-view file system?

I'd also like to add change notification. If a file system object's child list changes, then a notification message gets sent to interested listeners. Similarly for an object's data content change. BeOS had useful notifications for live changes to a query - I'd punt this to the userland query library and have it build on the change notifications from an index directory. The VFS and other parts of the OS would need to support change notification (BeOS used inter-process message queues).

Can a file-as-directory system fit into Linux, or some other OS? I expect that it will only happen if the new system also exposes a backwards compatible view for old software, using the old APIs. After that's done, the first big user program that needs to be updated is the desktop file browser. Once there's a good GUI for browsing file-as-directory file systems, the general public might become more aware of their advantages (easily drilling down inside files to attach a description subfile or add a bunch of MP3 tags, magic query directories and indexing to find things quickly, multiple parents to put the same file in multiple folders without the breakability of symbolic links