Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
On 11/21/05, Hubert Chan [EMAIL PROTECTED] wrote: [snip] A few points - things should not be named mumble or thingy -- things should be named descriptively (obviously -- of course, people don't always follow these obvious rules) Well, sure. But even with the best efforts to choose clear and descriptive names, the meaning of every name will not always be clear to every person, never mind to programs. For 'mumble' read '[mumble]'. - the user should be in charge of how he/she organizes the data, and so he/she should pick the names that he/she wants to use. It shouldn't be mandated. And anyone else who uses the data should have a reasonable expectation to have any confusing things documented. At least database schemas are usually documented for those people who need to use them. Documentation is good and important, but for all sorts of good and bad reasons few filesystems will ever have good, up-to-date documentation handy for every pathname. And, as Future Vision says, no-one has the time or the inclination to study up on the format of every database they might want to use. People like to learn by exploration, even when they do also read the docs. And a shell utility can't use human-readable documentation any more than it can apply human Unix lore or common sense to interpreting the filesystem. To such a program, every segment name is akin to 'mumble'; without some additional information, it can't tell that /usr/bin is not an instance of the relation /usr . Which brings us back to semantics. I've compared the filesystem interface to an ADT. But if that were all it is, if there were no conventions about how to interpret a pathname, then it /would/ be necessary to get out the manual and read up on the meaning of every new pathname you come across, because you would almost never be able to infer anything about its meaning from looking at it. 
But the filesystem isn't (just) a persistent-storage data structure or ADT; it's a language through which both people and programs communicate. I've described the semantics of that language before - '/usr/bin' is a predicate which is asserted of all and only the opaque descendants of /usr/bin, '/usr' is a predicate which is asserted of all and only the opaque descendants of /usr, '/usr/passwd' is a predicate which is asserted of /usr/passwd, etc. etc. ad nauseam. So if I come across /foo/bar which links to a non-directory file then I know that the predicate '/foo/bar' is asserted of that file (and that file only). Even if 'foo' and/or 'bar' is mysterious to me, I already know a good deal about the intended meaning of this bit of the filesystem, and I can use what I know to help me deduce the meanings of mysterious name-segments. (To plag^H^H^H^Hparaphrase one David Moser, imagine walking into an office and seeing a Post-It note stuck on the side of something. Even if the note contains many nonslarkish English flutzpahs, you can glork much more of its pluggandisp than if it were scríofa i dteanga éigin eile [Irish: 'written in some other language'].) Having this language also means that even programs which never know the meaning of any name-segment can extract useful information from pathnames in virtue of their form. For example, listing the common attributes of two files is a matter of listing the intersection of the predicates asserted by their pathnames. But if we start using directories to assert relations as well as predicates without distinguishing those directories which assert instances of a relation, then we make every sentence in the language ambiguous. Now any given full pathname /might/ assert a predicate, or it might assert an instance of a relation instead. (Or in fact an instance of any one of several relations, since '/foo:bar/baz', '/foo/bar:baz', and '/foo:bar:baz' all ambiguate to '/foo/bar/baz' .) Such an ambiguous language is much less useful.
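To make the "listing common attributes" point concrete, here is a minimal Python sketch of the predicate reading of pathnames, assuming (as described above) that a pathname asserts itself and every one of its ancestor prefixes. The sample pathnames /tools/listing and /tools/text and all the function names are made up for illustration. `collapse` shows the ambiguity: once the ':' link-directory delimiter is read as an ordinary '/', three different relation readings become one string.

```python
from pathlib import PurePosixPath


def predicates(pathname):
    """Predicates asserted of a file by one of its pathnames: the pathname
    itself plus every ancestor prefix, excluding the bare root '/'."""
    p = PurePosixPath(pathname)
    result = {str(p)}
    result.update(str(a) for a in p.parents if str(a) != "/")
    return result


def all_predicates(pathnames):
    """Union of the predicates asserted by each of a file's pathnames."""
    result = set()
    for name in pathnames:
        result |= predicates(name)
    return result


def common_attributes(pathnames_a, pathnames_b):
    """Attributes two files share: the intersection of their predicate sets."""
    return all_predicates(pathnames_a) & all_predicates(pathnames_b)


def collapse(pathname):
    """Read ':' as just another '/', i.e. throw away the link-directory marker."""
    return pathname.replace(":", "/")


if __name__ == "__main__":
    ls_names = ["/usr/bin/ls", "/tools/listing"]   # hypothetical pathnames
    cat_names = ["/usr/bin/cat", "/tools/text"]
    print(sorted(common_attributes(ls_names, cat_names)))
    print({collapse(p) for p in ["/foo:bar/baz", "/foo/bar:baz", "/foo:bar:baz"]})
```

Nothing here needs to know what 'usr' or 'tools' mean; the answer falls out of the form of the pathnames alone.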
Before, for example, it took a simple shell command to find the predicates asserted of a file. When the ambiguity is introduced, that simple operation becomes an exercise in manual-reading and guesswork. Speaking of databases, if you ask someone like C.J. Date what the most important feature of the relational database is, he won't talk about view-construction or even ACID properties. He certainly won't say anything about performance. The answer he will give you is well-defined semantics. While a subgraph of a network database is basically just a bit of persistent-storage data-structure whose meaning can only be discerned by reading the documentation, a table in an RDB can (must) always be understood as expressing the present instances of some relation. - you can still have some sort of marker to indicate what role each part of the name takes (e.g. the delimiter to indicate pseudofiles). Or you can use a special naming convention (e.g. tuples have a special prefix). But I think that trying to introduce a new delimiter that does basically the same thing as '/' is going to cause a lot of problems. (See Rob Pike's paper, The Hideous Name,
Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
On 11/21/05, Hubert Chan [EMAIL PROTECTED] wrote: [snip] As a completely different side issue, I think that using random names such as aardvark or zebra to refer to tuples is a bad idea (and I know this isn't part of your proposal). If you use things that are real words, people will get confused, since they will try to associate meaning to something that doesn't have meaning. (e.g. why is my relationship called dodo, while Bob's is called tiger?) I think that it's best to just assign random meaningless strings, so that people will know that they are meaningless. (This addresses the issue with last name-segments of link-directories which I said in part 3 that I'd get back to.) Meaningless name segments are annoying to those who know or guess that they're meaningless. Worse, they're misleading to those people and programs that don't. (After all, they amount to making up spurious information.) The ideal solution is to throw them away completely: having anonymous final segments allows two non-directory files having the predicates '/foo/aardvark' and '/foo/zebra' to both simply be '/foo'. It has some slightly weird effects, though; for example, when '/foo/aardvark:bar' and '/foo/zebra:bar' both become '/foo/:bar', what happens to cd /foo/:bar ? One solution is to dynamically generate filler text for anonymous final segments whenever text is necessary (in a POSIX legacy interface, for example), based on the linked file's inode and that of the volume; the format of the filler text should allow programs (and humans) to detect it as such when they can't find out some other, out-of-band, way. [snip] Leo. -- Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)
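A minimal sketch of the filler-text idea, in Python. The `~anon-<volume>-<inode>` format, the regex and the function names are hypothetical, chosen only because they avoid '/' and ':' (the segment delimiters used in these emails). The volume and inode numbers come from the standard `st_dev` and `st_ino` fields returned by `os.stat`.

```python
import os
import re

# Hypothetical filler format: "~anon-<volume>-<inode>", both in lowercase hex.
FILLER_RE = re.compile(r"~anon-([0-9a-f]+)-([0-9a-f]+)")


def filler_for(dev, ino):
    """Filler text for an anonymous final segment, from volume and inode numbers."""
    return f"~anon-{dev:x}-{ino:x}"


def filler_for_path(path):
    """Filler text for the file at `path`, using os.stat's st_dev and st_ino."""
    st = os.stat(path)
    return filler_for(st.st_dev, st.st_ino)


def parse_filler(segment):
    """Return (dev, ino) if `segment` looks like generated filler, else None.

    This is only a heuristic: a real name could happen to match the pattern,
    which is why an out-of-band check is preferable when one is available.
    """
    m = FILLER_RE.fullmatch(segment)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)
```

For instance, `filler_for(2049, 131)` gives `~anon-801-83`, and `parse_filler` turns that back into `(2049, 131)` while returning `None` for an ordinary name like `aardvark`.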
The filesystem plus browser as Agenda/Chandler
Speaking of handling email, wanting maximum power, and (not) using databases which aren't integrated into the filesystem namespace: note that the thin GUI on top of a filesystem of predicate- and link-directories with pathname-listing is, among other things, Chandler (the promised son-of-Lotus-Agenda) done right. In particular, see http://blogs.osafoundation.org/mitch/92.html .

attributes and their values -- /foo/bar/baz
relationships to other items -- /foo/fum:aardvark/foo etc.
payload -- file body
show views of [things] organized by [foo] -- ls foo (note no distinction between d-to-m and m-to-d metadata)

The other mappings are left as an exercise: they're astonishingly (and amusingly) precise. Leo. -- Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)
Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
On 11/19/05, Alexander G. M. Smith [EMAIL PROTECTED] wrote: Leo Comerford wrote on Fri, 18 Nov 2005 03:42:50 +: [^.*$] Just a few points I thought of while reading through your text: Genealogy is an extremely structured arrangement of data, most people won't be doing something that complex - think of photo filing instead. Also cycles exist everywhere, even in genealogy. So cycles should be supported by default. Cycles? Sure. Hence all the quibbling about the generic tree operators actually being rooted directed graph operators. (In the case of genealogy, parent-child digraphs are acyclic unless they involve Beeblebroxes, though they're *shudder* not invariably /trees/.) It's true that a lot of file metadata doesn't involve recursive structure or suchlike, but you don't have to get very advanced or esoteric to find file metadata that does. Expose your email's Reply-to: information as file metadata and you have recursive structure. (Sidebar: Actually, you can think of the process of improving the filesystem - large parts of that process, anyway - simply as the process of fixing email. Every email ought to be an ordinary file like any other document; instead emails live as entries in spool files around which mail agents perform ritual lockfile dances. A filesystem which handles smaller files more efficiently allows us to store each email as a file, but you have to keep those files in a Maildir and continue observing special safe-access rituals. Introduce transactions and you can just put your emails into ordinary folders, but everyone knows by now that just putting each of your emails into one folder or another is a completely inadequate way of organising them.
Introduce pathname-listing and every email can usefully be in several directories at once; this means that we can indicate all the categories and labels we put on our emails by just putting them in directories instead of having to use special data formats understandable only to email clients (or worse, only to one email client). But there is still the (meta)data in the email headers themselves - we don't want to have to either duplicate or ignore it in our directory metadata. So we use mount() to expose persistent queries on the header data as directories.) You had a separate directory storing relationship links. How about making that a subdirectory of the person? If I wanted to do genealogy-as-a-file-system, I'd have a children subdirectory under the person; it would contain hard links to all the person's children. If you want to find a person's mother or father, examine the list of their parent directories (a cyclic file system has more parents than just ..) to find the ones called children. The person's parents are the holders of those children directories. You can use the same one-to-many approach with link-directories: instead of creating a link directory for each (biological) parent-child pair, create one for each parent which links it to all its children. (Having the individual link-directories is better in one way: it's safe to go from the one-to-one to the one-to-many form without context knowledge, but not /vice versa/. In the case of the parent-child relationship, if a person is a parent to a bunch of children then that person is individually a parent to them all. But if a person hasLessMoneyThan a bunch of people, then (s)he may not have less money than any one of them.) The (biological) parent-child relationship is kind to subfile metadata again here: the one-to-many link makes the (biological) parent an even more obvious candidate to go on top. 
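The "children subdirectory" layout and the one-to-one versus one-to-many point can be sketched in a few lines of Python. This is a toy in-memory model, not a filesystem: the dict standing in for each person's `children` directory, the sample family and the function names are all assumptions for illustration.

```python
from collections import defaultdict

# Illustrative data: each person's 'children' subdirectory, modelled as the
# set of names it hard-links to.
children_dirs = {
    "Mike": {"Bob", "Ted"},
    "Bob": {"Dean", "Ed"},
}


def parents_of(person, dirs):
    """Find the holders of every 'children' directory that links to `person`.

    A Unix filesystem stores no uplinks, so without an index this has to
    scan every such directory.
    """
    return {holder for holder, kids in dirs.items() if person in kids}


def to_one_to_many(pairs):
    """Regroup one-to-one (parent, child) links into one link per parent.

    This direction is always safe. Splitting a one-to-many link back into
    pairs is only valid for relations that distribute over their members,
    like parent-child but unlike hasLessMoneyThan.
    """
    grouped = defaultdict(set)
    for parent, child in pairs:
        grouped[parent].add(child)
    return dict(grouped)
```

Here `parents_of("Ed", children_dirs)` finds Bob, and `to_one_to_many([("Mike", "Bob"), ("Mike", "Ted")])` yields a single Mike entry holding both sons.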
Problem five from part two is as strong as ever, on the other hand: there's no way for a program to tell without context knowledge that bob;children asserts a relationship between bob and several other files, rather than asserting some attributes of bob or describing some subparts of it. I wouldn't worry about naming conflicts (such as children being a magic name) since most people only define a few dozen relationships, at least in BeOS. [snip] If the names are only being created and used by one person, then yes, conflicts are likely to be rare and easily dealt with by hand. But if you have more than one person involved, and especially if you are trying to use different bundles of names created independently by different groups, then people will soon resort to the usual defensive practices used in package naming. Since any application could potentially define and use a bunch of its own subfile names just as it can create several (ordinary) directories, this situation will arise as soon as subfile names become popular. [snip] So to sum up, it seems that you're way more power hungry than I. I just want something to make finding photos easier, not a whole database equivalent system (I'd use a database for that). Early versions of BeOS did use a database as the file system, which turned out to be more trouble than it was worth. A file
Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
(Apologies for email snafu - hit the wrong button.) Difference 2 can also be left aside for now. 3) Say we just used standard directories to indicate relations. Then if a user comes across these names - /(something)/father-son /(something)/father-son/aardvark /(something)/father-son/aardvark/son /(something)/father-son/aardvark/father - while browsing the filesystem she can use her common sense to guess that /(something)/father-son expresses a father-son relation. 'father-son' likely suggests a relationship between fathers and sons, and indeed /(something)/father-son/aardvark has children called father and son. Pretty obvious. Similarly, if she comes across /usr/bin /usr/bin/alpha /usr/bin/bravo [etc. etc.] , if she is familiar with Unix she will know that /usr/bin/ indicates user binaries. Every (non-directory) file in /usr/bin/ isA '/usr/bin', a user binary. (Also, common sense might suggest that /usr/bin/ has so many children that it's unlikely to be one giant relationship.) But what if she comes across /(something)/mumble /(something)/mumble/thingy/alpha /(something)/mumble/thingy/bravo ? Is alpha a '/(something)/mumble/thingy', or is it in the 'alpha' role of a '/(something)/mumble' relationship with another file? (Or it could be in the 'thingy/alpha' role of a '/(something)' relationship.) This matters a lot. The distinction between being a foo and being a party in a foo relationship is clear and important. For example, there is a big difference between being a marriage and being a married person. So we need to know which one is meant. What's more, we need to be *told*, because the other two solutions - guessing and knowing already - aren't good enough. The person who created /(something)/mumble/thingy would be able to tell us that if only she had some way of indicating to us that we should interpret /(something)/mumble/thingy as an instance of a relation. 
And that is (to a first approximation) all that link-directories are - directories with a simple binary flag set at creation time to indicate how they should be interpreted. Leo. -- Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)
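Here is a rough Python model of that binary flag, assuming a link-directory is just a `Directory` with `is_link=True` and that everything after it on a path names a role. The class, `interpret` and `build` are hypothetical. The point is that the same path string, /(something)/mumble/thingy/alpha, gets a different reading depending only on the flag.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    # The binary flag set at creation time: True marks a link-directory,
    # i.e. one instance of the relation named by its parent directory.
    is_link: bool = False
    # Segment name -> Directory, or an opaque file id (str) for atomic files.
    children: dict = field(default_factory=dict)


def interpret(root, pathname):
    """Say how the final file in `pathname` should be read.

    Returns ("isA", predicate) when no link-directory lies on the path, or
    ("role", relation, instance, role) when one does.
    """
    segments = [s for s in pathname.split("/") if s]
    node = root
    walked = []
    for i, seg in enumerate(segments):
        if isinstance(node, Directory) and node.is_link:
            # Everything after the link-directory names a role in the tuple.
            relation = "/" + "/".join(walked[:-1])
            instance = "/" + "/".join(walked)
            return ("role", relation, instance, "/".join(segments[i:]))
        node = node.children[seg]  # a bad path raises KeyError/AttributeError
        walked.append(seg)
    return ("isA", "/" + "/".join(walked[:-1]))


def build(thingy_is_link):
    """The /(something)/mumble/thingy/{alpha,bravo} example tree."""
    thingy = Directory(is_link=thingy_is_link,
                       children={"alpha": "file-alpha", "bravo": "file-bravo"})
    mumble = Directory(children={"thingy": thingy})
    return Directory(children={"(something)": Directory(children={"mumble": mumble})})
```

With `thingy` unflagged, alpha simply isA '/(something)/mumble/thingy'; with it flagged, alpha fills the 'alpha' role of a '/(something)/mumble' tuple. Flagging `mumble` instead would give the third reading, the 'thingy/alpha' role of a '/(something)' relation.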
Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
On 11/19/05, Hubert Chan [EMAIL PROTECTED] wrote: P.S. your relational model can easily be expressed using file-as-dir (well, actually, just standard directories): /(something)/father-son/aardvark/father is a symlink to '/(whatever)/portrait/Mike') /(something)/father-son/aardvark/son is a symlink to '/(whatever)/portrait/Bob') Yes absolutely. Yes, my relational model *does* use standard directories, with three differences. 1) foofs's internal implementation of link-directories and other directories might be different. Or it might not. Entirely unimportant at this level. 2) gc might treat some link-directories differently to predicate-directories. (If Bob and Mike have been deleted, I don't want /(something)/father-son/aardvark/ lying around.) 3) -- Hubert Chan [EMAIL PROTECTED] - http://www.uhoreg.ca/ PGP/GnuPG key: 1024D/124B61FA Fingerprint: 96C5 012F 5F74 A5F7 1FF7 5291 AF29 C719 124B 61FA Key available at wwwkeys.pgp.net. Encrypted e-mail preferred. -- Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)
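Difference 2 could look something like the following Python sketch. The collection rule (drop a link-directory once none of its role targets is still alive) is an assumption drawn from the Bob-and-Mike example, not a settled design, and the dict layout and function name are invented for illustration.

```python
def collect_link_dirs(link_dirs, live_files):
    """Keep only the link-directories that still relate at least one live file.

    Assumed policy: a link-dir is garbage once *all* its role targets are gone.
    A stricter policy could drop it as soon as any one target disappears.
    Ordinary predicate-directories would not be subject to this rule at all.
    """
    return {
        path: roles
        for path, roles in link_dirs.items()
        if any(target in live_files for target in roles.values())
    }


link_dirs = {
    "/(something)/father-son/aardvark": {
        "father": "/(whatever)/portrait/Mike",
        "son": "/(whatever)/portrait/Bob",
    },
}
```

If only /(whatever)/portrait/Ted survives, the aardvark link-directory is collected; if Mike's photo is still there, it stays.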
Re: File as a directory - file-as-dir vs. link-dirs (again) - 1/3
Once again, I have to apologise for a stupidly long and stupidly late reply. I've tried to make this thing a little more digestible by chopping it into three chunks. In order to keep any replies together, I suggest that people reply to the third part unless the reply is very specific to one of the other parts. This first part is (I hope) relatively fun. First of all: I'll refer to 'relation-directories' as 'link-directories' from now on; the new term should be more enlightening and less misleading. (Sorry if the change causes any temporary confusion.) Again, each link-directory expresses one instance of a relation; in RDB terms that's one tuple of a relation or one row of a table, while in OO theory terms it's one link of a relation. (In fact that's not completely and invariably true, because of the weakly-typed nature of link-dirs.) The directory which (by definition) has as its children every link-directory of a given type is *not* a link-directory. (It is an ordinary predicate-directory.) In RDB terms it is the table, and its children are its rows. In OO terms it is the relation (which makes it a class) and its children are the links of that relation (the objects which are instances of the relation). Second, in the coming examples, assume that the present working directory can be set to any name, those of ordinary atomic files as well as those of link- and predicate-directories. This isn't essential to anything that follows, but it does make things tidier. The ability to list the pathnames of a given file makes it useful to have the pwd point to an atomic file: a command, say $ ls -P , can list (some of) the parents of the current file, whether or not it is a directory. The change also creates consistency with link-directories, which are non-(predicate-)directory files that can be the target of the pwd.
On 5/28/05, Alexander G. M. Smith [EMAIL PROTECTED] wrote: Leo Comerford wrote on Wed, 18 May 2005 12:50:38 +0100: But if you have relation-directories and the ability to find the pathnames of a given file, you can do everything you can do with subfiles, just as nicely, and more besides. And if subfiles are completely redundant and bad news anyway, we shouldn't have them. I prefer subfiles (or fildirutes) as being easier to understand. But maybe that's just due to lots of experience with using file hierarchies. I can see having a relational system, but I'd always want to also have a directory hierarchy namespace, so that all files can be named. Having those relationship directories seems kind of clunky - since they're not located near the object being investigated. Though that's a GUI matter of making the system file browser pop up a Show Relationships... menu item as contrasted with drilling down to a subfile directory listing by clicking on an item.
I'll start with an example here. Imagine a directory, /(whatever)/portrait , in which there are portrait photos of a number of men, one photo per man. Each photo is identified under /(whatever)/portrait by the guy's first name, so you have /(whatever)/portrait/Mike /(whatever)/portrait/Bob and so on. Now suppose we use link-directories to express father/son relationships between the guys in the photos. So, for example, if Mike is Bob's father, we could have

/(something)/father-son/
/(something)/father-son/aardvark:
/(something)/father-son/aardvark:father (which is the file also known as '/(whatever)/portrait/Mike')
/(something)/father-son/aardvark:son (the file also known as '/(whatever)/portrait/Bob')

Using these link directories, we can easily express the information in this (father's-side) family tree:

[family-tree diagram: Mike at the root with arrows down to his sons Bob and Ted; Joe, Dean, Ed and Todd in the generation below]

, where an arrow from Mike to Bob means Mike is the picture of the father of the guy pictured in Bob. But this is where the clunkiness comes in.
The family-tree representation above is an obvious and natural way to conceive of and manipulate the father/son relationships. We want there to be a father-son link straight from Mike to Bob; what's more, we want to be able to list the children (in the graph sense!) of Mike and see Bob and Ted, and to move leafward from Mike to Bob or rootward from Bob to Mike. But when we look at how we expressed the information using link-directories, we see this instead:

    /(something)/father-son/
        |             |
        v             v
    aardvark        zebra
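The tools that remove this clunkiness only need to read the rows of /(something)/father-son/ in either direction. A minimal Python sketch, assuming each link-directory is represented as a dict from role to target; the zebra row pairing Mike with Ted is an assumption, since the text only shows that zebra exists.

```python
# Illustrative contents of /(something)/father-son/: one row per link-directory.
father_son = {
    "aardvark": {"father": "/(whatever)/portrait/Mike", "son": "/(whatever)/portrait/Bob"},
    # Assumed row: the example only names 'zebra', not its contents.
    "zebra": {"father": "/(whatever)/portrait/Mike", "son": "/(whatever)/portrait/Ted"},
}


def sons_of(table, father):
    """Move leafward: every son linked to `father` by some link-directory."""
    return sorted(row["son"] for row in table.values() if row["father"] == father)


def fathers_of(table, son):
    """Move rootward: every father linked to `son` (normally at most one)."""
    return sorted(row["father"] for row in table.values() if row["son"] == son)
```

A tree-view browser built on these two functions would show Bob and Ted as children of Mike, with no aardvark or zebra in sight.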
Re: File as a directory - file-as-dir vs. link-dirs (again) - 2/3
(This long essay has been posted in three parts. In order to keep any replies together, I suggest that people reply to the third part unless the reply is very specific to one of the others. This is part two, in which I criticise file-as-directory some more - far from exciting, but apparently still necessary. Things should pick up in part three.) But now let's try to express the father's/son's-photo relationships between the /(whatever)/portrait photos using subfile metadata instead of link-directories. /(whatever)/portrait/Mike is (the photo of) the father of (the man pictured in) /(whatever)/portrait/Bob - how to express that using files as directories? We could decide that /(whatever)/portrait/Bob should have the additional pathname /(whatever)/portrait/Mike/son-photo . But that would mangle the filesystem semantics: /(whatever)/portrait/Mike/son-photo isNotA /(whatever)/portrait/Mike . We need to distinguish the links from files to their metadata files from ordinary directory-to-directory and directory-to-file links. As the man said, don't try to make things simpler than possible. So let's call our new pathname /(whatever)/portrait/Mike;son-photo instead, where ';' is a name segment delimiter in the same way that '/' or (in my examples) ':' is. (Having a reserved segment-name like ..metas is an alternative implementation of the same idea.) Now this seems to work fairly well, but there are problems. Here are some of them. Problem one: We can assume that the partial pathname after the ';' , from the file-as-directory to the metadata file, describes the type of relationship between the two files. So, for example, ';son-photo' describes one type of relationship, while others could be ';friend-picture', ';thumbnail' or ';social-sec-no'. So are all files in the same namespace as regards these relationship-names or not?
In other words, if I see /(whatever)/foo;aardvark and /(something)/bar;aardvark , can I always safely assume that /(something)/bar;aardvark is to /(something)/bar as /(whatever)/foo;aardvark is to /(whatever)/foo ? If so, then there will be substantial risk of namespace collisions. So in practice, the subfile part of filenames will probably have to be fairly long-winded to minimise the risk: not ';foo' but ';something/not/altogether/unlike/a/third-party/java/package/name/foo' . If not, if there is some context in which I should interpret what ';aardvark' means, so that it can mean one thing for one file-as-directory and something else for another, what is that context and how can I know about it? Might it have something to do with the file-as-directory's file type? (As defined how?) With one or more of the pathnames that the file-as-directory might have? By contrast, the type of a link-directory is defined by the predicate-directory it is a child of (by a non-opaque link). So the namespace of link-directory types is the same namespace of pathnames that all predicate-directories are in. Pathnames aren't necessarily very concise either, but at least we're not creating a second namespace, and equivalent pathnames ought to be a lot shorter on average when you have pathname-listing and advanced searching on pathnames; for example, a user binary can have the two pathnames /usr and /bin rather than one long pathname /usr/bin. Problem two: consider that you discover Mike's photo-of-son by looking into its subfiles and seeing /(whatever)/portrait/Mike;son-photo , while you discover Bob's is-son-photo-of (in effect, its photo-of-father) by looking through its pathnames and also seeing /(whatever)/portrait/Mike;son-photo . To find all the relationships which a given file is involved in, you must check both its subfiles and its pathnames. And whether a given relationship will be found among one or the other is arbitrary.
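Problem two can be made concrete with a toy Python model of subfile links, keyed by (owner, relation-name). The data layout and the `relationships` function are hypothetical. The sketch just shows that a file's relationships are split between its subfiles and its pathnames, so every lookup has to search both.

```python
# Illustrative subfile links: (owner, relation-name) -> target file.
# The pathname 'owner;relation' is an extra name for the target.
subfile_links = {
    ("/(whatever)/portrait/Mike", "son-photo"): "/(whatever)/portrait/Bob",
}


def relationships(f, links):
    """Find every relationship `f` takes part in under file-as-directory.

    Relationships turn up either as f's own subfiles (outgoing) or among
    f's pathnames (incoming), so both have to be checked every time.
    """
    as_subfiles = sorted(f"{owner};{rel}" for (owner, rel) in links if owner == f)
    as_pathnames = sorted(
        f"{owner};{rel}" for (owner, rel), target in links.items() if target == f
    )
    return as_subfiles, as_pathnames
```

The same link shows up on Mike's side as a subfile and on Bob's side only as a pathname; had it been stored as Bob;father-photo, the two sides would swap.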
Had we chosen to use ;father-photo rather than ;son-photo links, then Bob's metadata would have been a subfile while Mike's would have been a pathname. But, one could argue, this is only a problem in the special cases where both directions of a two-part relationship are worth expressing. It just so happens that the reverse of the is-son-of relation is a useful relation to consider. It just happens to be the case that every man is a father to all his sons; or rather, the reverse of 'x is the son of y' - 'y has the son x' - happens to be important enough to have another form, 'y is the father of x'. So in these special cases, we can create a link in both directions: for example, we can create both /(whatever)/portrait/Mike;son-photo and /(whatever)/portrait/Bob;father-photo . Then the user can find all of a file's useful file-as-dir metadata by inspecting its subfiles, and so happily ignore its subfile pathnames. But creating both /(whatever)/portrait/Mike;son-photo and /(whatever)/portrait/Bob;father-photo means having a cycle in the representation of some simple non-cyclic data. Also, the fact that Mike was the parent of Bob through a ;son-photo in the base filesystem tree
Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
(This is the third and final choke-sized chunk. In order to keep any replies together, I suggest that people reply to this part unless the reply is very specific to one of the others.) File-as-dir is a flawed way of expressing parent-child relations. Unfortunately, when it comes to relations, expressing two-way parent-child links and providing a tree view of them is what file-as-dir does /best/. Even simple two-way relationships that don't have an obvious parent-child nature cause additional problems. Say we decided to create metadata to record which of the men are friends. So if Dean gets along with his brother Ed we could create

/(something)/friend/aardvark:
/(something)/friend/aardvark:1 (which is the file also known as '/(whatever)/portrait/Ed')
/(something)/friend/aardvark:2 (which is the file also known as '/(whatever)/portrait/Dean')

using link-directories. In fact, if we have anonymous last name segments, we can just create /(something)/friend/aardvark: , which links anonymously to both the Ed and Dean photos. But try to express this using subfiles: which of the two brothers will we arbitrarily choose to make the subfile of the other? In general, because the subfile relationship is always parent-child, to express a symmetric relationship in it we have to make up spurious extra data, declaring one participant in the relationship to be the 'parent' when no such distinction exists. Ed and Dean are unlikely to care about this, but try deciding whether Sales worksClosely with Marketing or Marketing worksClosely with Sales on your firm's computerised org chart. (Apparently things like LDAPisation projects have provoked wars over less.) And in the link-directory example using anonymous links, even the dumbest program that knows nothing about either /(something)/friend or friendship can tell that /(something)/friend/aardvark: is symmetric.
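A sketch, in Python, of that "dumbest program" test, assuming a link-directory is represented as a list of (role, target) pairs with '' standing for an anonymous final segment. The function name and representation are hypothetical.

```python
def known_symmetric(link_dir):
    """True if every role segment of a link-directory is anonymous ('').

    With anonymous final segments nothing distinguishes the participants, so
    even a program that knows nothing about the relation can treat it as
    symmetric. Named roles such as ':1' and ':2' give no such guarantee, so
    the answer there is False ("not known"), not "asymmetric".
    """
    return bool(link_dir) and all(role == "" for role, _target in link_dir)


anonymous = [("", "/(whatever)/portrait/Ed"), ("", "/(whatever)/portrait/Dean")]
numbered = [("1", "/(whatever)/portrait/Ed"), ("2", "/(whatever)/portrait/Dean")]
```

No knowledge of friendship is needed: the check looks only at the form of the link-directory.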
In the link-directory example that doesn't use anonymous links, it doesn't know that - and subfile metadata will actively give it the false parent/child information. And of course even if we already know that a specific relationship is symmetric, or if it's not important that we find out, problems two and four from part two bite hard. For example, reliably finding all of Ed's friends' photos requires looking for both all his photo's ;friends children and all its ;friends parents every time. We have similar problems for relationships that aren't symmetric, but for which we don't want to have to declare one role to be the parent of the other. Which party in an is-husband-of/is-wife-of relation should be indicated as the parent? Then there are relations with more than two participants. Here's a good example of a three-way relation, lifted from the Rumbaugh-Blaha-Premerlani-Eddy-Lorensen OO book. Say that we have files representing programmers, software projects and programming languages. Now say that, for example, Bob is using Algol 68 on the Foomatic and both SNOBOL and PL/1 on Project Omega, while Dean is coding in PL/1 on the Computron and in PILOT on Project Omega, and Todd is formally specifying the Foomatic in Z. We would represent this information using link-directories by creating

/(thingy)/impl-lang/aardvark:coder -- /(whatever)/portrait/Bob
/(thingy)/impl-lang/aardvark:lang -- /bin/algol68
/(thingy)/impl-lang/aardvark:proj -- /(whatever)/projects/foomatic
/(thingy)/impl-lang/zebra:coder -- /(whatever)/portrait/Dean
/(thingy)/impl-lang/zebra:lang -- /bin/pilot
/(thingy)/impl-lang/zebra:proj -- /(whatever)/projects/omega

and so on: one link-directory for each triple of programmer, project and language.
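The point of one link-directory per triple is that every question becomes a simple filter over rows, whichever of the three files you start from. A minimal Python sketch, with short names standing in for the full pathnames (an assumption for brevity) and the tuples taken from the Bob/Dean/Todd example above; `ImplLang` and `select` are hypothetical names.

```python
from typing import NamedTuple


class ImplLang(NamedTuple):
    """One /(thingy)/impl-lang link-directory: a (coder, lang, proj) triple."""
    coder: str
    lang: str
    proj: str


# Short names stand in for full pathnames such as /(whatever)/portrait/Bob.
impl_lang = [
    ImplLang("Bob", "algol68", "foomatic"),
    ImplLang("Bob", "snobol", "omega"),
    ImplLang("Bob", "pl1", "omega"),
    ImplLang("Dean", "pl1", "computron"),
    ImplLang("Dean", "pilot", "omega"),
    ImplLang("Todd", "z", "foomatic"),
]


def select(rows, column, **where):
    """Distinct values of `column` across rows matching every key=value in `where`."""
    return sorted({getattr(r, column) for r in rows
                   if all(getattr(r, k) == v for k, v in where.items())})
```

For example, `select(impl_lang, "lang", coder="Bob", proj="foomatic")` gives `["algol68"]`, and `select(impl_lang, "coder", proj="omega")` gives `["Bob", "Dean"]`; no participant has to be chosen as the 'parent'.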
If we want to express the same information using subfile metadata we are going to have to create something like

/(whatever)/portrait/Bob;impl-lang/1/lang -- /bin/algol68
/(whatever)/portrait/Bob;impl-lang/1/proj -- /(whatever)/projects/foomatic
/(whatever)/portrait/Dean;impl-lang/1/lang -- /bin/pilot
/(whatever)/portrait/Dean;impl-lang/1/proj -- /(whatever)/projects/omega

and so on. Problem two is worse in this case. Not only do we have to look through the pathnames of /(whatever)/projects/foomatic in order to find out what programmers are working on it, but in order to find out what languages Bob is using on the Foomatic we have to find the /(whatever)/portrait/Bob;impl-lang/* directories among the pathnames of /(whatever)/projects/foomatic and then examine those directories' ./lang links. And to find out what projects Bob is working on, we have to list all the /(whatever)/projects/* files which are linked from /(whatever)/portrait/Bob;impl-lang/*/proj . All this is basically the same as working with link-directories using base-filesystem commands; indeed /(whatever)/portrait/Bob;impl-lang/1 is basically /(thingy)/impl-lang/aardvark: shoved under an arbitrary choice of one of the three files it relates. We created tools so that we could handle parent-child relations expressed as link-directories without clunkiness; naturally we can do similar
Erratum
On 11/18/05, Leo Comerford [EMAIL PROTECTED] wrote: $ setroot /(whatever)/friend/Ed This should be $ setroot /(whatever)/portrait/Ed - this is what comes of writing things in a hurry Leo. -- Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)
Re: File as a directory - back to predicates
On 8/25/05, Hubert Chan [EMAIL PROTECTED] wrote: On Wed, 24 Aug 2005 07:51:19 +0100, Leo Comerford [EMAIL PROTECTED] said: [... lots of stuff snipped ...] At other levels, of course, the differences assert themselves. For one thing, the normal Unix filesystem API doesn't have calls to, for instance, check the pathnames asserted of a given file. That's easily solved; just add the calls. It's not so easy. You need to determine how to figure out the pathnames. UN*X filesystems and filesystems for UN*X-like operating systems don't store uplinks, Yes, I know. so there's no quick way to figure out the pathnames; the only way currently is to traverse the entire tree. And that's exactly the point. (Less easily solved are the performance issues.) Again, if you took the expanded API and put a typical Unix filesystem implementation behind it, you would find that its performance at things like finding pathnames was abysmally slow, while its performance at doing the traditional Unix-filesystem things was as good as ever. Conversely, if you mounted some kind of registry system instead (or as well) you'd find that it was very fast at finding pathnames, but very slow at many traditional-Unix-filesystem tasks (for example rename()ing a directory). Again, consider the analogy of an abstract collection type with two or more different concrete implementations. The data model is not any of its implementations. Just because two different data systems have different performance characteristics doesn't mean they need to present different data models. P.S. most of the stuff that you're saying is already in the Future Vision paper. At least the main idea of trying to query via metadata. Future Vision is predominantly about searching from metadata to data. (Which files are emails about Santa?) It says almost nothing about going from data to metadata. (Is this file an email?)
(This is especially unfortunate since Future Vision is in large part about how to improve the effectiveness of search in the real world, and one of the most ubiquitous, natural and effective real-world search strategies is to start with an m-to-d search, then apply d-to-m searching on the results. An example: I remember Santa flamed somebody out a while ago. Let's see - search for emails from Santa. Hm, thirty hits. [m-to-d] Let's take a look... This one here also relates to elves and a strike - /that's/ what it was about, I remember now! [d-to-m] Any other elf strike emails from Santa? No, just the one: bingo! [m-to-d again].) The one thing it /does/ say about data-to-metadata searching is that file streams are inelegant, and should be replaced by ... pathname metadata, yet another way to represent d-to-m metadata that is separate from file naming. By contrast, my email argues that unifying all OS namespaces into the file naming system, as proposed by Hans in Future Vision, is such a good idea that it ought to be applied properly to d-to-m metadata too. Especially since the only non-bogus distinction between m-to-d metadata and d-to-m metadata is their performance requirements. [snip] -- Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)
Re: File as a directory - back to predicates
Firstly, I apologise for the absurdly late reply! Secondly, I'm going to backtrack for a bit here. Let's forget about relations for the moment, and concentrate solely on single-place predicates. I also apologise for (partly) repeating myself at length like this, but this time I have what I hope is a really nice explanation. So: Consider an absolutely vanilla registry-type thing of the kind that Joe Code would produce if you asked him to implement a metadata system for files/objects/documents/whatever. Users can assign arbitrary name-value pairs to the objects in the registry; they can also delete/edit the pairs. And, of course, they can view all the name-value pairs associated with a given object. The names are strings. The objects are opaque binaries to the registry. So, in use, a photo object in the registry might have something like the following pairs associated with it:

date_taken=2004-03-04
title=My dog Spot
type=photo

(The order in which the pairs are listed is obviously arbitrary.) Now note that the registry system could use Unix-style segmented pathnames instead of name-value pairs, in the cases where the value is a sufficiently short string. For example, we can think of date_taken=2004-03-04 as an alternative to date_taken/2004-03-04, which could be further parsed up into date_taken/2004/3/4. There's nothing magic about name-value pairs: each one simply asserts a predicate just as a Unix filename does. In fact, a name-value pair where both the name and value are strings is in effect just a limited Unix pathname which must have exactly two segments. So our mark-2 registry system assigns its objects Unix pathnames instead of name-value pairs, and the photo object might have the following ones:

date_taken/2004/3/4
title/My\ dog\ Spot
type/photo

Isn't this new registry fundamentally very similar to a Unix filesystem, in which each file has one or more such pathnames? No, better than that. It *is* a Unix filesystem, subject to a few caveats.
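The translation from the first registry to the mark-2 one is mechanical enough to write down. A small Python sketch under stated assumptions: pair_to_pathname is a hypothetical helper; it treats a value of the form YYYY-MM-DD as a date and splits it into year/month/day segments (dropping leading zeros, to match date_taken/2004/3/4), and escapes spaces with a backslash as a shell would. It does not handle values that themselves contain '/', which would need escaping of their own.

```python
def pair_to_pathname(name, value):
    """Turn one registry name-value pair into a segmented pathname.

    A YYYY-MM-DD value becomes year/month/day segments without leading
    zeros; any other value becomes one segment with spaces escaped.
    """
    parts = value.split("-")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        segments = [str(int(p)) for p in parts]
    else:
        segments = [value.replace(" ", "\\ ")]
    return "/".join([name] + segments)


# The photo object's pairs from the first registry.
registry_entry = {"date_taken": "2004-03-04", "title": "My dog Spot", "type": "photo"}
pathnames = sorted(pair_to_pathname(n, v) for n, v in registry_entry.items())
```

Each pair turns into exactly one pathname, so nothing the first registry could say is lost in the mark-2 one.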
It has the same syntax (it's built of files and pathnames); it has the same semantics (pathnames express propositions which are asserted of files); all it needs is an implementation which can expose these to the OS by being mounted. So a completely simple-minded iteration of the registry pattern, without any consideration for namespace integration, Unix philosophy or what have you, translates almost effortlessly into a Unix filesystem which uses the standard semantics. This is strong evidence for the power of that filesystem, using those semantics, to integrate different namespaces into itself. But more, our registry is a data-to-metadata system: it is designed to allow the user to find the metadata that has been associated with a particular piece of data. ('When was this photo taken?', which in our revised registry becomes 'What date_taken pathname does this file have?') Registry systems in general, file streams, stat blocks, subfile metadata, and the old Macintosh resource fork are all data-to-metadata systems. By contrast, the Unix file naming system is a metadata-to-data system, designed to allow the user to find one or more pieces of data (files) through the metadata that has been associated with them. ('What photos were shot in 2004?', becoming 'What files are in the directory date_taken/2004?') But in the example above, registry data - the metadata of the classical data-to-metadata system - is being expressed in Unix filenames using the standard semantics - the language of Unix's classical metadata-to-data system. This shows the power of the Unix filesystem to integrate both d-to-m and m-to-d systems into the one namespace - the one language of pathnames-as-predicates. This is not so surprising when you consider that at the semantic/logical level both types of system are exactly the same: they both just associate metadata with data. At other levels, of course, the differences assert themselves.
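Because the d-to-m question and the m-to-d question are asked of the same assertions, one table of (pathname, file) pairs can answer both. A minimal Python sketch: the file ids photo-17 and photo-23 and the function names files_in and pathnames_through are hypothetical. Following the predicate semantics, a file is "in" date_taken/2004 if it has any pathname at or beneath that prefix.

```python
# Each assertion says: this pathname links to this (opaque) file.
assertions = {
    ("date_taken/2004/3/4", "photo-17"),
    ("type/photo", "photo-17"),
    ("date_taken/2005/1/9", "photo-23"),
    ("type/photo", "photo-23"),
}


def _under(path, prefix):
    """True if path is prefix itself or lies beneath it."""
    return path == prefix or path.startswith(prefix + "/")


def files_in(prefix):
    """Metadata to data: which files have a pathname through prefix?"""
    return sorted({f for p, f in assertions if _under(p, prefix)})


def pathnames_through(file_id, prefix):
    """Data to metadata: which pathnames through prefix does file_id have?"""
    return sorted(p for p, f in assertions if f == file_id and _under(p, prefix))
```

For example, files_in("date_taken/2004") asks "What photos were shot in 2004?", and pathnames_through("photo-17", "date_taken") asks "What date_taken pathname does this file have?"; both are plain filters over the same set.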
For one thing, the normal Unix filesystem API doesn't have calls to, for instance, check the pathnames asserted of a given file. That's easily solved; just add the calls. Less easily solved are the performance issues. rmdir date_taken/2004 is going to be rather slow on a registry-type volume which contains many files, just as listing all the pathnames through date_taken to a particular file is going to be relatively painful on a volume which is closer to a traditional Unix filesystem implementation. The important thing here is that these /are/ *performance issues*, although certainly not trivial ones. By stripping away the superficial differences between m-to-d and d-to-m systems, we have revealed the real difference: performance. The situation is similar to general programming languages. No-one would dream of creating a language which has, say, two radically different and incompatible function call interfaces, one of which is supposed to be used by functions whose time performance is O(n) or better, while the other is for functions slower than O(n). But of
Re: Installing Fedora Core with root on Reiserfs
On 7/18/05, Russell Coker [EMAIL PROTECTED] wrote: On Monday 18 July 2005 06:01, Edward Shishkin [EMAIL PROTECTED] wrote: FC4-test3 (and perhaps FC4) installs its own version of grub which seems to interact incorrectly with reiserfs. The problem is that reiserfs.ko module located on reiserfs partition can not be loaded. I can confirm that there is a reiserfs/GRUB problem in the final FC4 release too. (I assume it's the same problem, but I haven't investigated it.) FWIW - evidently not much - the relevant Fedora Bugzilla bug would appear to be 161306.
Re: file as a directory
And then there are ReiserFS plugins, which might give you a magic directory that when read for data, yields the concatenation of its children's data contents. Better, you could have a little custom filesystem which can take the /(something)/concatenation/zebra: subgraph as its device and generate a single file which is the concatenation of zebra:1, zebra:2, and so on. (Remember that we can redefine 'file' as 'an atomic file or a link between files'. So since the zebra: subgraph is a file, we can mount it!) We could also have, for example, a pair of filesystems, one which can mount an XML file and present it as an instance of an XML-document association, and another which can mount such a link and present it as a flat XML file. Leo.
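The concatenating filesystem is only proposed here, so the following is a rough stand-in rather than an implementation. It assumes, hypothetically, that the zebra: subgraph has already been read into a dict mapping each role name ('1', '2', ...) to that member's bytes; concatenate_members (also a hypothetical name) produces the single file such a filesystem might present. Roles are ordered numerically, so zebra:10 follows zebra:9 rather than zebra:1, and non-numeric roles are ignored.

```python
def concatenate_members(relation):
    """Concatenate the members of a relation-directory in numeric role order.

    `relation` maps role names (the segment after the colon, e.g. "1" for
    zebra:1) to file contents as bytes. Non-numeric roles are skipped.
    """
    numeric_roles = sorted((r for r in relation if r.isdigit()), key=int)
    return b"".join(relation[r] for r in numeric_roles)


zebra = {"1": b"a", "2": b"b", "9": b"y", "10": b"z"}
combined = concatenate_members(zebra)
```

Sorting by integer value rather than as strings is a design choice: a plain string sort would place "10" between "1" and "2".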
Re: file as a directory
On 5/17/05, Alexander G. M. Smith [EMAIL PROTECTED] wrote: This is a bit of a shotgun reply, but I hope this answers your questions and clarifies things. If not, please ask again and I'll try to give a better answer. There are some other things I should add soon anyway. (Not immediately, though - I'll be very busy over the next few days.) In the photo-with-description use-case, if you want to give the photo a name, don't link to the relationship between the photo and the description. Just link straight to the photo file itself, more or less as you would do today. Remember, in my example, the jpeg file's original name is '~/photos/dessau-bauhaus'. That links straight to the actual jpeg binary file itself. (The name would be '~/photos/dessau-bauhaus.jpeg', but in my example we're not using file extensions to track file type anymore.) And after I've come along and associated the dessau-bauhaus jpeg with a description, ~/photos/dessau-bauhaus *still* just links straight to the jpeg binary. As far as the pathname '~/photos/dessau-bauhaus' is concerned, '/(something)/description/aardvark:described' is just another pathname that happens to link to the same file as it. Inspecting the pathnames of the file ~/photos/dessau-bauhaus will reveal '/(something)/description/aardvark:described' along with all the other names it has. The directory /(something)/description/aardvark: is the link between the photo and the description. (To be clear, the colon isn't part of any name-segment. It's a delimiter between name-segments: it's the relation-directory equivalent of the forward-slash.) In OO terms, the directory /(something)/description/aardvark: isA '/(something)/description' and it hasA 'description' and a 'described'. Being the child of /(something)/description/ gives the relation-directory its file type: it's the pathname '/(something)/description/(whatever)' that tells us to interpret the relation-directory as a link between a description and the file it describes.
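The point that the photo keeps its ordinary name while also being reachable through the relation-directory can be modelled in a few lines. This Python sketch is illustrative only: the Namespace class, its fields, and the file ids jpeg-1 and text-1 are hypothetical. A relation-directory is represented simply as its pathname (ending in the ':' delimiter) mapped to a dict from role name to member file.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    # Ordinary pathnames, each asserting a predicate: pathname -> file id.
    names: dict = field(default_factory=dict)
    # Relation-directories: pathname ending in ":" -> {role: file id}.
    relations: dict = field(default_factory=dict)

    def pathnames_of(self, file_id):
        """Every pathname that links to file_id, ordinary or via a relation."""
        paths = [p for p, f in self.names.items() if f == file_id]
        for rel_dir, roles in self.relations.items():
            paths += [rel_dir + role for role, f in roles.items() if f == file_id]
        return sorted(paths)


ns = Namespace()
ns.names["~/photos/dessau-bauhaus"] = "jpeg-1"
ns.relations["/(something)/description/aardvark:"] = {
    "described": "jpeg-1",    # the photo itself: still the same file
    "description": "text-1",  # hypothetical id for the description file
}
```

Asking for the pathnames of jpeg-1 returns both '~/photos/dessau-bauhaus' and '/(something)/description/aardvark:described'; neither name wraps or replaces the jpeg, they both just link to it.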
Actually aardvark: already has a file type of relation-directory - the link from /(something)/description/ specifies what type of relation-directory it is. Every relation-dir that is a description-described link is (by definition) in /(something)/description/, regardless of which files they have as their 'description' or 'described'. (This is the sense in which /(something)/description/ is an association - in OOese the term means a /type/ of link between objects.) For example, let's suppose a different user, Bob, decides to attach some descriptions too. He puts a description on the jpeg file /home/bob/photos/petit-trianon; he also decides he doesn't like my description of /home/leo/photos/dessau-bauhaus and puts his own description on it. So now we have:

/(something)/description/aardvark:
/(something)/description/aardvark:description -- this is my description of it
/(something)/description/aardvark:described -- this is /home/leo/photos/dessau-bauhaus
/(something)/description/manticore:
/(something)/description/manticore:description -- this is Bob's description of it
/(something)/description/manticore:described -- this is /home/leo/photos/dessau-bauhaus again
/(something)/description/sheep:
/(something)/description/sheep:description -- this is Bob's description of it
/(something)/description/sheep:described -- this is /home/bob/photos/petit-trianon

And now if I list the pathnames of ~/photos/dessau-bauhaus, I get both '/(something)/description/aardvark:described' and '/(something)/description/manticore:described' - telling me that there are two different descriptions of the file. And everything still just links directly to the jpeg file. About the colon: there's nothing magic about the choice of character, of course, but there does absolutely need to be a way to identify links from relation-directories in pathnames. Programs (and humans) need to be able to mechanically tell when (and where) a pathname asserts part of a relation and when it asserts a predicate, just by looking at it.
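With ':' reserved as the relation delimiter, a program that knows nothing about the meaning of any name-segment can still tell where a pathname passes through a relation-directory. A minimal Python sketch (relation_links is a hypothetical name), using re.split with a capturing group so that the delimiters are kept in the result:

```python
import re


def relation_links(pathname):
    """List every (relation-directory, role) pair a pathname passes through.

    '/' marks an ordinary predicate-directory link; ':' marks a link from a
    relation-directory to one of its members. An empty list means the
    pathname asserts only predicates.
    """
    # e.g. "/a/b:c" -> ["", "/", "a", "/", "b", ":", "c"]:
    # delimiters sit at the odd indexes, segments at the even ones.
    parts = re.split(r"([/:])", pathname)
    found = []
    for i in range(1, len(parts), 2):
        if parts[i] == ":":
            relation_dir = "".join(parts[: i + 1])  # up to and including the colon
            found.append((relation_dir, parts[i + 1]))
    return found
```

Note that '/foo:bar/baz', '/foo/bar:baz' and '/foo:bar:baz' give different results, which is exactly the distinction that would be lost if '/' were used for both kinds of link.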
Changing the delimiter is a good way to do it because it highlights the important point: that the relationship between aardvark and description is different to the relationship between a (predicate-)directory and its child. (You could use file extensions as a makeshift substitute. How about .: ?) Are both methods useful? Yup. What's the difference between associations and properties? Many-to-one and one-to-one? The idea is that if you want to assert a single-place predicate of a file, like 'file x is important', you just give the file an appropriate full pathname ('~/important' or whatever). If you want to assert a multi-place predicate - a relation - like 'file x is more important than file y', then you use a relation-directory. That goes for every kind of multi-way relation/association you might want to assert between files - one to one, one to many, many to many. Actually, '/(something)/description' asserts a predicate just like '~/important' does. But
Re: file as a directory
Serialiser.) But note for now that if we define 'atomic file' as 'just a simple sequence of bytes', we can redefine 'file' as 'either an atomic file or a relationship between files'. To amplify, though, the main, real purpose (so to speak) of relation-directories is to express relationships between files. Doing this properly just happens to give us compound objects for free. As Rumbaugh, Blaha, Premerlani, Eddy and Lorensen say (in unison?), 'Aggregation is a special form of association, not an independent concept.' Beware the visual/spatial metaphors which subtly warp one's understanding of the Unix file system. It's not a set of Russian dolls or a maze of twisty little passages. In particular, files are not physically inside directories, at any level of abstraction. aardvark: just provides some metadata about ~/photos/dessau-bauhaus, and that's all that ~/photos does too. (Great, isn't it? The filesystem namespace: they're not names, and it's not a space. :) ) On the other hand, garbage collection will be a significant hurdle, for two reasons. One is cycles. The semantics of predicate-directories mean that it's unnecessary to permit cycles containing only predicate-directories, but if you're going to have instances of, say, the singly-linked-list-node relation, then the need for cycles is unavoidable. The other is more sophisticated needs for automatic deletion. For example, we would probably need /(something)/description/aardvark: to be marked for deletion as soon as either of its children were unlinked. Leo Comerford.
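The automatic-deletion requirement can be sketched as a cascade, under hypothetical assumptions: the RelationStore class and its methods are invented for illustration, each relation-directory is identified by its pathname and stored as a mapping from role name to member id, and relation-directories may themselves be members of other relations (the zebu: review relation below is likewise made up). Unlinking a file deletes every relation instance that had it as a member, then repeats for those instances.

```python
class RelationStore:
    """Relation-directories that disappear once any member is unlinked.

    A hypothetical sketch: keys are relation-directory pathnames such as
    "/(something)/description/aardvark:", values map role -> member id.
    A relation-directory's pathname doubles as its id when it is itself
    a member of another relation, so deletion cascades.
    """

    def __init__(self):
        self.relations = {}  # relation-dir pathname -> {role: member id}

    def add_relation(self, rel_dir, roles):
        self.relations[rel_dir] = dict(roles)

    def unlink_file(self, file_id):
        """Delete every relation-dir that directly or transitively had
        file_id as a member; return their pathnames, sorted."""
        removed, pending = [], [file_id]
        while pending:
            gone = pending.pop()
            for rel_dir, roles in list(self.relations.items()):
                if gone in roles.values():
                    del self.relations[rel_dir]
                    removed.append(rel_dir)
                    pending.append(rel_dir)  # it may be a member elsewhere
        return sorted(removed)


store = RelationStore()
store.add_relation("/(something)/description/aardvark:",
                   {"described": "jpeg-1", "description": "text-1"})
store.add_relation("/(something)/description/manticore:",
                   {"described": "jpeg-1", "description": "text-2"})
```

This cascade terminates even when relation instances refer to each other, because a deleted instance is no longer in the dictionary. It does not address the cycle problem itself, though: a cycle of instances whose members are never explicitly unlinked is never reclaimed, which would need something more like a tracing collector.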
[reiserfs-list] A clarification
On Thu, 2002-03-07 at 18:37, Nikita Danilov wrote: Hans Reiser writes: The notion that we need to be able to delete metadata when we delete a file, and that this means we need two way links, is one I have been pondering for ~18 years though I did not address it in my paper. There And udanax people were pondering for it about... oh no... 30? 40? I do hope you're not under the impression that I claimed to be the first person to see a need for two-way file links in the context of file deletion? That would indeed have been chancing my arm, especially since I haven't read the literature. When I wrote, in my introduction, of points that no-one had made before, I meant only that no-one had made them in the public discussion of ReiserFS before. I thought that this was clear from the context of the paragraph; if not I'm happy to clarify it now. Leo Comerford.
[reiserfs-list] New essay on Reiser4 and file metadata
I have written a response to Dr. Reiser's Future Vision and Reiser4 papers, specifically to his proposals for file metadata in the Reiser4 paper. Since it's quite long, I have placed it on the WWW at http://www.st-andrews.ac.uk/~lrc1/pathname_metadata.html . In the interests of self-promotion through controversy, the first paragraph is repeated out of context below: :) After reading Dr. Hans Reiser's Future Vision paper, I was sold on (and very excited by) it. I later read the Reiser4 proposal document. At first I was enthusiastic about this too, especially his proposed new system of file metadata. As I thought further about it, however, I started to see shortcomings. By now I remain enthusiastic about the Future Vision proposals, but I have come to believe that the metadata system proposed for Reiser4 is a mistake. It's a Jekyll and Hyde design, which overcomes shortcomings and inelegancies in *nix at the cost of creating worse ones. Moreover, it's actually a step away from the Future Vision program. Examination of Future Vision and the nature of Unix suggests a better way forward. I am eager to hear people's (considered) responses to the essay. A sequel essay, which will resolve some points left open in the current one and cover significant new ground, will follow, relatively soon if there is interest in seeing it. Leo Comerford.