Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3

2005-12-22 Thread Leo Comerford
On 11/21/05, Hubert Chan [EMAIL PROTECTED] wrote:

[snip]

 A few points
 - things should not be named mumble or thingy -- things should be
  named descriptively (obviously -- of course, people don't always
  follow these obvious rules)

Well, sure. But even with the best efforts to choose clear and
descriptive names, the meaning of every name will not always be
clear to every person, never mind to programs. For 'mumble' read
'[mumble]'.

 - the user should be in charge of how he/she organizes the data, and so
  he/she should pick the names that he/she wants to use.  It shouldn't
  be mandated.  And anyone else who uses the data should have a
  reasonable expectation to have any confusing things documented.  At
  least database schemas are usually documented for those people who
  need to use them.

Documentation is good and important, but for all sorts of good and bad
reasons few filesystems will ever have good, up-to-date documentation
handy for every pathname. And, as Future Vision says, no-one has the
time or the inclination to study up on the format of every database
they might want to use. People like to learn by exploration, even when
they do also read the docs. And a shell utility can't use
human-readable documentation any more than it can apply human Unix
lore or common sense to interpreting the filesystem. To such a
program, every segment name is akin to 'mumble'; without some
additional information, it can't tell that /usr/bin is not an instance
of the relation /usr .

Which brings us back to semantics. I've compared the filesystem
interface to an ADT. But if that were all it is, if there were no
conventions about how to interpret a pathname, then it /would/ be
necessary to get out the manual and read up on the meaning of every
new pathname you come across, because you would almost never be able
to infer anything about its meaning from looking at it. But the
filesystem isn't (just) a persistent-storage data structure or ADT;
it's a language through which both people and programs communicate.
I've described the semantics of that language before - '/usr/bin' is a
predicate which is asserted of all and only the opaque descendants of
/usr/bin, '/usr' is a predicate which is asserted of all and only the
opaque descendants of /usr, '/usr/passwd' is a predicate which is
asserted of /usr/passwd, etc. etc. ad nauseam. So if I come across
/foo/bar which links to a non-directory file then I know that the
predicate '/foo/bar' is asserted of that file (and that file only).
Even if 'foo' and/or 'bar' is mysterious to me, I already know a good
deal about the intended meaning of this bit of the filesystem, and I
can use what I know to help me deduce the meanings of mysterious
name-segments. (To plag^H^H^H^Hparaphrase one David Moser, imagine
walking into an office and seeing a Post-It note stuck on the side of
something. Even if the note contains many nonslarkish English
flutzpahs, you can glork much more of its pluggandisp than if it were
scríofa i dteanga éigin eile.) Having this language also means that
even programs which never know the meaning of any name-segment can
extract useful information from pathnames in virtue of their form. For
example, listing the common attributes of two files is a matter of
listing the intersection of their pathnames.
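
(A minimal sketch of that last point, in Python. It assumes that each file's pathnames are already available as plain strings; the function names and the example pathnames are invented for illustration and do not belong to any existing tool.)

```python
from typing import Iterable, Set


def predicates(pathname: str) -> Set[str]:
    """Every predicate one pathname asserts: the pathname itself plus
    each of its ancestor directories (the bare root is left out)."""
    segments = [s for s in pathname.split("/") if s]
    return {"/" + "/".join(segments[:i]) for i in range(1, len(segments) + 1)}


def attributes(pathnames: Iterable[str]) -> Set[str]:
    """Union of the predicates asserted by all of a file's pathnames."""
    result: Set[str] = set()
    for pathname in pathnames:
        result |= predicates(pathname)
    return result


def common_attributes(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    """Predicates asserted of both files."""
    return attributes(a) & attributes(b)


# Hypothetical files: one with a single pathname, one with two.
ls_names = ["/usr/bin/ls"]
cat_names = ["/usr/bin/cat", "/home/leo/tools/cat"]
print(sorted(common_attributes(ls_names, cat_names)))  # ['/usr', '/usr/bin']
```

The program never needs to know what 'usr' or 'bin' means; the form of the pathnames is enough.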

But if we start using directories to assert relations as well as
predicates without distinguishing those directories which assert
instances of a relation, then we make every sentence in the language
ambiguous. Now any given full pathname /might/ assert a predicate,
or it might assert an instance of a relation instead. (Or in fact an
instance of any one of several relations, since '/foo:bar/baz',
'/foo/bar:baz', and '/foo:bar:baz' all ambiguate to '/foo/bar/baz' .)
Such an ambiguous language is much less useful. Before, for example,
it took a simple shell command to find the predicates asserted of a
file. When the ambiguity is introduced, that simple operation becomes
an exercise in manual-reading and guesswork.
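
(To make the size of the ambiguity concrete, here is a small Python sketch. It assumes, as in my examples, that ':' is the delimiter that follows a link-directory's name; the function name is made up. Given a pathname written only with '/', it lists every marked pathname that could have been meant.)

```python
from itertools import product
from typing import List


def readings(pathname: str) -> List[str]:
    """All ways of re-marking the inner delimiters of a '/'-only pathname
    as either '/' (predicate-directory) or ':' (link-directory)."""
    segments = [s for s in pathname.split("/") if s]
    if not segments:
        return ["/"]
    out = []
    # One independent choice per delimiter between consecutive segments.
    for choice in product("/:", repeat=len(segments) - 1):
        text = "/" + segments[0]
        for delimiter, segment in zip(choice, segments[1:]):
            text += delimiter + segment
        out.append(text)
    return out


print(readings("/foo/bar/baz"))
# ['/foo/bar/baz', '/foo/bar:baz', '/foo:bar/baz', '/foo:bar:baz']
```

The number of readings doubles with every extra segment, so a pathname with n segments has 2^(n-1) of them.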

Speaking of databases, if you ask someone like C.J. Date what the most
important feature of the relational database is, he won't talk about
view-construction or even ACID properties. He certainly won't say
anything about performance. The answer he will give you is
well-defined semantics. While a subgraph of a network database is
basically just a bit of persistent-storage data-structure whose
meaning can only be discerned by reading the documentation, a table in
an RDB can (must) always be understood as expressing the present
instances of some relation.

 - you can still have some sort of marker to indicate what role each part
  of the name takes (e.g. the  delimiter to indicate
  pseudofiles).  Or you can use a special naming convention (e.g. tuples
  have a special prefix).  But I think that trying to introduce a new
  delimiter that does basically the same thing as '/' is going to cause
  a lot of problems.  (See Rob Pike's paper, The Hideous Name, 

Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3

2005-12-22 Thread Leo Comerford
On 11/21/05, Hubert Chan [EMAIL PROTECTED] wrote:

[snip]


 As a completely different side issue, I think that using random names
 such as aardvark or zebra to refer to tuples is a bad idea (and I
 know this isn't part of your proposal).  If you use things that are real
 words, people will get confused, since they will try to associate
 meaning to something that doesn't have meaning.  (e.g. why is my
 relationship called dodo, while Bob's is called tiger?)  I think
 that it's best to just assign random meaningless strings, so that people
 will know that they are meaningless.


(This addresses the issue with last name-segments of
link-directories which I said in part 3 that I'd get back to.)

Meaningless name segments are annoying to those who know or guess
that they're meaningless. Worse, they're misleading to those people
and programs that don't. (After all, they amount to making up spurious
information.) The ideal solution is to throw them away completely:
having anonymous final segments allows two non-directory files having
the predicates '/foo/aardvark' and '/foo/zebra' to both simply be
'/foo'. It has some slightly weird effects, though; for example, when
'/foo/aardvark:bar' and '/foo/zebra:bar' both become '/foo/:bar', what
happens to

cd /foo/:bar

? One solution is to dynamically generate filler text for anonymous
final segments whenever text is necessary (in a POSIX legacy
interface, for example), based on the linked file's inode and that of
the volume; the format of the filler text should allow programs (and
humans) to detect it as such when they can't find out some other,
out-of-band, way.
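
(A rough Python sketch of such filler text. The '~anon-' prefix and the hexadecimal layout are purely my assumptions for illustration; the only requirements taken from the paragraph above are that the text is derived from the two inode numbers and that it can be recognised as filler.)

```python
import re
from typing import Optional, Tuple

# Hypothetical, recognisable format: ~anon-<volume inode>-<file inode>, in hex.
FILLER_PREFIX = "~anon-"
_FILLER_RE = re.compile(r"^~anon-([0-9a-f]+)-([0-9a-f]+)$")


def filler_name(volume_inode: int, file_inode: int) -> str:
    """Generate filler text for an anonymous final segment."""
    return f"{FILLER_PREFIX}{volume_inode:x}-{file_inode:x}"


def parse_filler(segment: str) -> Optional[Tuple[int, int]]:
    """Return (volume inode, file inode) if the segment is filler, else None."""
    match = _FILLER_RE.match(segment)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


name = filler_name(2, 4711)
print(name)                      # ~anon-2-1267
print(parse_filler(name))        # (2, 4711)
print(parse_filler("aardvark"))  # None
```

Because the name is computed rather than stored, the same link always gets the same filler, and nothing spurious has to be written into the filesystem.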

[snip]

Leo.
--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


The filesystem plus browser as Agenda/Chandler

2005-12-09 Thread Leo Comerford
Speaking of handling email, wanting maximum power, and (not) using
databases which aren't integrated into the filesystem namespace: note
that the thin GUI on top of a filesystem of predicate- and
link-directories with pathname-listing is, among other things,
Chandler (the promised son-of-Lotus-Agenda) done right.

In particular, see

http://blogs.osafoundation.org/mitch/92.html

.

attributes and their values                -- /foo/bar/baz
relationships to other items               -- /foo/fum:aardvark/foo etc.
payload                                    -- file body
show views of [things] organized by [foo]  -- ls foo (note no distinction
                                              between d-to-m and m-to-d
                                              metadata)

The other mappings are left as an exercise: they're astonishingly (and
amusingly) precise.

Leo.
--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3

2005-12-08 Thread Leo Comerford
On 11/19/05, Alexander G. M. Smith [EMAIL PROTECTED] wrote:
 Leo Comerford wrote on Fri, 18 Nov 2005 03:42:50 +:
  [^.*$]

 Just a few points I thought of while reading through your text:

 Genealogy is an extremely structured arrangement of data, most people won't be
 doing something that complex - think of photo filing instead.  Also cycles
 exist everywhere, even in genealogy.  So cycles should be supported by default.

Cycles? Sure. Hence all the quibbling about the generic tree
operators actually being rooted directed graph operators.

(In the case of genealogy, parent-child digraphs are acyclic unless
they involve Beeblebroxes, though they're *shudder* not invariably
/trees/.)

It's true that a lot of file metadata doesn't involve recursive
structure or suchlike, but you don't have to get very advanced or
esoteric to find file metadata that does. Expose your email's
In-Reply-To: information as file metadata and you have recursive
structure.
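
(A small Python sketch of where the recursion comes from. It assumes the header in question is the standard In-Reply-To: header, which names the message being answered; the three messages and their Message-IDs are invented. Only the standard library's email parser is used.)

```python
from email import message_from_string
from typing import Dict, Iterable, List, Optional

# Three made-up messages forming a chain of replies.
RAW_MESSAGES = [
    "Message-ID: <1@example.org>\nSubject: elf strike\n\nfirst",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n\nsecond",
    "Message-ID: <3@example.org>\nIn-Reply-To: <2@example.org>\n\nthird",
]


def reply_parents(raw_messages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each Message-ID to the Message-ID it replies to (or None)."""
    parents: Dict[str, Optional[str]] = {}
    for raw in raw_messages:
        message = message_from_string(raw)
        parents[message["Message-ID"]] = message["In-Reply-To"]
    return parents


def ancestors(parents: Dict[str, Optional[str]], message_id: str) -> List[str]:
    """Follow the reply links rootward; stop if a link repeats."""
    chain: List[str] = []
    current = parents.get(message_id)
    while current is not None and current not in chain:
        chain.append(current)
        current = parents.get(current)
    return chain


parents = reply_parents(RAW_MESSAGES)
print(ancestors(parents, "<3@example.org>"))
# ['<2@example.org>', '<1@example.org>']
```

One header per message is enough to give you an arbitrarily deep structure.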

(Sidebar: Actually, you can think of the process of improving the
filesystem - large parts of that process, anyway - simply as the
process of fixing email. Every email ought to be an ordinary file like
any other document; instead emails live as entries in spool files
around which mail agents perform ritual lockfile dances. A filesystem
which handles smaller files more efficiently allows us to store each
email as a file, but you have to keep those files in a Maildir and
continue observing special safe-access rituals. Introduce transactions
and you can just put your emails into ordinary folders, but everyone
knows by now that just putting each of your emails into one folder or
another is a completely inadequate way of organising them. Introduce
pathname-listing and every email can usefully be in several
directories at once; this means that we can indicate all the
categories and labels we put on our emails by just putting them in
directories instead of having to use special data formats
understandable only to email clients (or worse, only to one email
client). But there is still the (meta)data in the email headers
themselves - we don't want to have to either duplicate or ignore it in
our directory metadata. So we use mount() to expose persistent queries
on the header data as directories.)

 You had a separate directory storing relationship links.  How about making
 that a subdirectory of the person?  If I wanted to do
 genealogy-as-a-file-system, I'd have a children subdirectory under the
 person; it would contain hard links to all the person's children.  If you
 want to find a person's mother or father, examine the list of their parent
 directories (a cyclic file system has more parents than just ..) to find the
 ones called children.  The person's parents are the holders of those
 children directories.


You can use the same one-to-many approach with link-directories:
instead of creating a link directory for each (biological)
parent-child pair, create one for each parent which links it to all
its children. (Having the individual link-directories is better in one
way: it's safe to go from the one-to-one to the one-to-many form
without context knowledge, but not /vice versa/. In the case of the
parent-child relationship, if a person is a parent to a bunch of
children then that person is individually a parent to them all. But if
a person hasLessMoneyThan a bunch of people, then (s)he may not have
less money than any one of them.)
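
(The safe direction can be sketched in a few lines of Python. The pairs below are the two father-son links from the Mike/Bob/Ted example; the function name is made up.)

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_one_to_many(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Fold one-to-one links into one link per parent.

    This needs no knowledge of what the relation means. The reverse
    step does: splitting a one-to-many link into pairs is only valid
    when the relation holds of each member individually."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for parent, child in pairs:
        grouped[parent].append(child)
    return dict(grouped)


print(to_one_to_many([("Mike", "Bob"), ("Mike", "Ted")]))
# {'Mike': ['Bob', 'Ted']}
```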

The (biological) parent-child relationship is kind to subfile metadata
again here: the one-to-many link makes the (biological) parent an even
more obvious candidate to go on top. Problem five from part two is
as strong as ever, on the other hand: there's no way for a program to
tell without context knowledge that bob;children asserts a
relationship between bob and several other files, rather than
asserting some attributes of bob or describing some subparts of it.

 I wouldn't worry about naming conflicts (such as children being a magic
 name) since most people only define a few dozen relationships, at least in
 BeOS.
 [snip]

If the names are only being created and used by one person, then yes,
conflicts are likely to be rare and easily dealt with by hand. But if
you have more than one person involved, and especially if you are
trying to use different bundles of names created independently by
different groups, then people will soon resort to the usual defensive
practices used in package naming. Since any application could
potentially define and use a bunch of its own subfile names just as it
can create several (ordinary) directories, this situation will arise
as soon as subfile names become popular.

[snip]


 So to sum up, it seems that you're way more power hungry than I.  I just want
 something to make finding photos easier, not a whole database equivalent
 system (I'd use a database for that).  Early versions of BeOS did use a
 database as the file system, which turned out to be more trouble than it was
 worth.  A file

Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3

2005-11-20 Thread Leo Comerford
(Apologies for email snafu - hit the wrong button.)

Difference 2 can also be left aside for now.

3) Say we just used standard directories to indicate relations. Then
if a user comes across  these names -

/(something)/father-son
/(something)/father-son/aardvark
/(something)/father-son/aardvark/son
/(something)/father-son/aardvark/father

- while browsing the filesystem she can use her common sense to guess
that /(something)/father-son expresses a father-son relation.
'father-son' likely suggests a relationship between fathers and sons,
and indeed /(something)/father-son/aardvark has children called father
and son. Pretty obvious. Similarly, if she comes across

/usr/bin
/usr/bin/alpha
/usr/bin/bravo
[etc. etc.]

, if she is familiar with Unix she will know that /usr/bin/ indicates
user binaries. Every (non-directory) file in /usr/bin/ isA '/usr/bin',
a user binary. (Also, common sense might suggest that /usr/bin/ has so
many children that it's unlikely to be one giant relationship.) But
what if she comes across

/(something)/mumble
/(something)/mumble/thingy/alpha
/(something)/mumble/thingy/bravo

? Is alpha a '/(something)/mumble/thingy', or is it in the 'alpha'
role of a '/(something)/mumble' relationship with another file? (Or it
could be in the 'thingy/alpha' role of a '/(something)' relationship.)

This matters a lot. The distinction between being a foo and being a
party in a foo relationship is clear and important. For example, there
is a big difference between being a marriage and being a married
person. So we need to know which one is meant. What's more, we need to
be *told*, because the other two solutions - guessing and knowing
already - aren't good enough. The person who created
/(something)/mumble/thingy would be able to tell us that if only she
had some way of indicating to us that we should interpret
/(something)/mumble/thingy as an instance of a relation. And that is
(to a first approximation) all that link-directories are - directories
with a simple binary flag set at creation time to indicate how they
should be interpreted.
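
(A toy Python model of that flag. The class, the field name is_link and the wording of the output are all assumptions made for illustration; no real filesystem API is being described.)

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Directory:
    path: str
    is_link: bool = False  # hypothetical flag, fixed when the directory is created


def interpret(directory: Directory, child: str) -> str:
    """Say how a child should be read, using only the flag."""
    if directory.is_link:
        # The link-directory is one instance of the relation named by its parent.
        relation = posixpath.dirname(directory.path)
        return f"{child} fills the '{child}' role in one instance of the relation {relation}"
    return f"{child} isA {directory.path}"


plain = Directory("/(something)/mumble/thingy")
link = Directory("/(something)/mumble/thingy", is_link=True)
print(interpret(plain, "alpha"))
print(interpret(link, "alpha"))
```

The same names give two different readings, and one bit is all it takes to say which is meant.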

Leo.
--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3

2005-11-19 Thread Leo Comerford
On 11/19/05, Hubert Chan [EMAIL PROTECTED] wrote:
 P.S. your relational model can easily be expressed using file-as-dir
 (well, actually, just standard directories):

 /(something)/father-son/aardvark/father is a symlink to
   '/(whatever)/portrait/Mike')
 /(something)/father-son/aardvark/son is a symlink to
   '/(whatever)/portrait/Bob')


Yes absolutely. Yes, my relational model *does* use standard
directories, with three differences.

1) foofs's internal implementation of link-directories and other
directories might be different. Or it might not. Entirely unimportant
at this level.

2) gc might treat some link-directories differently to
predicate-directories. (If Bob and Mike have been deleted, I don't
want /(something)/father-son/aardvark/ lying around.)

3)

 --
 Hubert Chan [EMAIL PROTECTED] - http://www.uhoreg.ca/
 PGP/GnuPG key: 1024D/124B61FA
 Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
 Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.




--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


Re: File as a directory - file-as-dir vs. link-dirs (again) - 1/3

2005-11-17 Thread Leo Comerford
Once again, I have to apologise for a stupidly long and stupidly late
reply. I've tried to make this thing a little more digestible by
chopping it into three chunks. In order to keep any replies together,
I suggest that people reply to the third part unless the reply is very
specific to one of the other parts. This first part is (I hope)
relatively fun.

First of all: I'll refer to 'relation-directories' as
'link-directories' from now on; the new term should be more
enlightening and less misleading. (Sorry if the change causes any
temporary confusion.) Again, each link-directory expresses one
instance of a relation; in RDB terms that's one tuple of a relation or
one row of a table, while in OO theory terms it's one link of a
relation. (In fact that's not completely and invariably true, because
of the weakly-typed nature of link-dirs.) The directory which (by
definition) has as its children every link-directory of a given type
is *not* a link-directory. (It is an ordinary predicate-directory.) In
RDB terms it is the table, and its children are its rows. In OO terms
it is the relation (which makes it a class) and its children are the
links of that relation (the objects which are instances of the
relation).

Second, in the coming examples, assume that the present working
directory can be set to any name, those of ordinary atomic files
as well as those of link- and predicate-directories. This isn't
essential to anything that follows, but it does make things more tidy.
The ability to list the pathnames of a given file makes it useful to
have the pwd point to an atomic file: a command, say

$ ls -P

, can list (some of) the parents of the current file, whether or not
it is a directory. The change also creates consistency with
link-directories, which are non-(predicate-)directory files that can
be the target of the pwd.

On 5/28/05, Alexander G. M. Smith [EMAIL PROTECTED] wrote:
 Leo Comerford wrote on Wed, 18 May 2005 12:50:38 +0100:
  But if you have relation-directories and the ability to find the
  pathnames of a given file, you can do everything you can do with
  subfiles, just as nicely, and more besides. And if subfiles are
  completely redundant and bad news anyway, we shouldn't have them.

 I prefer subfiles (or fildirutes) as being easier to understand.  But
 maybe that's just due to lots of experience with using file hierarchies.
 I can see having a relational system, but I'd always want to also have
 a directory hierarchy namespace, so that all files can be named.

 Having those relationship directories seems kind of clunky - since
 they're not located near the object being investigated.  Though
 that's a GUI matter of making the system file browser pop up a
 Show Relationships... menu item as contrasted with drilling down
 to a subfile directory listing by clicking on an item.

I'll start with an example here. Imagine a directory,

/(whatever)/portrait

, in which there are portrait photos of a number of men, one photo per
man. Each photo is identified under /(whatever)/portrait by the guy's
first name, so you have

/(whatever)/portrait/Mike
/(whatever)/portrait/Bob

and so on. Now suppose we use link-directories to express father/son
relationships between the guys in the photos. So, for example, if Mike
is Bob's father, we could have

/(something)/father-son/
/(something)/father-son/aardvark:
/(something)/father-son/aardvark:father (which is the file also known
as '/(whatever)/portrait/Mike')
/(something)/father-son/aardvark:son (the file also known as
'/(whatever)/portrait/Bob')
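
(Read in RDB terms, the listing above is one row of one table. A short Python sketch of that reading follows; it assumes the link names are available as strings with ':' before the role name, and the function name is made up.)

```python
from collections import defaultdict
from typing import Dict

# The two links from the example, as name -> linked file.
LINKS = {
    "/(something)/father-son/aardvark:father": "/(whatever)/portrait/Mike",
    "/(something)/father-son/aardvark:son": "/(whatever)/portrait/Bob",
}


def rows(links: Dict[str, str]) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Group links as table -> row (link-directory) -> role -> file."""
    tables: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(lambda: defaultdict(dict))
    for name, target in links.items():
        link_dir, role = name.rsplit(":", 1)     # split off the role name
        table, row = link_dir.rsplit("/", 1)     # parent predicate-directory, row name
        tables[table][row][role] = target
    return {table: dict(table_rows) for table, table_rows in tables.items()}


print(rows(LINKS))
```

The predicate-directory /(something)/father-son comes out as the table, and aardvark as one of its rows with the columns father and son.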

Using these link directories, we can easily express the information in
this (father's-side) family tree:

          ---- Mike ---
          |           |
          v           v
  ------ Bob ------  Ted
  |       |       |   |
  v       v       v   v
 Joe     Dean    Ed  Todd

, where an arrow from Mike down to Bob means Mike is the picture of the
father of the guy pictured in Bob.

But this is where the clunkiness comes in. The family-tree
representation above is an obvious and natural way to conceive of and
manipulate the father/son relationships. We want there to be a
father-son link straight from Mike to Bob; what's more, we want to be
able to list the children (in the graph sense!) of Mike and see Bob
and Ted, and to move leafward from Mike to Bob or rootward from Bob to
Mike. But when we look at how we expressed the information using
link-directories, we see this instead:

--- /(something)/father-son/ 
|   |
v   v
aardvark - -- zebra

Re: File as a directory - file-as-dir vs. link-dirs (again) - 2/3

2005-11-17 Thread Leo Comerford
(This long essay has been posted in three parts. In order to keep any
replies together, I suggest that people reply to the third part unless
the reply is very specific to one of the others. This is part two, in
which I criticise file-as-directory some more - far from exciting, but
apparently still necessary. Things should pick up in part three.)

But now let's try to express the father's/son's-photo relationships
between the /(whatever)/portrait photos using subfile metadata instead
of link-directories. /(whatever)/portrait/Mike is (the photo of) the
father of (the man pictured in) /(whatever)/portrait/Bob - how to
express that using files as directories? We could decide that
/(whatever)/portrait/Bob should have the additional pathname
/(whatever)/portrait/Mike/son-photo . But that would mangle the
filesystem semantics: /(whatever)/portrait/Mike/son-photo isNotA
/(whatever)/portrait/Mike . We need to distinguish the links from
files to their metadata files from ordinary directory-to-directory
and directory-to-file links. As the man said, don't try to make things
simpler than possible. So let's call our new pathname
/(whatever)/portrait/Mike;son-photo instead, where ';' is a name
segment delimiter in the same way that '/' or (in my examples) ':' is.
(Having a reserved segment-name like ..metas is an alternative
implementation of the same idea.) Now this seems to work fairly well,
but there are problems. Here are some of them.

Problem one: We can assume that the partial pathname after the ';' ,
from the file-as-directory to the metadata file, describes the
type of relationship between the two files. So, for example,
';son-photo' describes one type of relationship, while others could
be ';friend-picture', ';thumbnail' or ';social-sec-no'. So are all
files in the same namespace as regards these relationship-names or
not? In other words, if I see /(whatever)/foo;aardvark and
/(something)/bar;aardvark , can I always safely assume that
/(something)/bar;aardvark is to /(something)/bar as
/(whatever)/foo;aardvark is to /(whatever)/foo ? If so, then there
will be substantial risk of namespace collisions. So in practice, the
subfile part of filenames will probably have to be fairly
long-winded to minimise the risk: not ';foo' but
';something/not/altogether/unlike/a/third-party/java/package/name/foo'
. If not, if there is some context in which I should interpret what
';aardvark' means, so that it can mean one thing for one
file-as-directory and something else for another, what is that
context and how can I know about it? Might it have something to do
with the file-as-directory's file type? (As defined how?) With one
or more of the pathnames that the file-as-directory might have? By
contrast, the type of a link-directory is defined by the
predicate-directory it is a child of (by a non-opaque link). So the
namespace of link-directory types is the same namespace of pathnames
that all predicate-directories are in. Pathnames aren't necessarily
very concise either, but at least we're not creating a second
namespace, and equivalent pathnames ought to be a lot shorter on
average when you have pathname-listing and advanced searching on
pathnames; for example, a user binary can have the two pathnames
/usr and /bin rather than one long pathname /usr/bin.

Problem two: consider that you discover Mike's photo-of-son by looking
into its subfiles and seeing /(whatever)/portrait/Mike;son-photo ,
while you discover Bob's is-son-photo-of (in effect, its
photo-of-father) by looking through its pathnames and also seeing
/(whatever)/portrait/Mike;son-photo . To find all the relationships
which a given file is involved in, you must check both its subfiles
and its pathnames. And whether a given relationship will be found
among one or the other is arbitrary. Had we chosen to use
;father-photo rather than ;son-photo links, then Bob's metadata would
have been a subfile while Mike's would have been a pathname.

But, one could argue, this is only a problem in the special cases
where both directions of a two-part relationship are worth
expressing. It just so happens that the reverse of the is-son-of
relation is a useful relation to consider. It just happens to be the
case that every man is a father to all his sons; or rather, the
reverse of 'x is the son of y' - 'y has the son x' - happens to be
important enough to have another form, 'y is the father of x'. So in
these special cases, we can create a link in both directions: for
example, we can create both /(whatever)/portrait/Mike;son-photo and
/(whatever)/portrait/Bob;father-photo . Then the user can find all of
a file's useful file-as-dir metadata by inspecting its subfiles, and
so happily ignore its subfile pathnames.

But creating both /(whatever)/portrait/Mike;son-photo and
/(whatever)/portrait/Bob;father-photo means having a cycle in the
representation of some simple non-cyclic data. Also, the fact that
Mike was the parent of Bob through a ;son-photo in the base filesystem
tree 

Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3

2005-11-17 Thread Leo Comerford
(This is the third and final choke-sized chunk. In order to keep any
replies together, I suggest that people reply to this part unless the
reply is very specific to one of the others.)

File-as-dir is a flawed way of expressing parent-child relations.
Unfortunately, when it comes to relations, expressing two-way
parent-child links and providing a tree view of them is what
file-as-dir does /best/.

Even simple two-way relationships that don't have an obvious
parent-child nature cause additional problems. Say we decided to
create metadata to record which of the men are friends. So if Dean
gets along with his brother Ed we could create

/(something)/friend/aardvark:
/(something)/friend/aardvark:1 (which is the file also known as
'/(whatever)/portrait/Ed')
/(something)/friend/aardvark:2 (which is the file also known as
'/(whatever)/portrait/Dean')

using link-directories. In fact, if we have anonymous last name
segments, we can just create

/(something)/friend/aardvark: , which links anonymously to both the Ed
and Dean photos.

But try to express this using subfiles: which of the two brothers will
we arbitrarily choose to make the subfile of the other?

In general, because the subfile relationship is always parent-child,
to express a symmetric relationship in it we have to make up spurious
extra data, declaring one participant in the relationship to be the
'parent' when no such distinction exists. Ed and Dean are unlikely to
care about this, but try deciding whether Sales worksClosely with
Marketing or Marketing worksClosely with Sales on your firm's
computerised org chart. (Apparently things like LDAPisation projects
have provoked wars over less.) And in the link-directory example using
anonymous links, even the dumbest program that knows nothing about
either /(something)/friend or friendship can tell that
/(something)/friend/aardvark: is symmetric. In the link-directory
example that doesn't use anonymous links, it doesn't know that - and
subfile metadata will actively give it the false parent/child
information. And of course even if we already know that a specific
relationship is symmetric, or if it's not important that we find out,
problems two and four from part two bite hard. For example, reliably
finding all of Ed's friends' photos requires looking for both all his
photo's ;friends children and all its ;friends parents every time. We
have similar problems for relationships that aren't symmetric, but for
which we don't want to have to declare one role to be the parent of
the other. Which party in a is-husband-of/is-wife-of relation should
be indicated as the parent?
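
(A Python sketch of what "even the dumbest program can tell" amounts to. It models an anonymous link-directory as an unordered set of files and a named one as a role-to-file mapping; that modelling, and the function names, are my assumptions for illustration.)

```python
from typing import Dict, FrozenSet, Iterable, Set, Union

Link = Union[FrozenSet[str], Dict[str, str]]

ED = "/(whatever)/portrait/Ed"
DEAN = "/(whatever)/portrait/Dean"

anonymous = frozenset({ED, DEAN})   # /(something)/friend/aardvark: with anonymous links
named = {"1": ED, "2": DEAN}        # the same link with roles '1' and '2'


def visibly_symmetric(link: Link) -> bool:
    """True when the form alone shows that no participant is distinguished."""
    return isinstance(link, frozenset)


def partners(photo: str, links: Iterable[Link]) -> Set[str]:
    """Everything linked to photo, found in one pass whatever the roles are."""
    found: Set[str] = set()
    for link in links:
        members = set(link) if isinstance(link, frozenset) else set(link.values())
        if photo in members:
            found |= members - {photo}
    return found


print(visibly_symmetric(anonymous), visibly_symmetric(named))  # True False
print(partners(ED, [anonymous]))
```

With subfiles there is no counterpart to the first function, and the second needs two searches (children and parents) instead of one.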

Then there are (>2)-way relations. Here's a good example of a
three-way relation, lifted from the
Rumbaugh-Blaha-Premerlani-Eddy-Lorensen OO book. Say that we have
files representing programmers, software projects and programming
languages. Now say that, for example, Bob is using Algol 68 on the
Foomatic and both SNOBOL and PL/1 on Project Omega, while Dean is
coding in PL/1 on the Computron and in PILOT on Project Omega, and
Todd is formally specifying the Foomatic in Z. We would represent this
information using link-directories by creating

/(thingy)/impl-lang/aardvark:coder -- /(whatever)/portrait/Bob
/(thingy)/impl-lang/aardvark:lang  -- /bin/algol68
/(thingy)/impl-lang/aardvark:proj  -- /(whatever)/projects/foomatic
/(thingy)/impl-lang/zebra:coder    -- /(whatever)/portrait/Dean
/(thingy)/impl-lang/zebra:lang     -- /bin/pilot
/(thingy)/impl-lang/zebra:proj     -- /(whatever)/projects/omega

and so on: one link-directory for each triple of programmer, project
and language. If we want to express the same information using subfile
metadata we are going to have to create something like

/(whatever)/portrait/Bob;impl-lang/1/proj   -- /(whatever)/projects/foomatic
/(whatever)/portrait/Bob;impl-lang/1/lang   -- /bin/algol68
/(whatever)/portrait/Dean;impl-lang/1/proj  -- /(whatever)/projects/omega
/(whatever)/portrait/Dean;impl-lang/1/lang  -- /bin/pilot

and so on. Problem two is worse in this case. Not only do we have to
look through the pathnames of /(whatever)/projects/foomatic in order
to find out what programmers are working on it, but in order to find
out what languages Bob is using on the Foomatic we have to find the
/(whatever)/portrait/Bob;impl-lang/* directories among the pathnames
of /(whatever)/projects/foomatic and then examine those directories'
./lang names. And to find out what projects Bob is working on, we
have to list all the /(whatever)/projects/* files which are linked
from /(whatever)/portrait/Bob;impl-lang/*/proj . All this is
basically the same as working with link-directories using
base-filesystem commands; indeed /(whatever)/portrait/Bob;impl-lang/1
is basically /(thingy)/impl-lang/aardvark: shoved under an arbitrary
choice of one of the three files it relates.
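
(For comparison, here is the same information held as plain triples, one per link-directory, in a Python sketch. The six rows are exactly the assignments given in the prose above; the class and function names are invented.)

```python
from typing import Iterable, List, NamedTuple, Set


class ImplLang(NamedTuple):
    coder: str
    lang: str
    proj: str


ROWS = [
    ImplLang("Bob", "Algol 68", "Foomatic"),
    ImplLang("Bob", "SNOBOL", "Project Omega"),
    ImplLang("Bob", "PL/1", "Project Omega"),
    ImplLang("Dean", "PL/1", "Computron"),
    ImplLang("Dean", "PILOT", "Project Omega"),
    ImplLang("Todd", "Z", "Foomatic"),
]


def select(rows: Iterable[ImplLang], **fixed: str) -> List[ImplLang]:
    """Rows whose named roles have the given values."""
    return [r for r in rows if all(getattr(r, k) == v for k, v in fixed.items())]


def column(rows: Iterable[ImplLang], role: str) -> Set[str]:
    """The set of values found in one role."""
    return {getattr(r, role) for r in rows}


# Languages Bob uses on the Foomatic; projects Bob works on; who works on the Foomatic.
print(column(select(ROWS, coder="Bob", proj="Foomatic"), "lang"))  # {'Algol 68'}
print(sorted(column(select(ROWS, coder="Bob"), "proj")))           # ['Foomatic', 'Project Omega']
print(sorted(column(select(ROWS, proj="Foomatic"), "coder")))      # ['Bob', 'Todd']
```

All three questions are answered the same way, because no one role has been promoted to be the parent of the others.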

We created tools so that we could handle parent-child relations
expressed as link-directories without clunkiness; naturally we can do
similar 

Erratum

2005-11-17 Thread Leo Comerford
On 11/18/05, Leo Comerford [EMAIL PROTECTED] wrote:

 $ setroot /(whatever)/friend/Ed

This should be

$ setroot /(whatever)/portrait/Ed

- this is what comes of writing things in a hurry

Leo.

--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


Re: File as a directory - back to predicates

2005-08-28 Thread Leo Comerford
On 8/25/05, Hubert Chan [EMAIL PROTECTED] wrote:
 On Wed, 24 Aug 2005 07:51:19 +0100, Leo Comerford [EMAIL PROTECTED] said:
 
 [... lots of stuff snipped ...]
 
  At other levels, of course, the differences assert themselves. For one
  thing, the normal Unix filesystem API doesn't have calls to, for
  instance, check the pathnames asserted of a given file. That's
  easily solved; just add the calls.
 
 It's not so easy.  You need to determine how to figure out the
 pathnames.  UN*X filesystems and filesystems for UN*X-like operating
 systems don't store uplinks,

Yes, I know.

 so there's no quick way to figure out the
 pathnames; the only way currently is to traverse the entire tree.

And that's exactly the point. (Less easily solved are the performance
issues.) Again, if you took the expanded API and put a typical Unix
filesystem implementation behind it, you would find that its
performance at things like finding pathnames was abysmally slow, while
its performance at doing the traditional Unix-filesystem things was as
good as ever. Conversely, if you mounted some kind of registry system
instead (or as well) you'd find that it was very fast at finding
pathnames, but very slow at many traditional-Unix-filesystem tasks
(for example rename()ing a directory). Again, consider the analogy of
an abstract collection type with two or more different concrete
implementations. The data model is not any of its implementations.
Just because two different data systems have different performance
characteristics doesn't mean they need to present different data
models.
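To make the abstract-collection analogy concrete, here is a minimal Python sketch. The class and method names are invented for illustration and are not any real filesystem API, and a flat dict stands in for the directory tree. Both classes present the same data model, but each indexes it in one direction only, so each is quick at one operation and slow at the other.

```python
from abc import ABC, abstractmethod


class PathnameStore(ABC):
    """Hypothetical data model: pathnames asserted of opaque file ids."""

    @abstractmethod
    def link(self, pathname, file_id):
        """Assert a pathname of a file."""

    @abstractmethod
    def lookup(self, pathname):
        """m-to-d: which file does this pathname name?"""

    @abstractmethod
    def pathnames_of(self, file_id):
        """d-to-m: which pathnames are asserted of this file?"""


class TreeStore(PathnameStore):
    """Unix-like implementation: indexed by pathname only, no uplinks."""

    def __init__(self):
        self._by_path = {}

    def link(self, pathname, file_id):
        self._by_path[pathname] = file_id

    def lookup(self, pathname):
        return self._by_path[pathname]  # a single indexed lookup

    def pathnames_of(self, file_id):
        # No uplinks are stored, so every name in the store must be visited.
        return sorted(p for p, f in self._by_path.items() if f == file_id)


class RegistryStore(PathnameStore):
    """Registry-like implementation: indexed by file only."""

    def __init__(self):
        self._by_file = {}

    def link(self, pathname, file_id):
        self._by_file.setdefault(file_id, set()).add(pathname)

    def lookup(self, pathname):
        # No index by name, so every object must be visited.
        for file_id, names in self._by_file.items():
            if pathname in names:
                return file_id
        raise KeyError(pathname)

    def pathnames_of(self, file_id):
        return sorted(self._by_file.get(file_id, ()))  # a single indexed lookup
```

Code written against PathnameStore gets the same answers from either class; only the running time differs.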

 P.S. most of the stuff that you're saying is already in the Future
 Vision paper.  At least the main idea of trying to query via metadata.

Future Vision is predominantly about searching from metadata to data.
(Which files are emails about Santa?) It says almost nothing about
going from data to metadata. (Is this file an email?) (This is
especially unfortunate since Future Vision is in large part about how
to improve the effectiveness of search in the real world, and one of
the most ubiquitous, natural and effective real-world search
strategies is to start with an m-to-d search, then apply d-to-m
searching on the results. An example:

I remember Santa flamed somebody out a while ago. Let's see - search
for emails from Santa. Hm, thirty hits. [m-to-d]

Let's take a look... This one here also relates to elves and a strike
- /that's/ what it was about, I remember now! [d-to-m]

Any other elf strike emails from Santa? No, just the one: bingo!
[m-to-d again].)
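The three steps of that search can be sketched in a few lines of Python. The mail store, the message ids and the pathnames are all made up for the example; the only point is that the middle step reads the metadata of one result, which is the direction Future Vision hardly discusses.

```python
# Hypothetical mail store: file id -> set of pathnames asserted of that file.
MAIL = {
    "msg1": {"type/email", "from/Santa", "about/reindeer"},
    "msg2": {"type/email", "from/Santa", "about/elves", "about/strike"},
    "msg3": {"type/email", "from/Rudolph", "about/strike"},
}


def files_with(store, *pathnames):
    """m-to-d: ids of the files of which every given pathname is asserted."""
    wanted = set(pathnames)
    return sorted(fid for fid, names in store.items() if wanted <= names)


def pathnames_of(store, file_id):
    """d-to-m: every pathname asserted of one file."""
    return sorted(store[file_id])


# Step 1, m-to-d: all emails from Santa.
hits = files_with(MAIL, "from/Santa")

# Step 2, d-to-m: look at one hit and notice it is also about elves and a strike.
seen = pathnames_of(MAIL, "msg2")

# Step 3, m-to-d again: narrow the search using what step 2 turned up.
refined = files_with(MAIL, "from/Santa", "about/elves", "about/strike")
```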

The one thing it /does/ say about data-to-metadata searching is that
file streams are inelegant, and should be replaced by ... pathname
metadata, yet another way to represent d-to-m metadata that is
separate from file naming. By contrast, my email argues that unifying
all OS namespaces into the file naming system, as proposed by Hans in
Future Vision, is such a good idea that it ought to be applied
properly to d-to-m metadata too. Especially since the only non-bogus
distinction between m-to-d metadata and d-to-m metadata is their
performance requirements.

[snip]
-- 
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


Re: File as a directory - back to predicates

2005-08-24 Thread Leo Comerford
Firstly, I apologise for the absurdly late reply! Secondly, I'm going
to backtrack for a bit here. Let's forget about relations for the
moment, and concentrate solely on single-place predicates. I also
apologise for (partly) repeating myself at length like this, but this
time I have what I hope is a really nice explanation. So:

Consider an absolutely vanilla registry-type thing of the kind that
Joe Code would produce if you asked him to implement a metadata system
for files/objects/documents/whatever. Users can assign arbitrary
name-value pairs to the objects in the registry; they can also
delete/edit the pairs. And, of course, they can view all the
name-value pairs associated with a given object. The names are
strings. The objects are opaque binaries to the registry. So, in use,
a photo object in the registry might have something like the following
pairs associated with it:

date_taken=2004-03-04
title=My dog Spot
type=photo

(The order in which the pairs are listed is obviously arbitrary.)

Now note that the registry system could use Unix-style segmented
pathnames instead of name-value pairs, in the cases where the value
is a sufficiently short string. For example, we can think of

date_taken=2004-03-04

as an alternative to

date_taken/2004-03-04

which could be further parsed up into

date_taken/2004/3/4

. There's nothing magic about name-value pairs: each one simply
asserts a predicate just as a Unix filename does. In fact, a
name-value pair where both the name and value are strings is in effect
just a limited Unix pathname which must have exactly two segments.
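As a rough illustration of how mechanical the translation is, here is a short Python sketch. The function names are my own, and the escaping rule (a backslash before each space) is just one possible convention.

```python
def pair_to_pathname(name, value):
    """Turn one name-value pair into a two-segment pathname.

    Spaces are backslash-escaped, shell style. A '/' inside a value is not
    handled in this sketch.
    """
    return name + "/" + value.replace(" ", "\\ ")


def date_pair_to_pathname(name, iso_date):
    """Parse an ISO date value further, giving one segment per component."""
    year, month, day = (int(part) for part in iso_date.split("-"))
    return f"{name}/{year}/{month}/{day}"


pairs = {"date_taken": "2004-03-04", "title": "My dog Spot", "type": "photo"}
two_segment = sorted(pair_to_pathname(n, v) for n, v in pairs.items())
```

Going the other way (splitting a two-segment pathname at its slash) recovers the original pair, which is the sense in which a name-value pair is a limited pathname.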

So our mark-2 registry system assigns its objects Unix pathnames
instead of name-value pairs, and the photo object might have the
following ones:

date_taken/2004/3/4
title/My\ dog\ Spot
type/photo

Isn't this new registry fundamentally very similar to a Unix
filesystem, in which each file has one or more such pathnames? No,
better than that. It *is* a Unix filesystem, subject to a few caveats.
It has the same syntax (it's built of files and pathnames); it has the
same semantics (pathnames express propositions which are asserted of
files); all it needs is an implementation which can expose these to the
OS by being mounted. So a completely simple-minded iteration of the
registry pattern, without any consideration for namespace
integration, Unix philosophy or what have you, translates almost
effortlessly into a Unix filesystem which uses the standard semantics.

This is strong evidence for the power of that filesystem, using those
semantics, to integrate different namespaces into itself. But more,
our registry is a data-to-metadata system: it is designed to allow the
user to find the metadata that has been associated with a particular
piece of data. ("When was this photo taken?", which in our revised
registry becomes "what date_taken pathname does this file have?".)
Registry systems in general, file streams, stat blocks, subfile
metadata, and the old Macintosh resource fork are all data-to-metadata
systems. By contrast, the Unix file naming system is a
metadata-to-data system, designed to allow the user to find one or
more pieces of data (files) through the metadata that has been
associated with them. ("What photos were shot in 2004?", becoming
"what files are in the directory date_taken/2004 ?") But in the
example above, registry data - the metadata of the classical
data-to-metadata system - is being expressed in Unix filenames using
the standard semantics - the language of Unix's classical
metadata-to-data system. This shows the power of the Unix filesystem
to integrate both d-to-m and m-to-d systems into the one namespace
- the one language of pathnames-as-predicates. This is not so
surprising when you consider that at the semantic/logical level both
types of system are exactly the same: they both just associate
metadata with data.

At other levels, of course, the differences assert themselves. For one
thing, the normal Unix filesystem API doesn't have calls to, for
instance, check the pathnames asserted of a given file. That's
easily solved; just add the calls. Less easily solved are the
performance issues. rmdir date_taken/2004 is going to be rather slow
on a registry-type volume which contains many files, just as listing
all the pathnames through date_taken to a particular file is going to
be relatively painful on a volume which is closer to a traditional
Unix filesystem implementation. The important thing here is that these
/are/ *performance issues*, although certainly not trivial ones. By
stripping away the superficial differences between m-to-d and d-to-m
systems, we have revealed the real difference, performance. The
situation is similar to general programming languages. No-one would
dream of creating a language which has, say, two radically different
and incompatible function call interfaces, one of which is supposed to
be used by functions whose time performance is O(n) or better, while
the other is for worse-than-O(n) time functions. But of 

Re: Installing Fedora Core with root on Reiserfs

2005-07-18 Thread Leo Comerford
On 7/18/05, Russell Coker [EMAIL PROTECTED] wrote:
 On Monday 18 July 2005 06:01, Edward Shishkin [EMAIL PROTECTED] wrote:
  FC4-test3 (and perhaps FC4) installs its own version of grub which seems
  to interact incorrectly with reiserfs. The problem is that reiserfs.ko
  module located on reiserfs partition can not be loaded.

I can confirm that there is a reiserfs/GRUB problem in the final FC4
release too. (I assume it's the same problem, but I haven't
investigated it.) FWIW - evidently not much - the relevant Fedora
Bugzilla bug would appear to be 161306.


Re: file as a directory

2005-05-18 Thread Leo Comerford
 And then there are ReiserFS plugins, which might give you a magic
 directory that when read for data, yields the concatenation of its
 children's data contents.

Better, you could have a little custom filesystem which can take the
/(something)/concatenation/zebra: subgraph as its device and generate
a single file which is the concatenation of zebra:1, zebra:2, and so
on. (Remember that we can redefine 'file' as 'an atomic file or a link
between files'. So since the zebra: subgraph is a file, we can mount
it!) We could also have, for example, a pair of filesystems, one which
can mount an XML file and present it as an instance of an XML-document
association, and another which can mount such a link and present it as
a flat XML file.
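The concatenating filesystem boils down to something like the following Python sketch. It assumes, hypothetically, that the children of zebra: carry numeric role names and that numeric order is the order wanted; a real implementation would of course sit behind a mount point rather than a function call.

```python
def concatenate(relation_dir):
    """Join the children of a relation-directory in numeric role order.

    `relation_dir` maps role names ("1", "2", ...) to byte contents.
    """
    ordered_roles = sorted(relation_dir, key=int)
    return b"".join(relation_dir[role] for role in ordered_roles)


# Stand-in for the zebra: subgraph, deliberately listed out of order.
zebra = {"2": b"world", "10": b"!", "1": b"hello "}
```

Sorting with key=int rather than as strings keeps zebra:10 after zebra:2.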

Leo.


Re: file as a directory

2005-05-18 Thread Leo Comerford
On 5/17/05, Alexander G. M. Smith [EMAIL PROTECTED] wrote:

This is a bit of a shotgun reply, but I hope this answers your
questions and clarifies things. If not, please ask again and I'll try
to give a better answer. There are some other things I should add soon
anyway. (Not immediately, though - I'll be very busy over the next few
days.)

In the photo-with-description use-case, if you want to give the photo
a name, don't link to the relationship between the photo and the
description. Just link straight to the photo file itself, more/less as
you would do today. Remember, in my example, the jpeg file's original
name is '~/photos/dessau-bauhaus' . That links straight to the actual
jpeg binary file itself. (The name would be
'~/photos/dessau-bauhaus.jpeg', but in my example we're not using file
extensions to track file type anymore.) And after I've come along and
associated the dessau-bauhaus jpeg with a description,
~/photos/dessau-bauhaus *still* just links straight to the jpeg
binary. As far as the pathname '~/photos/dessau-bauhaus' is concerned,
'/(something)/description/aardvark:described' is just another pathname
that happens to link to the same file as it. Inspecting the pathnames
of the file ~/photos/dessau-bauhaus will reveal
'/(something)/description/aardvark:described' along with all the other
names it has.

The directory /(something)/description/aardvark: is the link between
the photo and the description. (To be clear, the colon isn't part of
any name-segment. It's a delimiter between name-segments: it's the
relation-directory equivalent of the forward-slash.) In OO terms, the
directory /(something)/description/aardvark: isA
'/(something)/description' and it hasA 'description' and a
'described'. Being the child of /(something)/description/ gives the
relation-directory its file type: it's the pathname
'/(something)/description/(whatever)' that tells us to interpret the
relation-directory as a link between a description and the file it
describes. Actually aardvark: already has a file type of
relation-directory - the link from /(something)/description/ specifies
what type of relation-directory it is. Every relation-dir that is a
description-described link is (by definition) in
/(something)/description/ , regardless of which files they have as
their 'description' or 'described'. (This is the sense in which
/(something)/description/ is an association - in OOese the term
means a /type/ of link between objects.)

For example, let's suppose a different user, Bob, decides to attach
some descriptions too. He puts a description on the jpeg file
/home/bob/photos/petit-trianon ; he also decides he doesn't like my
description of /home/leo/photos/dessau-bauhaus and puts his own
description on it. So now we have:

/(something)/description/aardvark:
/(something)/description/aardvark:described     -- this is /home/leo/photos/dessau-bauhaus
/(something)/description/aardvark:description   -- this is my description of it
/(something)/description/manticore:
/(something)/description/manticore:described    -- this is /home/leo/photos/dessau-bauhaus again
/(something)/description/manticore:description  -- this is Bob's description of it
/(something)/description/sheep:
/(something)/description/sheep:described        -- this is /home/bob/photos/petit-trianon
/(something)/description/sheep:description      -- this is Bob's description of it

And now if I list the pathnames of ~/photos/dessau-bauhaus, I get both
'/(something)/description/aardvark:described' and
'/(something)/description/manticore:described' - telling me that there
are two different descriptions of the file. And everything still just
links directly to the jpeg file.
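Here is a small Python model of that state of affairs. The file ids ("leo-photo" and so on) are placeholders I have invented; each relation-directory is simply a mapping from role name to the file linked under that role.

```python
# Hypothetical model: each relation-directory maps role names to file ids.
DESCRIPTION = "/(something)/description"

relation_dirs = {
    "aardvark": {"described": "leo-photo", "description": "leo-text"},
    "manticore": {"described": "leo-photo", "description": "bob-text-1"},
    "sheep": {"described": "bob-photo", "description": "bob-text-2"},
}


def relation_pathnames_of(file_id):
    """List every relation pathname that links to the given file."""
    return sorted(
        f"{DESCRIPTION}/{name}:{role}"
        for name, roles in relation_dirs.items()
        for role, target in roles.items()
        if target == file_id
    )


def descriptions_of(file_id):
    """Follow each ':described' link across to its ':description' sibling."""
    return sorted(
        roles["description"]
        for roles in relation_dirs.values()
        if roles["described"] == file_id
    )
```

Asking for the relation pathnames of "leo-photo" yields the aardvark:described and manticore:described names, and descriptions_of("leo-photo") then finds the two description files at the other end of those links.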

About the colon: there's nothing magic about the choice of character,
of course, but there does absolutely need to be a way to identify
links from relation-directories in pathnames. Programs (and humans)
need to be able to mechanically tell when (and where) a pathname
asserts part of a relation and when it asserts a predicate, just by
looking at it. Changing the delimiter is a good way to do it because
it highlights the important point: that the relationship between
aardvark and description is different to the relationship between a
(predicate-)directory and its child. (You could use file extensions as
a makeshift substitute. How about .: ?)
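To show what "mechanically" means here, a Python sketch of the check a program could make. It assumes (my assumption, for simplicity) that the delimiter never occurs inside a name-segment and that a pathname contains at most one of them; the function name and the shape of the result are invented.

```python
def parse_pathname(pathname, relation_delim=":"):
    """Report whether a pathname goes through a relation-directory.

    Returns the slash-separated segments before the delimiter, whether a
    relation delimiter was present, and the role after it (None if absent).
    """
    head, delim, role = pathname.partition(relation_delim)
    segments = [segment for segment in head.split("/") if segment]
    return {
        "segments": segments,
        "in_relation": bool(delim),
        "role": role or None,
    }
```

With this, '~/important' parses as a plain predicate pathname, '/(something)/description/aardvark:' as a relation-directory with no role, and '/(something)/description/aardvark:described' as the 'described' link of that relation-directory.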

 Are both methods useful?  Yup.  What's the difference between
 associations and properties?  Many-to-one and one-to-one?

The idea is that if you want to assert a single-place predicate of a
file, like "file x is important", you just give the file an
appropriate full pathname ('~/important' or whatever). If you want to
assert a multi-place predicate - a relation - like "file x is more
important than file y" then you use a relation-directory. That goes
for every kind of multi-way relation/association you might want to
assert between files - one to one, one to many, many to many.
Actually, '/(something)/description' asserts a predicate just like
'~/important' does. But 

Re: file as a directory

2005-05-16 Thread Leo Comerford
 Serialiser.) But note for now that if we
define 'atomic file' as 'just a simple sequence of bytes', we can
redefine 'file' as 'either an atomic file or a relationship between
files'.

To amplify, though, the main, real purpose (so to speak) of
relation-directories is to express relationships between files. Doing
this properly just happens to give us compound objects for free. As
Rumbaugh, Blaha, Premerlani, Eddy and Lorensen say (in unison?),
"Aggregation is a special form of association, not an independent
concept." Beware the visual/spatial metaphors which subtly warp one's
understanding of the Unix file system. It's not a set of Russian dolls
or a maze of twisty little passages. In particular, files are not
physically inside directories, at any level of abstraction. aardvark:
just provides some metadata about ~/photos/dessau-bauhaus, and that's
all that ~/photos does too.

(Great, isn't it? The filesystem namespace: they're not names, and
it's not a space. :) )

On the other hand, garbage collection will be a significant hurdle,
for two reasons. One is cycles. The semantics of predicate-directories
mean that it's unnecessary to permit cycles containing only
predicate-directories, but if you're going to have instances of, say, the
singly-linked-list-node relation, then the need for cycles is
unavoidable. The other is more sophisticated needs for automatic
deletion. For example, we would probably need
/(something)/description/aardvark: to be marked for deletion as soon
as either of its children were unlinked.
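The second of those needs can be sketched in Python as follows (the cycle problem is not addressed here). The class and function names are hypothetical, and the rule "mark as soon as any child goes" is hard-wired, whereas a real system would presumably let each relation type state its own rule.

```python
class RelationDir:
    """Hypothetical relation-directory: a set of named links to file ids."""

    def __init__(self, name, **children):
        self.name = name
        self.children = dict(children)
        self.marked_for_deletion = False

    def unlink(self, role):
        """Remove one child link; the relation is now incomplete, so mark it."""
        del self.children[role]
        self.marked_for_deletion = True


def unlink_file(file_id, relation_dirs):
    """Unlink a file from every relation-dir; return the names newly marked."""
    marked = []
    for rel in relation_dirs:
        # Collect the roles first so the dict is not mutated while iterating.
        roles = [role for role, fid in rel.children.items() if fid == file_id]
        for role in roles:
            rel.unlink(role)
        if roles:
            marked.append(rel.name)
    return marked
```

Marking rather than deleting on the spot leaves room for a later sweep to decide what happens to the surviving child.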

Leo Comerford.


[reiserfs-list] A clarification

2002-03-07 Thread Leo Comerford

On Thu, 2002-03-07 at 18:37, Nikita Danilov wrote:
 Hans Reiser writes:
   The notion that we need to be able to delete metadata when we delete a 
   file, and that this means we need two way links, is one I have been 
   pondering for ~18 years though I did not address it in my paper.  There 
 
 And udanax people were pondering for it about... oh no... 30? 40?

I do hope you're not under the impression that I claimed to be the first
person to see a need for two-way file links in the context of file
deletion? That would indeed have been chancing my arm, especially since
I haven't read the literature. When I wrote, in my introduction, of
points that no-one had made before, I meant only that no-one had made
them in the public discussion of ReiserFS before. I thought that this
was clear from the context of the paragraph; if not I'm happy to clarify
it now.

Leo Comerford.




[reiserfs-list] New essay on Reiser4 and file metadata

2002-03-06 Thread Leo Comerford

I have written a response to Dr. Reiser's Future Vision and Reiser4
papers, specifically to his proposals for file metadata in the Reiser4
paper. Since it's quite long, I have placed it on the WWW at
http://www.st-andrews.ac.uk/~lrc1/pathname_metadata.html .

In the interests of self-promotion through controversy, the first
paragraph is repeated out of context below: :)

After reading Dr. Hans Reiser's Future Vision paper, I was sold on
(and very excited by) it. I later read the Reiser4 proposal document.
At first I was enthusiastic about this too, especially his
proposed new system of file metadata. As I thought further about it,
however, I started to see shortcomings. By now I remain enthusiastic
about the Future Vision proposals, but I have come to believe that
the metadata system proposed for Reiser4 is a mistake. It's a Jekyll
and Hyde design, which overcomes shortcomings and inelegancies in
*nix at the cost of creating worse ones. Moreover, it's actually a
step away from the Future Vision program. Examination of Future
Vision and the nature of Unix suggests a better way forward.

I am eager to hear people's (considered) responses to the essay. A
sequel essay, which will resolve some points left open in the current
one and cover significant new ground, will follow, relatively soon if
there is interest in seeing it.

Leo Comerford.