[Freenet-dev] A Filesystem on top of Freenet

Joseph Solbrig Sun, 28 May 2000 15:10:47 -0700

At 11:34 AM 5/28/2000 +0300, you wrote:
>(There really should be a freenet-apps list.  Maybe rename freenet-client?)
>


I started posting to freenet-client, but all the replies came back from
freenet-dev. freenet-client seems to be configured so that replies to
freenet-client posts don't go back to the freenet-client list by default -
perhaps someone could look at how the list server is configured.

>Applications based off Freenet will probably have some common design
>patterns, and I've been thinking about one of them as a way of implementing
>a filesystem-ish system over Freenet.  

YES!

And considering that there has been a lot of research over the years about
what makes a really good file system, I hope that we place ourselves at the
end of this research rather than being haphazard about what would minimally
work. I don't believe that any of the "limitations" of freenet would
prevent us from creating the most modern possible system - indeed, I
wouldn't think of them as limitations at all. 

I would submit that a good goal would be to be able to publish, view and
update an arbitrary XML document on freenet - XML is a technology that is
designed as, and is being embraced as, a way to unify numerous protocols
and create more. This would require us to also define some internal search
conventions distinct from straight XML, but it would be possible to create
something that appeared to the outside as an XML document. (I also think
there are things that need to be extended in XML, though I haven't
reviewed the "schema" structure which replaces DTDs as the base definition
mechanism for XML dialects.)

I believe that the system I've been developing can be extended into such an
XML view/interface for freenet.  Although its current syntax may look
entirely different, its structure is very similar to XML - statements
consist of element-productions and element-containment.

In any case, this would mean that the public interface could be one of the
public interfaces of XML - DOM (document object model) or SAX. 

>You could for example use very
>similar code to create an email-based system on top of Freenet (using
>SMTP->Freenet and Freenet->IMAP apps), but with the unfortunate property
>that you could read every single email sent to any person (as long as you
>know their "email").

Well, theoretically, encryption should take care of that. 

>
>===========================================================
>
>


>We assume an unlimited number of "filesystems", by tacking on an arbitrary
>prefix to keys representing directory structure.  In other words, I can
>choose "/filesystems/areallylongprefix376/" as a prefix, and then when I
>read the file "/directory/dir2/" I use the key
>"/filesystems/areallylongprefix376/directory/dir2/" as the Freenet KHK for
>the file.
>
Hmm, this is similar to what I've proposed and implemented with my viewer
object. I've got a file describing the language at
www.crucialquestions.com. I could update it to show future freenet
directions.
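
The prefix mapping quoted above can be sketched in a few lines. This is
only an illustration: the function name freenet_key is made up here, and it
assumes keys are plain strings joined with '/', as in the quoted example.

```python
def freenet_key(prefix: str, path: str) -> str:
    """Join a filesystem prefix and a path inside that filesystem into one key.

    Hypothetical helper: it only builds the key string and does not talk
    to Freenet at all.
    """
    # Trim the slashes at the join so the result has exactly one separator there.
    return prefix.rstrip("/") + "/" + path.lstrip("/")


key = freenet_key("/filesystems/areallylongprefix376/", "/directory/dir2/")
print(key)
```

Two people who pick different prefixes get disjoint key spaces, which is
what makes the "unlimited number of filesystems" assumption workable.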

>
>The problem with storing directory information in a Freenet doc is that we
>cannot Update it every time we change the contents of the directory. 
>Consider - 
>
>1. Alice loads directory listing, adds a file, and updates the directory
>listing with her file added.
>2. Bob does the same at the same time.
>
>If Bob loads the directory listing before Alice updated it, after Bob's
>update Alice's new file will not be in the directory listing.
>
>
>Therefore, instead of using (unimplemented) update mechanism, we use Inserts
>and Updates.  Let's say we have a directory '/dir1' - in this directory we
>will have two special files, '.listing' and '.approxrevision'.
>
>'.listing' is not in fact a single file.  Instead, we first create
>'.listing0', and then every time we change the contents of the directory we
>insert a new '.listing' file with 1 added to its old number.  So, we
>have '.listingN' where N = 0, 1, 2, 3, 4, ....
>
>'.approxrevision' contains the approximate revision number that '.listingN'
>has reached.  Since we usually only want the latest revision of '.listing',
>we don't want to start checking if '.listing0' exists, then '.listing1' etc.
>till we reach the latest one, especially since older ones might have
>expired.  Instead, we start searching from the number listed in
>'.approxrevision', which we should Update every 10 or so revisions of
>'.listing'.

This is a good summary of the problem and general solution.

Yeah, these approxrevision "clues or index headings" are a cool idea. For a
document with many revisions, you could use another clue every hundred
revisions. But then you're going to be doing two or three accesses for
every document object.

I personally think that another good way to deal with this is for freenet
itself to be able to keep the keys and headers longer than it keeps the
actual documents.

Malicious insertion wouldn't complicate our problem if we filter for only
signed documents, but haphazard removal could.  After all, the
approxrevision item could expire - maybe even some keys would expire and
not others, and then revisions would start to bifurcate - essentially, key
expiration could wreak havoc on these schemes no matter what.
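
To make the lookup cost concrete, here is a rough sketch of the search that
starts from the '.approxrevision' hint. Everything named here is an
assumption for illustration: exists() stands in for a Freenet request, and
max_gap (how many missing revisions in a row we tolerate before giving up)
is one possible way to cope with the expired keys mentioned above.

```python
from typing import Callable, Optional


def latest_revision(exists: Callable[[str], bool], directory: str,
                    hint: int, max_gap: int = 3) -> Optional[int]:
    """Scan upward from the hint and return the highest revision found.

    The scan stops after max_gap consecutive misses, so a single expired
    '.listingN' does not hide the newer revisions behind it.
    """
    latest = None
    misses = 0
    n = hint
    while misses < max_gap:
        if exists(f"{directory}/.listing{n}"):
            latest = n
            misses = 0
        else:
            misses += 1
        n += 1
    return latest


# In-memory stand-in for the network: revision 32 has "expired".
stored = {"/dir1/.listing30", "/dir1/.listing31", "/dir1/.listing33"}
print(latest_revision(stored.__contains__, "/dir1", hint=30))
```

Every probe is one access, so a larger max_gap buys robustness against
expiration at the price of more requests per lookup.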

I would say that the best way to get around these different trade-offs is
to have the directory structure configurable by whoever sets up the
directory or key.

[inserting from another post]
>
>What happens if a malicious person starts inserting .listingNN files
>which are incorrect?  Say, all the files in the directory are deleted,
>or have been changed to point to commercial spam?  Any system which
>allows everyone to perform updates could have this problem.
>
>Hal
>
This problem could be solved by having the original directory specification
be signed and having a viewer/interface filter out all those posts which
don't have a signature in their header.
Essentially, a viewer or application would load commands specifying
which files could be found where and how to filter these files. As it
searches for further files, it would also get further commands. Some
directory structures will be set up to accommodate multiple revisions,
others to accommodate multiple sub-directories and so forth. As long as
this can all be read by a single reader/viewer, it's OK if there are
different structures for different directory-type objects. (And the
directory structure specification could itself be updated according to its
own original specification.)
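
A minimal sketch of the filtering step, under loud assumptions: documents
are modelled as dicts with a "body" and an optional "signature" header, and
an HMAC over a shared secret stands in for the signature. A real design
would need public-key signatures, since readers should be able to verify
without being able to sign; the standard library has no such primitive, so
HMAC is used here only to keep the example runnable.

```python
import hashlib
import hmac
from typing import Dict, List


def sign(secret: bytes, body: bytes) -> str:
    """Stand-in signature: hex HMAC-SHA256 of the body (NOT a public-key signature)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def filter_signed(docs: List[Dict], secret: bytes) -> List[Dict]:
    """Keep only documents whose header signature matches their body."""
    kept = []
    for doc in docs:
        signature = doc.get("signature")
        if signature is None:
            continue  # unsigned post: filtered out
        if hmac.compare_digest(signature, sign(secret, doc["body"])):
            kept.append(doc)
    return kept


secret = b"directory-owner-secret"
good = {"body": b"readme.txt", "signature": sign(secret, b"readme.txt")}
spam = {"body": b"buy-now.html", "signature": "00"}
print(filter_signed([good, spam, {"body": b"unsigned"}], secret) == [good])
```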

And the huge variety of requirements is certainly what will drive the need
for a configurable system.
Consider a file or files that are updated continuously - every day or every
hour. The convention of just adding to a revision number wouldn't be very
efficient; you'd have to iterate through thousands of documents.
The directory could be given the convention that a current file has the
structure Key+Datetime+f+N. The trick is that you would specify the chunk
size of the Datetime variable. You wouldn't want to do a backward search
with a chunk size of seconds if things were being updated every hour. But
you might want seconds if the documents are being updated every second.
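
As an illustration of the chunk-size idea, here is one possible reading of
Key+Datetime+f+N. The timestamp format, the literal "f", the meaning of N
(a counter within one chunk), and the names chunk_start, dated_key and
find_recent are all assumptions made for this sketch; exists() again stands
in for a Freenet request.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

EPOCH = datetime(1970, 1, 1)


def chunk_start(when: datetime, chunk: timedelta) -> datetime:
    """Round a time down to the start of its chunk (chunks counted from EPOCH)."""
    return EPOCH + ((when - EPOCH) // chunk) * chunk


def dated_key(base: str, when: datetime, chunk: timedelta, n: int) -> str:
    """Build Key+Datetime+f+N with the datetime truncated to the chunk size."""
    stamp = chunk_start(when, chunk).strftime("%Y%m%dT%H%M%S")
    return f"{base}+{stamp}+f+{n}"


def find_recent(exists: Callable[[str], bool], base: str, now: datetime,
                chunk: timedelta, max_chunks: int = 48) -> Optional[str]:
    """Search backward one chunk at a time for the newest chunk with a document."""
    t = chunk_start(now, chunk)
    for _ in range(max_chunks):
        key = dated_key(base, t, chunk, 0)
        if exists(key):
            return key
        t -= chunk
    return None


hour = timedelta(hours=1)
now = datetime(2000, 5, 28, 15, 10, 47)
stored = {dated_key("news", datetime(2000, 5, 28, 13, 5), hour, 0)}
print(find_recent(stored.__contains__, "news", now, hour))
```

With an hourly chunk the backward search above costs three requests; with
a chunk size of seconds the same search would need thousands, which is the
mismatch described in the paragraph above.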

Moreover if you don't have a programmable specification, all the
application writers would have to sign off on the original file-system
specification. 

>
>
>Why is storing the directory contents in numbered "files" a good idea? 
>Consider our original scenario - Bob loads directory listing - say,
>'.listing31', Alice does the same, adds new file and inserts an updated
>'.listing32'.
>
>Now, Bob adds a file, and tries to insert '.listing32'.  Since it already
>exists, the Freenet *returns it to him* - and now he sees that, refreshes
>his own directory listing to match the newer version, and then inserts
>'.listing33' that also contains the changes Alice made.
>
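
The collision handling in the quoted scenario can be sketched as a retry
loop. FakeStore is an in-memory stand-in, not Freenet: it only imitates the
one property the scheme relies on, namely that inserting under an existing
key returns the document already stored there. Listings are modelled as
plain sets of filenames, and add_file is a hypothetical helper name.

```python
from typing import Dict, Set


class FakeStore:
    """Stand-in store: insert never overwrites and returns what is stored."""

    def __init__(self) -> None:
        self.docs: Dict[str, Set[str]] = {}

    def insert(self, key: str, value: Set[str]) -> Set[str]:
        return self.docs.setdefault(key, set(value))


def add_file(store: FakeStore, directory: str, known_rev: int,
             listing: Set[str], filename: str) -> int:
    """Insert the next '.listingN'; on collision, merge and try the next N."""
    files = set(listing) | {filename}
    rev = known_rev + 1
    while True:
        stored = store.insert(f"{directory}/.listing{rev}", files)
        if stored == files:
            return rev  # our listing is the one stored under this revision
        # Someone else got there first: fold their listing into ours.
        files = stored | {filename}
        rev += 1


store = FakeStore()
base = {"a.txt"}                                    # both loaded '.listing31'
print(add_file(store, "/dir1", 31, base, "alice.txt"))
print(add_file(store, "/dir1", 31, base, "bob.txt"))
print(sorted(store.docs["/dir1/.listing33"]))
```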
>
>Now that we know how to store the data, we have to decide what to store. 
>The simplest info we can store is a list of filenames, and then the key for
>the filenames can be matched to a Freenet doc by appending the filename to
>the "directory path" - the directory's key with a '/' tacked on.  So if the
>directory is '/dir/a', the file is 'readme.txt' and the prefix is
>'/fs/ac432' then the directory's "key" is '/fs/ac432/dir/a' (and its actual
>info is in '/fs/ac432/dir/a/.listingN' and
>'/fs/ac432/dir/a/.approxrevision') and the file's key is
>'/fs/ac432/dir/a/readme.txt'.
>
>In fact, we should not only store the list of files (and whatever metadata
>we feel necessary - e.g. we decide we want a way to differentiate between
>files and directories) but also the list of changes made between each
>revision.  That is, '.listing3' might say (in a more terse format, of
>course):
>
>       Files in this directory:
>       readme1.txt
>       
>       Changes:
>       Revision 1 - added file test.txt
>       Revision 2 - deleted file test.txt
>       Revision 3 - added file readme1.txt
>
>Every 50 or so revisions we can drop the changelog for the previous 50
>revisions, so for example '.listing100' will only list changes from 99 to
>100, and if we need them the changes from 50 - 99 are in '.listing99' and
>from revision 0 - 49 in '.listing49'.
>
>Why is this changelog necessary?  Since we have a few people working at once
>on a directory, and it might take different amount of time for their changes
>to propagate, we might have different versions of the same revision N
>'.listingN' on the Freenet at the same time, etc..  This way, it's much
>easier to resynchronize so that everyone has the same version of the
>filesystem.
>
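
One way to picture how the changelog helps with resynchronizing: if each
change is a (revision, action, filename) record, then two divergent copies
of the same '.listingN' can be reconciled by taking the union of their logs
and replaying it. The record layout and the names replay and merge_logs are
assumptions for this sketch, not part of the proposal above.

```python
from typing import Iterable, List, Set, Tuple

Change = Tuple[int, str, str]  # (revision, "added" or "deleted", filename)


def replay(changes: Iterable[Change]) -> Set[str]:
    """Rebuild the file list by applying the changes in revision order."""
    files: Set[str] = set()
    for _rev, action, name in sorted(changes):
        if action == "added":
            files.add(name)
        elif action == "deleted":
            files.discard(name)
    return files


def merge_logs(a: Iterable[Change], b: Iterable[Change]) -> List[Change]:
    """Union of two changelogs, dropping duplicates and sorting by revision."""
    return sorted(set(a) | set(b))


log = [(1, "added", "test.txt"),
       (2, "deleted", "test.txt"),
       (3, "added", "readme1.txt")]
print(replay(log))

# Two people produced different versions of revision 4.
alice = log + [(4, "added", "alice.txt")]
bob = log + [(4, "added", "bob.txt")]
print(sorted(replay(merge_logs(alice, bob))))
```

This simple union does not resolve real conflicts (say, one side deleting a
file the other side re-adds in the same revision); it only shows why
keeping the changes around makes reconciliation possible at all.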

Again, this seems like a good system. But still, since different
applications will have different specific requirements for directories, why
not use a single configurable directory language to specify these rather
than having folks agree on all the hairy details that this would require?

This doesn't preclude a particular directory structure being used widely.
Indeed, if the directory structure was a series of commands, further
features could be added to it by signed updates.  

A lot of considerations like this go into things like the internals of ODBC
or other interfaces between databases. You have a high-level interface with
a series of functions that it tries to provide on top of whatever low-level
interface is out there, using what it has and simulating the rest.

Now, the point also is that if we start with just ad-hoc search criteria
implemented without a configurable system, we'll be stuck with it - a later
application would have to replace an earlier system rather than extending
it. This could result in different applications fighting each other.

>-- 
>Itamar S.T.  itamar at maxnm.com
>
>_______________________________________________
>Freenet-dev mailing list
>Freenet-dev at lists.sourceforge.net
>http://lists.sourceforge.net/mailman/listinfo/freenet-dev
>
>

