Hi,

On Wed, Oct 29, 2008 at 04:00:49PM +0100, Arne Babenhauserheide wrote:
> On Wednesday 29 October 2008 12:16:58, [EMAIL PROTECTED] wrote:
> I'll digest it in little pieces and answer directly...

I actually already considered splitting it up myself :-)

> > > #### Give back power to users:
> >
> > While this was indeed the main idea behind the Hurd design, it is
> > rather vague for the most part. We have the architecture which
> > potentially gives users more power, but we have very few actual use
> > cases for that...
>
> I attached one:
>
> - You have 1,000,000 files to store, which makes a simple "ls" damn
>   slow.
> - So you develop a simple container format with reduced metadata
>   and specialized access characteristics.
> - Now you want to make that container accessible via the normal
>   filesystem.
>
> Please check the two attached presentations to see the pain this
> causes using Linux.

I must admit that I fail to see the "pain" in these presentations...
The only problem with FUSE in this context seems to be performance; I
wonder whether Hurd translators would do better on that score.

The container stuff itself is quite interesting, BTW. It's closely
related to some things I have been pondering -- yet another thing I'm
meaning to blog about One Of These Days (TM)...

I realized at some point that for some hurdish applications, we need a
way to store fine-grained structured data. What is the best approach
for that?

One way is to put it into a file, using some structured file format
(XML, s-exprs, or the like). The problem is that changing
(adding/removing/replacing) individual pieces of data in the middle of
the file is both awkward and inefficient: it requires rewriting the
file from the affected region up to the end. Also, accessing
individual data items is quite complicated, as it always requires a
parser for the respective format.

Storing the data as a large directory tree, on the other hand, allows
very easy and efficient access and updates of individual items.
However, it takes a lot of disk space.
(This is due to redundant file metadata like permissions etc., and
also to the internal structure of the filesystem imposing a lot of
overhead with many tiny files.) And working with a whole set of data
items at once (e.g. copying or replacing an entire subtree) becomes
quite awkward.

First I was thinking of some kind of DB translator, which stores the
data in a normal file, but instead of storing the contents linearly,
uses some internal allocation mechanism -- just like a full-blown
DBMS. I soon realized, though, that this would be too rigid in many
cases: often it is useful to access the *same* data in different
ways, depending on context.

The storage mechanism and the access method are in fact quite
orthogonal -- what we really want is the ability to access *any* data
either through a directory tree or through a structured file
interface, as needed. Whether the data is actually stored in
individual files or in a container should be totally transparent.

So on the frontend we want a dual interface that allows accessing the
data either as directory trees or as structured files. On the backend,
a normal filesystem, with the aid of containers where appropriate,
could serve as a temporary solution -- but in the long run, we
probably want a special filesystem, allowing both efficient storage of
complex structures and efficient access/update of individual items at
the same time.

I wonder whether this could be implemented as an extension of some
existing filesystem, or whether a completely new approach is
required...

-antrik-
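P.S. For the curious, here is a toy sketch (in Python, all names
hypothetical) of the update-cost asymmetry I mean: replacing one item
stored in the middle of a flat structured file forces a rewrite of the
whole file, while the directory-tree layout touches only one small
file -- at the cost of per-item filesystem metadata overhead.

```python
import os
import tempfile

def flat_write(path, items):
    """Store (key, value) items as 'key=value' lines in one flat file."""
    with open(path, "w") as f:
        for k, v in items:
            f.write(f"{k}={v}\n")

def flat_replace(path, key, new_value):
    """Replace one item in the flat file.

    Since the items are serialized linearly, we cannot patch one item
    in place (values may change length): everything from the affected
    region onward must be re-serialized -- here, simply the whole file.
    """
    with open(path) as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if line.startswith(key + "="):
            lines[i] = f"{key}={new_value}\n"
            break
    with open(path, "w") as f:   # whole-file rewrite
        f.writelines(lines)

def tree_write(root, items):
    """Store each item as its own file under a directory."""
    os.makedirs(root, exist_ok=True)
    for k, v in items:
        with open(os.path.join(root, k), "w") as f:
            f.write(v)

def tree_replace(root, key, new_value):
    """Replace one item in the tree: only one small file is rewritten,
    but every item pays for its own inode, permissions, etc."""
    with open(os.path.join(root, key), "w") as f:
        f.write(new_value)

if __name__ == "__main__":
    items = [("a", "1"), ("b", "2"), ("c", "3")]
    d = tempfile.mkdtemp()

    flat = os.path.join(d, "flat.txt")
    flat_write(flat, items)
    flat_replace(flat, "b", "22")

    tree = os.path.join(d, "tree")
    tree_write(tree, items)
    tree_replace(tree, "b", "22")
```

A real DB translator would of course use an internal allocation
mechanism instead of rewriting, which is exactly the point.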