Hi Andrey!

Sorry, it took me longer to dig through the code than I had hoped.
So, if you're still game...

On Jan 6, 2009, at 6:22 AM, andrey mirtchovski wrote:
i'm using zfs right now for a project storing a few terabytes worth of
data and vm images.

Was it like that from the get-go, or did you use venti-based solutions
before?

i have two zfs servers and about 10 pools of
different sizes with several hundred different zfs filesystems and
volumes of raw disk exported via iscsi.

What kind of clients are on the other side of iscsi?

clones play a vital part in the whole setup (they number in the
thousands). for what it's worth, zfs is the best thing in linux-world
(sorry, solaris and *bsd too)

You're using it on Linux?

Fair enough. But YourTextGoesHere then becomes a transient property
of my namespace, whereas in the case of ZFS it is truly a tag for a snapshot.

all snapshots have tags: their top-level sha1 score. what i supplied
was simply a way to translate that to any random text. you don't have
to do this (by the way, do you get the irony of
forcing snapshots to contain the '@' character in their name? sounds a
lot like '#' to me ;)

Ok. Fair enough. I think I'm convinced on that point.
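
You're right that such a mapping is trivial to keep outside the file
system, by the way -- a few lines of python along these lines would do
it (the file name and format below are made up just for illustration):

    # minimal sketch: keep human-readable labels for venti/vac scores
    # in a plain text file, one "label score" pair per line
    TAGS = "/usr/roman/lib/venti-tags"   # hypothetical location

    def tag(label, score):
        # record e.g. "vm-before-upgrade vac:..." in the tags file
        with open(TAGS, "a") as f:
            f.write("%s %s\n" % (label, score))

    def lookup(label):
        # return the score recorded for a label, or None if absent
        with open(TAGS) as f:
            for line in f:
                if not line.strip():
                    continue
                name, score = line.split(None, 1)
                if name == label:
                    return score.strip()
        return None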

snapshots are generally accessible via fossil as a directory with the
date of the snapshot as its name. this starts making more sense when
you take into consideration that snapshots are global per fossil, but
then you can run several fossils without having them step on their
toes when it comes to venti. at least until you get a collision in
blocks' hashes.

Aha! And here are my first questions: you say that I can run multiple
fossils off of the same venti and thus have a setup that is very close
to zfs clones:

   1. How do you do that exactly? fossil -f doesn't work for me (nor
      should it, according to the docs).

   2. How do you work around the fact that each fossil needs its own
      partition (unlike ZFS, where all the clones can share the same
      pool of blocks)?

venti is write-once. if you instantiate a fossil from a venti score it
is, by definition, read-only, as all changes to the current fossil
will not appear to another fossil instantiated from the same venti
score. changes are committed to venti once you do a fossil snap,
however that automatically generates a new snapshot score (not
modifying the old one). it should be clear from the paper.

I think I understand it now (except for the fossil -f part), but how do
you promote (zfs promote) such a clone?
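
Just so we're talking about the same operation, the ZFS sequence I
have in mind is roughly the one below, driven through the command-line
tools the way you do from python; the pool and dataset names are
invented:

    # rough sketch of the snapshot -> clone -> promote flow in ZFS,
    # shelling out to the zfs command; dataset names are made up
    import subprocess

    def zfs(*args):
        subprocess.check_call(("zfs",) + args)

    zfs("snapshot", "tank/images@golden")                # snapshot the origin
    zfs("clone", "tank/images@golden", "tank/scratch")   # clone shares the blocks
    # ... work happens in tank/scratch ...
    zfs("promote", "tank/scratch")   # the clone becomes the origin; tank/images
                                     # is now the dependent and can be destroyed

What I can't see is what the venti/fossil equivalent of that last step
would be.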

where the second choice becomes a nuisance for me is in the case where
one has thousands of clones and needs to keep track of thousands of
names in order to ensure that, when the right one has finished, the
right clone disappears.

I see what you mean, but in the case of venti -- nothing disappears, really.
From that perspective you can sort of make those zfs clones linger.
The storage consumption won't be any different, right?

- none of this can be done remotely

Meaning?

from machine X in the datacentre i want to be able to say "please
create me a clone of the latest snapshot of this filesystem" without
having to ssh to the solaris node running zfs.

Well, if it's the protocol you don't like -- writing your own daemon
that will respond to such requests sounds like a trivial task
to me.
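
Something along these lines, say -- completely untested, and the
one-line protocol, the port number and the total absence of
authentication are all my own invention:

    # toy daemon: accept "clone <pool/fs@snap> <pool/newfs>" on a TCP
    # socket and run the local zfs command; error handling is minimal
    import socket
    import subprocess

    def handle(line):
        parts = line.split()
        if len(parts) == 3 and parts[0] == "clone":
            try:
                subprocess.check_call(["zfs", "clone", parts[1], parts[2]])
                return "ok\n"
            except subprocess.CalledProcessError as e:
                return "error: zfs clone failed (%s)\n" % e
        return "error: bad request\n"

    def serve(port=5640):   # port picked at random
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", port))
        s.listen(5)
        while True:
            conn, _ = s.accept()
            line = conn.makefile().readline()
            conn.sendall(handle(line).encode())
            conn.close()

    if __name__ == "__main__":
        serve()

"Latest snapshot of this filesystem" would just be one more request
type that runs zfs list -t snapshot and picks the newest entry.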

i couldn't find the source for libzfs either without registering with
the opensolaris developers' site.
[...]

and i think i'm using a pretty new version of zfs and my experiences
are, in fact, quite recent :)

Well, the fact that you had to register in order to access the code
suggests a pretty dated experience ;-)
    http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libzfs/

instead of reverse engineering a library that i have not much faith
in, i wrote a python 9p server that uses local zfs/zpool commands to
do what i could've done with C and libzfs. it's a hack but it gets the
job done. now i can access block X of zfs volume Y remotely via 9p (at
one third the speed, to be fair).
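
If I follow the hack correctly, once the volume (or a clone of it)
exists, the "block X of volume Y" part presumably reduces to reading
the zvol's device node, roughly as in the sketch below, with the 9p
framing on top being your server's business. The /dev/zvol path is the
Solaris convention; the block size is an assumption on my part:

    # read one block of a zvol through its raw device node
    import os

    BLOCKSIZE = 8192   # assumed volblocksize

    def read_block(pool, volume, blockno):
        path = "/dev/zvol/rdsk/%s/%s" % (pool, volume)
        fd = os.open(path, os.O_RDONLY)
        try:
            os.lseek(fd, blockno * BLOCKSIZE, os.SEEK_SET)
            return os.read(fd, BLOCKSIZE)
        finally:
            os.close(fd)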

Well, Solaris desperately wanted to enter the Open Source geekdom,
and from your experience it seems it was a success ;-) Seriously
though, I personally found reading the source code of zdb absolutely
illuminating about all things ZFS:
    http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/zdb/zdb.c

But yes -- just like with any unruly OS project, you really have to
invest your time if you want to tag along. I think it was Russ who
commented that Free Software is only free if your time has no value :-(

i would be glad to help you understand the differences between zfs and
fossil/venti with my limited knowledge of both.

Great! I tried to do as much homework as possible (hence the delay),
but I still have some questions left:
   0. A dumb one: what's the proper way of cleanly shutting down
      fossil and venti?

   1. What's the use of copying arenas to CD/DVD? Is it purely backup,
      since they have to stay on-line forever?

   2. Would fossil/venti notice silent data corruption in blocks?

   3. Do you think it's a good idea to have volume management be part
      of the filesystem, since that way you can try to heal the data
      on-the-fly?

   4. If I have a venti server and a bunch of sha1 scores, can I
      somehow instantiate a single fossil serving all of them under
      /archive?


Thanks,
Roman.
