Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Harley J Pig
I've written a metastore clone for a project where we need to store a
linux distribution in version control (legacy code).  I'm also using
it for my personal vcs-home stuff.  It is a naive and bluntly
straightforward way to do this, but it seems to be working.  You can
find it at https://github.com/harleypig/gitperms

I use git hooks and a central file to (re)store the metadata.  Maybe
it can be of some use to someone else.
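
For readers who have not seen this approach, here is a minimal sketch of the general idea: one hook dumps file modes to a central file, another reads them back. It is not taken from gitperms; the function names and the JSON format are assumptions made purely for illustration.

```python
import json
import os
import stat


def save_modes(root, store_path):
    """Record the permission bits of every regular file under root."""
    modes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into git's own directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks carry no mode bits of their own worth storing
            rel = os.path.relpath(path, root)
            modes[rel] = stat.S_IMODE(os.lstat(path).st_mode)
    with open(store_path, "w") as fh:
        json.dump(modes, fh, indent=1, sort_keys=True)
    return modes


def restore_modes(root, store_path):
    """Re-apply the recorded permission bits to files that still exist."""
    with open(store_path) as fh:
        modes = json.load(fh)
    for rel, mode in modes.items():
        path = os.path.join(root, rel)
        if os.path.exists(path):
            os.chmod(path, mode)
```

In such a setup, save_modes would be called from a pre-commit hook and restore_modes from a post-checkout hook.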
-- 
Harley J Pig
___
vcs-home mailing list
vcs-home@lists.madduck.net
http://lists.madduck.net/listinfo/vcs-home


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Richard Hartmann
On Sun, Apr 10, 2011 at 16:43, Harley J Pig harley...@gmail.com wrote:

 You can
 find it at https://github.com/harleypig/gitperms

Are you willing to bounce that onto the git list or should I do so?


Richard

Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Harley J Pig
On Sun, Apr 10, 2011 at 09:48, Richard Hartmann
richih.mailingl...@gmail.com wrote:
 Are you willing to bounce that onto the git list or should I do so?

I'm not subscribed to that list, go ahead and post it if you would.  Thank you.
-- 
Harley J Pig


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-10 Thread Richard Hartmann
On Mon, Apr 11, 2011 at 02:07, Harley J Pig harley...@gmail.com wrote:

 I'm not subscribed to that list, go ahead and post it if you would.  Thank 
 you.

Done.


Richard

Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-09 Thread Richard Hartmann
On Sat, Apr 9, 2011 at 04:42, Christophe-Marie Duquesne
chm.duque...@gmail.com wrote:

 git-annex does location tracking. Even if you delete the link, the file is
 still there and other repositories know what repositories have the file. If
 you want to be sure the file is always reachable, you have to force a
 repository to act central and to download every file. That is a mount
 option I have already added (-o getall).

FYI, git-annex gained the ability to use a bup remote. This will solve
all problems in this regard if used correctly and will even give you
indefinite and full history.

As an aside, please look here [1] for a current discussion on how to
store metadata in git, enabling git-annex to do so, enabling any FUSE
front-ends to act more in line with normal file systems. Smudge
filters were mentioned so this must be good ;)


Richard


[1] http://marc.info/?l=git&m=130220380412726&w=4


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-08 Thread Christophe-Marie Duquesne
I'll try to gather the things I can answer:

 I see you include fuse.py - http://code.google.com/p/fusepy/ - in your repo.
 How does it compare to fuse-python -
 http://pypi.python.org/pypi/fuse-python ?


fusepy is written with ctypes while fuse-python is a full-blown C extension.
At first, I was using fuse-python, but I ended up thinking fusepy was less
bloated and less painful (just a file to include versus a library to compile
and install).

 where will you store this backup copy? introducing a node/repository which
 will hold backup copies can be considered going to a centralized model;
 which is something you (Christophe-Marie) try to explicitly avoid, but I
 think this is not necessarily a problem


git-annex does location tracking. Even if you delete the link, the file is
still there and other repositories know what repositories have the file. If
you want to be sure the file is always reachable, you have to force a
repository to act central and to download every file. That is a mount
option I have already added (-o getall).

 This is also an area I hope to improve in git-annex, by using git smudge
 filters. So it might get a mode where files can be modified and git
 commit just annexes the new content.


That would be great. I am not sure using fuse would still be necessary,
then.


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-05 Thread Christophe-Marie Duquesne
Hi

I see there have been some good thoughts given about this. I am
currently on vacation in a place where I do not have internet access.
I'll come back to you in a week.

Regards,
Christophe-Marie


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Richard Hartmann
On Sun, Apr 3, 2011 at 11:35, Dieter Plaetinck die...@plaetinck.be wrote:

 - centralized: have 1 (or more) remotes that always keep a copy of the files 
 which are being removed on all other remotes, these would be backup-nodes, 
 they don't follow the strict always in sync rule that applies to the 
 regular nodes. (they follow the original git-annex idea more strictly)

FWIW, there has been talk about using bup as a storage back-end for
git-annex. That would allow you to keep full revision history and all
files in one or two main locations and just use plain git-annex on the
other ones.


 - decentralized: allow users to remove files by removing the symlink, but 
 still keep the blob in .git-annex on at least one of the nodes, so that it 
 can be restored from that.

Leaving a stale object in the store that no one really knows about
seems like an extremely bad idea. And even if git-annex were able to
track its existence internally while hiding the symlink from the user,
I fear this would cause confusion. I would prefer a way to properly
delete a file from all repos, but the bup-backed one would obviously
still keep everything around. Of course, you wouldn't need the bup
back-end for your podcasts, but for photos or other important personal
data, it would be useful.


Richard


Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Joey Hess
Dieter Plaetinck wrote:

 I think having support for this in git-annex would be very useful,
 even if it's not that efficient: if this can be dealt with in
 git-annex, individual higher-level projects like sharebox and
 dvcs-autosync have fewer headaches.  Not to mention
 sharebox/dvcs-autosync would need to do really inefficient things to
 deal with it anyway (because they can't involve themselves in the
 actual git/dvcs tricks, they work on a higher level of abstraction),
 and it might also benefit some users who work with git-annex manually.
 How do you see this? How hard/cumbersome is it to implement this in
 git-annex? Why is it inefficient?  It's not really clear to me after
 reading the smudge information on
 http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

http://git-annex.branchable.com/todo/smudge/

  if toobig
  then git_annex_add file
  else git_add file
  git_commit file
 
 unfortunately I don't think so:
 - with dvcs-autosync we often commit early, as in, the file could still be 
 in the process of being written to, or it could be modified again after we 
 added it.
 From what I understand, we would need to forbid our users from changing the 
 file after it is added to git-annex, and worse: if git-annex does its "move 
 file, replace file with symlink" trick while the user is writing to it, this 
 might break things.

You're right. However, you would also not want to commit many partial
versions of a large file as it was being written.
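
One way an autosync tool could reduce both problems is to wait until a file has stopped changing before adding it. A minimal sketch, assuming a hypothetical quiet_seconds threshold and helper name; neither dvcs-autosync nor git-annex is claimed to work this way, and it does not help with files that are written to continuously.

```python
import os
import time


def is_quiescent(path, quiet_seconds=5.0, now=None):
    """Return True if path has not been modified for quiet_seconds."""
    if now is None:
        now = time.time()
    # st_mtime is the last modification time in seconds since the epoch.
    return now - os.stat(path).st_mtime >= quiet_seconds
```

A watcher would call this on each pending file and only hand quiescent ones to git or git-annex.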

 - when a remote A pulls in the changes from remote B, for dropbox-like 
 behavior it should also automatically:
  * run `git annex get`
  * git commit .git-annex/*/*.log
 Does this seem about right?

Yes.
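
The two steps confirmed above could be scripted roughly as follows. This is only a sketch: after_pull_commands and after_pull are hypothetical names, the commit message is made up, and the .git-annex/*/*.log layout is simply the one mentioned in this thread.

```python
import glob
import os
import subprocess


def after_pull_commands(repo):
    """Build the commands to run in repo after pulling from a remote."""
    commands = [["git", "annex", "get", "."]]
    logs = sorted(glob.glob(os.path.join(repo, ".git-annex", "*", "*.log")))
    if logs:
        rel = [os.path.relpath(p, repo) for p in logs]
        # Stage the location logs first so that new ones can be committed too.
        commands.append(["git", "add", "--"] + rel)
        commands.append(
            ["git", "commit", "-m", "update location logs", "--"] + rel)
    return commands


def after_pull(repo, run=subprocess.check_call):
    """Run the commands; 'run' can be swapped out for a dry run."""
    for argv in after_pull_commands(repo):
        run(argv, cwd=repo)
```

Note that git commit exits with an error when nothing changed, so a real tool would need to handle that case.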

 - deletes will also need to propagate automatically (see next paragraph), 
 still need to figure out how to do that best.
 Note that dropbox-like behavior is different from the behavior you usually 
 expect from git-annex users.
 * usual git-annex behavior: every remote stands on its own, there is no 
 forced being in sync, so that deletes must happen as initiated by the user, 
 and this way you can prevent them from removing files if you expect it could 
 be the last instance of the file.
 * dropbox-like: remote A removes a file - *all other remotes* should remove 
 the file, so that their working copy looks the same. BUT the file should 
 still be available *somewhere* so that a restore can be initiated (preferably 
 from any of these nodes)
 
 I see two solutions here:
 - centralized: have 1 (or more) remotes that always keep a copy of the files 
 which are being removed on all other remotes, these would be backup-nodes, 
 they don't follow the strict always in sync rule that applies to the 
 regular nodes. (they follow the original git-annex idea more strictly)
 - decentralized: allow users to remove files by removing the symlink, but 
 still keep the blob in .git-annex on at least one of the nodes, so that it 
 can be restored from that.

Yes, that's the default behavior if the symlink is removed. There is
then a git annex unused pass that can be used to find and remove unused
content when space is needed. Given the size of modern drives, that
could be run nightly or something.

-- 
see shy jo



Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Joey Hess
Richard Hartmann wrote:
 I know Joey pondered this as well, you will find some references on
 git-annex' ikiwiki. This is needed for S3 in the medium term, anyway.
 
 Basically, the plan is to encrypt the files with a symmetric key and
 then allow access to that key via other keys. That way, you can share
 some files between machines/people and still make sure no one gets at
 stuff they shouldn't.
 
 The way to encrypt object files' names is still somewhat open to
 discussion, afaik.
 
 
 Classical dilemma: Where should this be discussed? On this list or
 within the ikiwiki? Maybe everyone interested should read through the
 ikiwiki and after some discussion here, we can dump use cases, design
 decisions etc back into ikiwiki as a TODO once Joey is happy with it?

I've put together my current thoughts at
http://git-annex.branchable.com/design/encryption/
Comments appreciated in any medium (except watercolors).

-- 
see shy jo



Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-03 Thread Dieter Plaetinck
On Sun, 3 Apr 2011 11:18:05 -0400
Joey Hess j...@kitenet.net wrote:

 Dieter Plaetinck wrote:
 
  I think having support for this in git-annex would be very useful,
  even if it's not that efficient: if this can be dealt with in
  git-annex, individual higher-level projects like sharebox and
  dvcs-autosync have fewer headaches.  Not to mention
  sharebox/dvcs-autosync would need to do really inefficient things to
  deal with it anyway (because they can't involve themselves in the
  actual git/dvcs tricks, they work on a higher level of abstraction),
  and it might also benefit some users who work with git-annex manually.
  How do you see this? How hard/cumbersome is it to implement this in
  git-annex? Why is it inefficient?  It's not really clear to me after
  reading the smudge information on
  http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html
 
 http://git-annex.branchable.com/todo/smudge/
 
 if toobig
 then git_annex_add file
 else git_add file
 git_commit file
  
  unfortunately I don't think so:
  - with dvcs-autosync we often commit early, as in, the file could still 
  be in the process of being written to, or it could be modified again after 
  we added it.
  From what I understand, we would need to forbid our users from changing the 
  file after it is added to git-annex, and worse: if git-annex does its "move 
  file, replace file with symlink" trick while the user is writing to it, 
  this might break things.
 
 You're right. However, you would also not want to commit many partial
 versions of a large file as it was being written.

Well, if it ever happens once, that's once too many.

Since we're aiming for a dropbox-like near-instant-synchronisation system, the 
way of working is different than when using git for, say, version controlling 
source code. So it _will_ happen that we commit versions of files while they are 
in the process of being written.  Even if the user decides to store something 
like a continuously updated logfile in his dropbox-like system, I want to 
be able to support that.




Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-04-02 Thread Joey Hess
Dieter Plaetinck wrote:
 @Joey: you mentioned you think inotify might be a better
 backend/paradigm for this than fuse, so do you think implementing
 git-annex in something like dvcs-autosync is feasible? and/or
 preferable?

Feasible? Certainly. Preferable? I'm in the "let a thousand flowers
bloom" phase. It's spring. :)

As Christophe-Marie has pointed out, git-annex makes annexed files
semi-immutable, and FUSE can hide that quirk, while inotify watching cannot.
That could be confusing for certain users or use cases, if they are not
aware of what is going on. Or it could be something quickly learned
about how these special replicated directories work, that files have to
be copied to be changed.
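
To illustrate the "copy to change" step, here is a toy sketch that swaps a symlink pointing at read-only content for a private, writable copy. The function name is hypothetical and this is not git-annex's own code; it only shows what a user or a front-end would effectively have to do.

```python
import os
import shutil


def make_editable(link_path):
    """Replace a symlink with a writable copy of the content it points to."""
    if not os.path.islink(link_path):
        return False  # already a regular file, nothing to do
    target = os.path.realpath(link_path)
    tmp = link_path + ".tmp"
    # copyfile copies only the data, so the copy gets fresh default
    # permissions instead of the read-only mode of the stored content.
    shutil.copyfile(target, tmp)
    # Renaming over the symlink replaces the link itself, not its target.
    os.replace(tmp, link_path)
    return True
```

The stored content stays untouched, so other links to it remain valid.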

This is also an area I hope to improve in git-annex, by using git smudge
filters. So it might get a mode where files can be modified and git
commit just annexes the new content. Last time I looked at this, git was
not *quite* there to let it be done efficiently.

 I quite like dvcs-autosync (partially because inotify is more simple
 than fuse, partially because it currently works already quite well) and I'm
 interested in making it support space efficient storage of big files;
 from what I've read it should be possible to do this with git-annex
 (which should not even change how we currently deal with small files,
 they would still be in git) but I'm still doing my first baby steps
 with git-annex so I wouldn't know. Advice very welcome..

All it probably needs at its simplest is something like this
(excuse the haskell):

    toobig <- checkFileSize file
    if toobig
        then git_annex_add file
        else git_add file
    git_commit file
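
Since dvcs-autosync and sharebox are both written in Python, the same decision might look like this there. A sketch only: the 10 MiB cut-off, the commit message and the function name are assumptions, and the function merely builds the command lines instead of running them.

```python
import os

ANNEX_THRESHOLD = 10 * 1024 * 1024  # assumed cut-off: 10 MiB


def commands_for(path, threshold=ANNEX_THRESHOLD):
    """Return the git commands that would add and commit path."""
    too_big = os.path.getsize(path) > threshold
    if too_big:
        add = ["git", "annex", "add", path]
    else:
        add = ["git", "add", path]
    commit = ["git", "commit", "-m", "autosync: " + path]
    return [add, commit]
```

Each returned list can be passed to subprocess.check_call with cwd set to the repository.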

 Another note : files being tracked with git-annex through sharebox or
 dvcs-autosync or whatever should always have at least 1 backup copy,
 so that if the file gets deleted everywhere, it still can be retrieved
 from somewhere (which raises the interesting question: where will you
 store this backup copy? introducing a node/repository which will hold
 backup copies can be considered going to a centralized model; which is
 something you (Christophe-Marie) try to explicitly avoid, but I think
 this is not necessarily a problem)

This is something git-annex goes to great lengths to deal with.
It will enforce N backup copies; it tracks which other repositories
have which files; it can transfer wanted file contents from other
repositories in either a decentralized or a centralized manner; the
other repositories can be on other drives of the same computer, or
accessible by ssh, or even, now, Amazon S3.
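
As a toy model of that bookkeeping (not git-annex's actual data structures), location tracking can be pictured as a map from each file key to the set of repositories holding it, with a drop allowed only while enough other copies remain. The names below are hypothetical.

```python
def can_drop(locations, key, repo, numcopies=1):
    """Return True if repo may drop key while numcopies other copies remain.

    locations maps a key to the set of repository names that hold it.
    """
    others = locations.get(key, set()) - {repo}
    return len(others) >= numcopies
```

For example, with locations = {"song.mp3": {"laptop", "server"}}, the laptop may drop the file when one other copy is required, but not when two are.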

-- 
see shy jo



Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-03-31 Thread Dieter Plaetinck
On Thu, 31 Mar 2011 18:56:54 +0200
Christophe-Marie Duquesne chm.duque...@gmail.com wrote:

 Hi,
 
 I am currently writing a FUSE file system based on git-annex for
 replicating binary files on several machines. I thought I could share
 it here in order to get some ideas and contributors.
 
 What are your goals?
 Seamless synchronization à la dropbox.
 Ability to use with big binary files such as mp3/movies.
 Entirely decentralized.
 Don't use unnecessary space
 Keep it simple: avoid special VCS commands and keep a filesystem
 interface as much as possible.

you also need to do various git/git-annex commands, or am I missing something?
 
 Why?
 Because sparkleshare and dvcs-autosync are bad at versioning binary files

I quite like dvcs-autosync, but it indeed lacks space-efficient storage of big 
files.
I would like to see whether we can use git-annex to support this in dvcs-autosync, 
although AFAIK git-annex is not transparent in the way regular git is 
transparent (i.e. it needs to explicitly copy files between locations), I 
assume this is the reason you need to go for a FUSE-based approach? or do you 
just prefer this over regular fs + inotify?

 Because Unison needs disk space for each pair of hosts it
 synchronizes and thus does not really scale to more than 2 hosts
 Because Coda is not completely decentralized and it bothers me

you actually tried coda? it's something I'm interested in, on paper it looks 
like an awesome, maybe-even-perfect open source dropbox-clone but the reality 
is probably different, I never tried it so I wouldn't know.
 
 What do you have?
 A python implementation. It is about 600 sloc, and you'll find it on
 https://github.com/chmduquesne/sharebox
 Be careful, it is very alpha and it still does not have a proper
 conflict handler.
 
 Hey, but copying is slow!
 On my machine, copying files to a sharebox fs is about 10 times slower
 than copying it on a normal fs. All the time is spent in python's
 os.write(): I guess the only way to work around this problem is to
 rewrite the whole thing in C, but I am keeping this for later.

hmm, writing files is i/o-bound, I doubt the language will have much effect 
here.
check with top/vmstat if you get iowait, if so your storage medium is getting 
saturated and rewriting in C won't help. maybe a network/buffering/.. issue.
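
The same check can be done from a script. A Linux-only sketch, assuming the usual layout of the first "cpu" line of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, ...); the helper names are made up.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of the file."""
    with open(path) as fh:
        return fh.readline()


def iowait_fraction(before, after):
    """Share of CPU time spent in iowait between two 'cpu' lines."""
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    delta = [y - x for x, y in zip(a, b)]
    total = sum(delta)
    # Index 4 is iowait in the field order listed above.
    return delta[4] / total if total else 0.0
```

Take one sample before the copy and one after; a high fraction suggests the storage is the bottleneck, a low one points back at CPU or FUSE overhead.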

 I am interested in:
 - suggestions for the functional design (I have my ideas, but I'd love
 to be challenged).

in your README you suggest using a crontab for synchronisation; maybe you can 
reuse/be inspired by the xmpp system dvcs-autosync uses; it works quite well, 
it's quite robust and it's instant :)


Dieter

Re: [announce] Sharebox, a FUSE filesystem relying on git-annex

2011-03-31 Thread Christophe-Marie Duquesne
On Thu, Mar 31, 2011 at 8:04 PM, Dieter Plaetinck die...@plaetinck.be wrote:
 you also need to do various git/git-annex commands, or am I missing something?

Ideally, that would be only at set up time.

 I quite like dvcs-autosync, but it indeed lacks space-efficient storage of 
 big files.
 I would like to see whether we can use git-annex to support this in dvcs-autosync, 
 although AFAIK git-annex is not transparent in the way regular git is 
 transparent (i.e. it needs to explicitly copy files between locations), I 
 assume this is the reason you need to go for a FUSE-based approach? or do you 
 just prefer this over regular fs + inotify?

I don't really like FUSE, and I would actually prefer using inotify,
but I think it would not be transparent enough. I think a filesystem
is the right abstraction here.

 you actually tried coda? it's something I'm interested in, on paper it looks 
 like an awesome, maybe-even-perfect open source dropbox-clone but the reality 
 is probably different, I never tried it so I wouldn't know.

I did not try it, but I looked at the documentation. It is not purely
decentralized: some machines are servers, others are clients and the
roles stay the same (if I believe this page:
http://www.coda.cs.cmu.edu/ljpaper/lj.html).

 hmm, writing files is i/o-bound, I doubt the language will have much effect 
 here.
 check with top/vmstat if you get iowait, if so your storage medium is getting 
 saturated and rewriting in C won't help. maybe a network/buffering/.. issue.

I'll have a look. Actually to come to this conclusion, I used the
loopback-fs provided by fusepy, which just mirrors another part of
your file system, and I timed the copy of an iso. This copy was 10
times slower than on a real fs (60 seconds instead of 6). I concluded
that this was due to python. I have about the same performance on my
filesystem. I'll complete the experiment tomorrow with fuse_xmp, which
is another fuse loopback-fs, but done in C.
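
For anyone who wants to repeat the measurement, a minimal timing sketch follows. The function names are hypothetical, and the result is only indicative: the first copy warms the page cache for the second, and nothing is flushed to disk.

```python
import shutil
import time


def time_copy(src, dst):
    """Copy src to dst and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    return time.perf_counter() - start


def slowdown(src, plain_dst, fuse_dst):
    """Ratio of the copy time onto the FUSE mount to that onto a plain fs."""
    plain = time_copy(src, plain_dst)
    fuse = time_copy(src, fuse_dst)
    return fuse / plain
```

Here plain_dst would be a path on a normal filesystem and fuse_dst a path inside the mounted loopback fs.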

 in your README you suggest using a crontab for synchronisation; maybe you 
 can reuse/be inspired by the xmpp system dvcs-autosync uses; it works quite 
 well, it's quite robust and it's instant :)

Yes. I had a 'sync=xx' option, for specifying an interval time between
synchronisations, but I removed it for this very reason.