Re: Storing permissions

2005-04-17 Thread David A. Wheeler
Linus Torvalds wrote:
> On Sat, 16 Apr 2005, Paul Jackson wrote:
>> Morten wrote:
>>> It makes some sense in principle, but without storing what they mean
>>> (i.e., group==?) it certainly makes no sense.
>> There's no "they" there.
>> I think Martin's proposal, to which I agreed, was to store a _single_
>> bit.  If any of the execute permissions of the incoming file are set,
>> then the bit is stored ON, else it is stored OFF.  On 'checkout', if the
>> bit is ON, then the file permission is set mode 0777 (modulo umask),
>> else it is set mode 0666 (modulo umask).
>
> I think I agree.
> Anybody willing to send me a patch? One issue is that if done the obvious
> way it's an incompatible change, and old tree objects won't be valid any
> more. It might be ok to just change the compare cache check to only care
> about a few bits, though: S_IXUSR and S_IFDIR.

There's a minor reason to write out ALL the perm bit data, but
only care about a few bits coming back in: Some people use
SCM systems as a generalized backup system, so you can back up
your system to an arbitrary known state in the past
(e.g., Change my /etc files to the state I was at
just before I installed that *#@ program!).
For more on this, see:
 http://www.onlamp.com/pub/a/onlamp/2005/01/06/svn_homedir.html
If you store all the bits, then you CAN restore things
more exactly the way they were.  This is imperfect, since
it doesn't cover more exotic permission
values from SELinux, xattrs, whatever.  For some, that's enough.
Yeah, I know, not the main purpose of git.  But what the heck,
I _like_ flexible infrastructures.
--- David A. Wheeler
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Storing permissions

2005-04-17 Thread Paul Jackson
David wrote:
> There's a minor reason to write out ALL the perm bit data, but

There's always the 'configurable option' approach.

Someone, I doubt Linus will have any interest in it, could volunteer to
make the masks of st_mode, used when storing and recovering file
permissions, be configurable by some environment variable settings,
which default to whatever Linus provided.
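
A minimal sketch of what such a configurable mask could look like, in Python rather than git's C. The environment variable name GIT_STORE_MODE_MASK, the helper names, and the default of "owner-execute only" are all hypothetical; nothing in this thread defines them.

```python
import os
import stat

# Hypothetical variable name; not something git defines.
STORE_MASK_VAR = "GIT_STORE_MODE_MASK"

# Assumed default: keep only the owner-execute bit.
DEFAULT_MASK = stat.S_IXUSR


def mode_mask(var_name, environ=os.environ):
    """Return the permission mask given as an octal string in var_name, else the default."""
    raw = environ.get(var_name)
    if raw is None:
        return DEFAULT_MASK
    return int(raw, 8) & 0o7777


def mode_to_store(st_mode, environ=os.environ):
    """Keep only the permission bits selected by the store mask."""
    return stat.S_IMODE(st_mode) & mode_mask(STORE_MASK_VAR, environ)
```

With no variable set, mode_to_store(0o100755, {}) keeps just 0o100; with the variable set to "0777" it keeps the full 0o755.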

But, in general, if you want a generalized backup system, git is not it.

Git skips all files whose name begins with the dot '.' character, and
anything that is not a regular file or directory.  Git makes no
concessions to working adequately on file systems lacking normal inode
numbers (such as smb, fat, vfat).  Git obscures the archive format a
modest amount, for pure speed and to encourage use only via appropriate
wrappers.  Git is tuned for blazing speed at the operations that Linus
needs, not for trivial recovery, using the most basic tools, under harsh
circumstances.

The basic idea of using such an 'object database' (though I dislike that
term -- too high falutin vague) of files stored by their hash is a
good one.  But a different core implementation is needed for backups.

I have one that I use for my own backups, but it is written in Python,
and uses MD5, one or the other of which likely disqualifies it from
further consideration by half the readers of this list.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Storing permissions

2005-04-17 Thread Linus Torvalds


On Sun, 17 Apr 2005, David A. Wheeler wrote:
>
> There's a minor reason to write out ALL the perm bit data, but
> only care about a few bits coming back in: Some people use
> SCM systems as a generalized backup system

Yes. I was actually thinking about having system config files in a git 
repository when I started it, since I noticed how nicely it would do 
exactly that.

However, since the mode bits also end up being part of the name of the 
tree object (ie they are most certainly part of the hash), it's really 
basically impossible to care about only one bit while writing out many bits: 
it's the same issue of having multiple identical blocks with different 
names.

It's ok if it happens occasionally (it _will_ happen at the point of a
tree conversion to the new format, for example), but it's not ok if it
happens all the time - which it would, since some people have umask 002
(and individual groups) and others have umask 022 (and shared groups), and
I can imagine that some anal people have umask 0077 (I don't want to play
with others).

The trees would constantly bounce between a million different combinations 
(since _some_ files would be checked out with the other mode).

At least if you always honor umask or always totally ignore umask, you get 
a nice repeatable thing. We tried the always ignore umask thing, and the 
problem with that is that while _git_ ended up always doing a fchmod() 
to reset the whole permission mask, anybody who created files any other 
way and then checked them in would end up using umask.

One solution is to tell git with a command line flag and/or config file 
entry that for this repo, I want you to honor all bits. That should be 
easy enough to add at some point, and then you really get what you want.

That said, git won't be really good at doing system backup. I actually 
_do_ save a full 32 bits of mode (hey, you could have immutable bits 
etc set), but anybody who does anything fancy at all with mtime would be 
screwed, for example.

Also, right now we don't actually save any other type of file than
regular/directory, so you'd have to come up with a good save-format for
symlinks (easy, I guess - just make a link blob) and device nodes (that
one probably should be saved in the cache_entry  itself, possibly
encoded where the sha1 hash normally is).

Also, I made a design decision that git only cares about non-dotfiles. Git 
literally never sees or looks at _anything_ that starts with a ".". I 
think that's absolutely the right thing to do for an SCM (if you hide your 
files, I really don't think you should expect the SCM to see it), but it's 
obviously not the right thing for a backup thing.

(It _might_ be the right thing for a system config file, though, eg 
tracking something like /etc with git might be ok, modulo the other 
issues).

Linus
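
A small sketch of why the mode bits cannot be written out in full and then ignored: they feed the hash that names the tree. The layout used here (octal mode, space, name, NUL, raw 20-byte blob hash, under a "tree <length>" header) and the helper name tree_id are assumptions for illustration, not a statement of git's exact on-disk format at the time of this thread.

```python
import hashlib


def tree_id(entries):
    """Hash a list of (mode, name, raw 20-byte blob hash) entries into a tree name."""
    body = b"".join(
        b"%o %s\x00" % (mode, name.encode()) + blob_id
        for mode, name, blob_id in sorted(entries, key=lambda e: e[1])
    )
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()


# Same file contents, checked out under two different umasks.
blob = hashlib.sha1(b"same file contents\n").digest()
id_umask_002 = tree_id([(0o100664, "README", blob)])  # umask 002
id_umask_022 = tree_id([(0o100644, "README", blob)])  # umask 022
```

The two ids differ even though the contents are identical, which is the "identical blocks with different names" problem described above.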


Re: Storing permissions

2005-04-17 Thread David A. Wheeler
Linus Torvalds wrote:
> On Sun, 17 Apr 2005, David A. Wheeler wrote:
>> There's a minor reason to write out ALL the perm bit data, but
>> only care about a few bits coming back in: Some people use
>> SCM systems as a generalized backup system
>
> Yes. I was actually thinking about having system config files in a git
> repository when I started it, since I noticed how nicely it would do
> exactly that.
>
> However, since the mode bits also end up being part of the name of the
> tree object (ie they are most certainly part of the hash), it's really
> basically impossible to only care about one bit but writing out many bits:
> it's the same issue of having multiple identical blocks with different
> names.
...
> One solution is to tell git with a command line flag and/or config file
> entry that for this repo, I want you to honor all bits. That should be
> easy enough to add at some point, and then you really get what you want.

Yes, I thought of that too.  And I agree, that should do the job.
My real concern is I'm looking at the early design of the
storage format so that it's POSSIBLE to extend git in obvious ways.
As long as it's possible later, then that's a great thing.
...
> Also, I made a design decision that git only cares about non-dotfiles. Git
> literally never sees or looks at _anything_ that starts with a ".". I
> think that's absolutely the right thing to do for an SCM (if you hide your
> files, I really don't think you should expect the SCM to see it), but it's
> obviously not the right thing for a backup thing.

Again, a command line flag or config file entry could change that
in the future, if desired.  So this is a decision that could be
changed later... the best kind of decision :-).
--- David A. Wheeler


Re: Storing permissions

2005-04-16 Thread Junio C Hamano
>>>>> "PJ" == Paul Jackson [EMAIL PROTECTED] writes:

PJ> That matches my experience - store 1 bit of mode state - executable or not.

Sounds like svn ;-).



Re: Storing permissions

2005-04-16 Thread Paul Jackson
Junio wrote:
> Sounds like svn

I have no idea what svn is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Storing permissions

2005-04-16 Thread Morten Welinder
> Does it really make sense to store full permissions in the trees? I think
> that remembering the x-bit should be good enough for almost all purposes
> and the other permissions should be left to the local environment.

It makes some sense in principle, but without storing what they mean
(i.e., group==?) it certainly makes no sense.  It's a bit like unpacking a
tar file.

I suspect a non-readable file would cause a bit of a problem in the low-level
commands.

Morten


Re: Storing permissions

2005-04-16 Thread Paul Jackson
Morten wrote:
> It makes some sense in principle, but without storing what they mean
> (i.e., group==?) it certainly makes no sense.

There's no "they" there.

I think Martin's proposal, to which I agreed, was to store a _single_
bit.  If any of the execute permissions of the incoming file are set,
then the bit is stored ON, else it is stored OFF.  On 'checkout', if the
bit is ON, then the file permission is set mode 0777 (modulo umask),
else it is set mode 0666 (modulo umask).

You might disagree that this is a good idea, but it certainly does
'make sense' (as in 'is sensibly well defined').
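
A minimal sketch of the single-bit scheme as described above, in Python rather than git's C. The names ANY_EXEC, exec_bit and checkout_mode are made up for illustration.

```python
import stat

# Any of the three execute permissions (user, group, other): 0o111.
ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def exec_bit(st_mode):
    """The one stored bit: ON if any execute permission is set on the incoming file."""
    return bool(st_mode & ANY_EXEC)


def checkout_mode(bit, umask):
    """Mode a checked-out file ends up with: 0777 or 0666, modulo umask."""
    base = 0o777 if bit else 0o666
    return base & ~umask
```

For example, a file stored with the bit ON and checked out under umask 022 ends up 0755; with the bit OFF under umask 002 it ends up 0664.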

> I suspect a non-readable file would cause a bit of a problem in the low-level
> commands.

Probably so.  If someone sets their umask 0333 or less, then they are
either fools or QA (software quality assurance, or test) engineers.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Storing permissions

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Paul Jackson wrote:
>
> Morten wrote:
>> It makes some sense in principle, but without storing what they mean
>> (i.e., group==?) it certainly makes no sense.
>
> There's no "they" there.
>
> I think Martin's proposal, to which I agreed, was to store a _single_
> bit.  If any of the execute permissions of the incoming file are set,
> then the bit is stored ON, else it is stored OFF.  On 'checkout', if the
> bit is ON, then the file permission is set mode 0777 (modulo umask),
> else it is set mode 0666 (modulo umask).

I think I agree.

Anybody willing to send me a patch? One issue is that if done the obvious
way it's an incompatible change, and old tree objects won't be valid any
more. It might be ok to just change the compare cache check to only care
about a few bits, though: S_IXUSR and S_IFDIR. And then always write new 
tree objects out with mode set to one of
 - 040000: we already do this for directories
 - 100644: normal files without S_IXUSR set
 - 100755: normal files _with_ S_IXUSR set

Then, at compare time, we only look at S_IXUSR matching for files (we
never compare directory modes anyway). And at file create time, we create
them with 0666 and 0777 respectively, and let the user's umask sort it out
(and if the user has 0100 set in his umask, he can damn well blame
himself).

This would pretty much match the existing kernel tree, for example. We'd 
end up with some new trees there (and in git), but not a lot of 
incompatibility. And old trees would still work fine, they'd just get 
written out differently.

Anybody want to send a patch to do this?

Linus
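
The normalization and comparison rules above can be sketched as follows. This is Python, not the actual patch; the function names tree_mode and file_mode_changed are assumptions. Directories map to 040000 (the S_IFDIR value), and for regular files only S_IXUSR is consulted.

```python
import stat


def tree_mode(st_mode):
    """Mode written into a new tree object for a file or directory."""
    if stat.S_ISDIR(st_mode):
        return 0o040000
    # Regular file: only the owner-execute bit decides between the two modes.
    return 0o100755 if st_mode & stat.S_IXUSR else 0o100644


def file_mode_changed(cached_mode, st_mode):
    """For regular files, modes differ only if S_IXUSR differs."""
    return bool((cached_mode ^ st_mode) & stat.S_IXUSR)
```

Note that a file with only group or other execute set (say 0611) is written out as 100644 under this rule, which is slightly narrower than the "any execute bit" wording of the original proposal.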


Re: Storing permissions

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
> Junio wrote:
>> Sounds like svn
>
> I have no idea what svn is.

svn = common abbreviation for Subversion, a
widely-used centralized SCM tool intentionally
similar to CVS.
--- David A. Wheeler


Re: Storing permissions

2005-04-16 Thread Paul Jackson
Linus wrote:
> It might be ok to just change the compare cache check to only care
> about a few bits, though: S_IXUSR and S_IFDIR. And then ...

I think I agree.  But since I am reluctant to take enough time to
understand the code well enough to write this patch, I'll shut up now ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Storing permissions

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Linus Torvalds wrote:
>
> Anybody want to send a patch to do this?

Actually, I just did it. Seems to work for the only test-case I tried,
namely I just committed it, and checked that the permissions all ended up
being recorded as 0644 in the tree (if it has the -x bit set, they get
recorded as 0755).

When checking out, we always check out with 0666 or 0777, and just let 
umask do its thing. We only test bit 0100 when checking for differences.

Maybe I missed some case, but this does indeed seem saner than the "try to 
restore all bits" case. If somebody sees any problems, please holler.

(Btw, you may or may not need to blow away your index file by just 
re-creating it with a read-tree after you've updated to this. I _tried_ 
to make sure that the compare just ignored the ce_mode bits, but the fact 
is, your index file may be corrupt in the sense that it has permission 
sets that sparse expects to never generate in an index file any more..)

Linus
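
A rough illustration of "check out with 0666 or 0777 and let umask do its thing", assuming a POSIX system and a filesystem that honors permission bits. It creates real files in a temporary directory; the helper names create_checked_out and demo are made up for the example.

```python
import os
import stat
import tempfile


def create_checked_out(path, executable):
    """Create a file with 0777 or 0666 and report the mode the kernel actually gave it."""
    mode = 0o777 if executable else 0o666
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)


def demo(umask):
    """Return (mode of an executable file, mode of a plain file) under the given umask."""
    old = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as d:
            return (
                create_checked_out(os.path.join(d, "script"), True),
                create_checked_out(os.path.join(d, "text"), False),
            )
    finally:
        os.umask(old)  # always restore the caller's umask
```

Under umask 022 the pair comes out as 0755 and 0644; under umask 077 it comes out as 0700 and 0600. In every case bit 0100 still distinguishes the two files, which is the only bit tested when checking for differences.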