Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-13 Thread Jeff King
On Fri, Sep 12, 2014 at 04:21:09PM +0200, Michael Haggerty wrote:

> > But I still wonder how hard it would be to just remove lock_file structs
> > from the global list when they are committed or rolled back.
> [...]
>
> To make that change, we would have to remove entries from the list of
> lock_file objects in a way that can be interrupted at any time by a
> signal while still leaving the list in a state that is traversable by
> the signal handler.
> 
> I think that can be done pretty easily with a singly-linked list. But
> with a singly-linked list, we would have to iterate through the list to
> find the node that needs to be removed. This could get expensive if
> there are a lot of nodes in the list (see below).

Yes, I considered that, but noticed that if we actually cleaned up
closed files, the list would not grow to more than a handful of entries.
But...

> The ref-transaction code is, I think, moving in the direction of
> updating all references in a single transaction. This means that we
> would need to hold locks for all of the references at once anyway. So it
> might be all for naught.

That nullifies the whole discussion. Besides the list-traversal thing
above, it would mean that we literally _do_ have all of the lockfiles
open at once. So cleaning up after ourselves would be nice, but it would
not impact the peak memory usage, which would necessarily have one
allocated struct per ref.

The use of a strbuf is probably a big enough change to save us there.
This case was pathological for a few reasons:

  1. A ridiculous number of refs in the repository.

  2. Touching a large number of them in sequence (via pack-refs).

  3. Allocating a 4K buffer per lock_file object.

For (3), if the average allocation is dropped even to 400 bytes (which
would accommodate quite a long pathname), that would reduce the memory
usage to ~700MB (1.8 million refs at ~400 bytes each, versus ~7G at 4K
each). Not amazing, but enough not to tip over most modern machines.

-Peff


Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-12 Thread Michael Haggerty
On 09/10/2014 10:13 AM, Jeff King wrote:
> [...]
> I did coincidentally have an interesting experience with our lockfile
> code earlier today, which I'd like to relate.
> 
> I was running pack-refs on a repository with a very large number of
> loose refs (about 1.8 million). [...] Each call to lock_ref_sha1_basic
> allocates a "struct lock_file", which then gets added to the global
> lock_file list. Each one contains a fixed PATH_MAX buffer (4K on this
> machine). After we're done updating the ref, we leak the lock_file
> struct, since there's no way to remove it from the list.
> 
> As a result, git tried to allocate 7G of RAM and got OOM-killed (the
> machine had only 8G), in addition to thrashing the disk even harder,
> since there was no room left for disk cache while we touched millions of
> loose refs. :)
> 
> Your change in this series to use a strbuf would make this a lot better.
> But I still wonder how hard it would be to just remove lock_file structs
> from the global list when they are committed or rolled back. That would
> presumably also make the "avoid transitory invalid states" patch from your
> series a bit easier, too (you could prepare the lockfile in peace, and
> then link it in fully formed, and do the opposite when removing it).
> 
> I think with your strbuf patch, this leak at least becomes reasonable.
> So maybe it's not worth going further. But I'd be interested to hear
> your thoughts since you've been touching the area recently.

I've thought about this too, but it didn't seem to be worth the effort.
(Though your use case certainly adds a bit of justification.)

To make that change, we would have to remove entries from the list of
lock_file objects in a way that can be interrupted at any time by a
signal while still leaving the list in a state that is traversable by
the signal handler.

I think that can be done pretty easily with a singly-linked list. But
with a singly-linked list, we would have to iterate through the list to
find the node that needs to be removed. This could get expensive if
there are a lot of nodes in the list (see below).

So we would probably want to switch to using a doubly-linked list. I
think this would also be fairly simple, given that the signal handler
only needs to iterate through the list in a single direction. You'd just
have to be careful about adjusting the pointers in the right order to
let (say) a forwards traversal always work.
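
Concretely, the unlinking might look something like this (an
illustrative sketch only, not actual lockfile.c code; the "prev" field
and the helper names are invented for the example):

#include <signal.h>
#include <stddef.h>

struct lock_file {
	struct lock_file *volatile next;
	struct lock_file *volatile prev;
	volatile sig_atomic_t active;
	/* fd, owner, filename strbuf, ... omitted here */
};

static struct lock_file *volatile lock_file_list;

static void register_lock_file(struct lock_file *lk)
{
	lk->prev = NULL;
	lk->next = lock_file_list;
	if (lk->next)
		lk->next->prev = lk;
	lock_file_list = lk;	/* publish the fully-formed node last */
}

static void unregister_lock_file(struct lock_file *lk)
{
	/* One pointer write; a forward traversal now simply skips lk. */
	if (lk->prev)
		lk->prev->next = lk->next;
	else
		lock_file_list = lk->next;
	/* The signal handler never walks backwards, so this can come second. */
	if (lk->next)
		lk->next->prev = lk->prev;
	lk->next = lk->prev = NULL;
}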

Then the callers who use heap-allocated lock_file objects would have to
be changed to free them when they're done with them, probably using a
special function that releases the strbufs, too. Callers using
statically-allocated lock_file objects would probably not have to be
changed.
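
A hypothetical "free it when done" helper could then be as small as this
(again only a sketch; it assumes the unregister function above, git's
strbuf_release(), and the real struct's filename strbuf):

#include <assert.h>
#include <stdlib.h>
#include "cache.h"	/* in git's tree, for struct strbuf / strbuf_release() */

static void free_lock_file(struct lock_file *lk)
{
	assert(!lk->active);		/* must be committed or rolled back already */
	unregister_lock_file(lk);	/* drop it from the global list first */
	strbuf_release(&lk->filename);
	free(lk);
}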

But...

The ref-transaction code is, I think, moving in the direction of
updating all references in a single transaction. This means that we
would need to hold locks for all of the references at once anyway. So it
might be all for naught.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-12 Thread Michael Haggerty
On 09/07/2014 04:21 PM, Torsten Bögershausen wrote:
> 
> On 2014-09-06 09.50, Michael Haggerty wrote:
>> Sorry for the long delay since v3. This version mostly cleans up a
>> couple more places where the lockfile object was left in an
>> ill-defined state. 
> No problem with the delay.
> The most important question is if we do the lk->active handling right.
> Set it to false as soon as possible, and to true as late as possible,
> then die() cleanly.
> So the ->active handling looks right (more or less, please see below) and
> deserves another critical review, maybe.
> 
> Instead of commenting on each patch, I collected a mixture of small questions
> and possible suggestions into a diff file.
> 
> diff --git a/lockfile.c b/lockfile.c
> index e54d260..7f27ea2 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -18,6 +18,10 @@
>   *Usually, if $FILENAME is a symlink, then it is resolved, and the
>   *file ultimately pointed to is the one that is locked and later
>   *replaced.  However, if LOCK_NODEREF is used, then $FILENAME
> +LOCK_NODEREF can be read either as
> +LOCK_NODE_REF or LOCK_NO_DEREF
> +should we change it ?
> +

Good idea.

>   *itself is locked and later replaced, even if it is a symlink.
>   *
>   * 2. Write the new file contents to the lockfile.
> @@ -46,9 +50,18 @@
>   *   state:
>   *   - the lockfile exists
>   *   - active is set
> - *   - filename holds the filename of the lockfile
> + *   - filename holds the filename of the lockfile in a strbuf

I don't think this is necessary. The point of this list is to describe
the state machine, not the contents of the lock_file structure, so this
detail is only a distraction. And even if a reader is confused, the
compiler will warn if he tries to use the strbuf as if it were a string.

>   *   - fd holds a file descriptor open for writing to the lockfile
>   *   - owner holds the PID of the process that locked the file
> +question: Why do we need the PID here ?
> +Do we open a lock file and do a fork() ?
> +And if yes, the child gets a new PID, what happens when the
> +child gets a signal ?
> +Who "owns" the lockfile, the parent, the child, both ?
> +The child has access to all data, the fd is open and can be used,
> +why do we not allow a rollback, when the child dies ?

Good questions. I will add an explanation of the purpose of the pid in
this docstring.

>   *
>   * - Locked, lockfile closed (after close_lock_file()).  Same as the
>   *   previous state, except that the lockfile is closed and fd is -1.
> @@ -57,7 +70,7 @@
>   *   rollback_lock_file(), or a failed attempt to lock).  In this
>   *   state:
>   *   - active is unset
> - *   - filename is the empty string (usually, though there are
> + *   - filename is an empty string buffer (usually, though there are
>   * transitory states in which this condition doesn't hold)
>   *   - fd is -1
>   *   - the object is left registered in the lock_file_list, and
> @@ -68,7 +81,7 @@
>  
>  static struct lock_file *volatile lock_file_list;
>  
> -static void remove_lock_file(void)
> +static void remove_lock_files(void)

Good idea. I will rename these two functions.

>  {
>  pid_t me = getpid();
>  
> @@ -79,9 +92,9 @@ static void remove_lock_file(void)
>  }
>  }
>  
> -static void remove_lock_file_on_signal(int signo)
> +static void remove_lock_files_on_signal(int signo)
>  {
> -remove_lock_file();
> +remove_lock_files();
>  sigchain_pop(signo);
>  raise(signo);
>  }
> @@ -93,7 +106,7 @@ static void remove_lock_file_on_signal(int signo)
>   * "/", if any).  If path is empty or the root directory ("/"), set
>   * path to the empty string.
>   */
> -static void trim_last_path_elm(struct strbuf *path)
> +static void trim_last_path_elem(struct strbuf *path)

I agree that the old name was bad. I will make it even more explicit:
trim_last_path_component().

>  {
>  int i = path->len;
>  
> @@ -143,7 +156,7 @@ static void resolve_symlink(struct strbuf *path)
>   * link is a relative path, so replace the
>   * last element of p with it.
>   */
> -trim_last_path_elm(path);
> +trim_last_path_elem(path);
>  
>  strbuf_addbuf(path, &link);
>  }
> @@ -153,13 +166,16 @@ static void resolve_symlink(struct strbuf *path)
>  /* Make sure errno contains a meaningful value on error */
>  static int lock_file(struct lock_file *lk, const char *path, int flags)
>  {
> +struct stat st;
> +int mode = 0666;
>  if (!lock_file_list) {
>  /* One-time initialization */
> -sigchain_push_common(remove_lock_file_on_signal);
> -atexit(remove_lock_file);
> +sigchain_push_common(remove_lock_files_on_signal);
> +atexit(remove_lock_files);
>  }
>  
> -assert(!lk->active);
> +  if (lk->active)
> +die("lk->active %s", path);

OK, but I will use die("BUG:...") since this would be an indication of a
bug in git.

>  if (!lk->on_list) {
>  /* Initialize 

Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-12 Thread Michael Haggerty
On 09/10/2014 09:11 PM, Jeff King wrote:
> On Wed, Sep 10, 2014 at 09:51:03AM -0700, Junio C Hamano wrote:
> 
>> Jeff King  writes:
>>
>>> Yes, we don't let normal fetchers see these repos. They're only for
>>> holding shared objects and the ref tips to keep them reachable.
>>
>> Do these individual refs have any relation to the real world after
>> they are created?  To ask it another way, let's say that a branch in
>> a repository, which is using this as a shared object store, caused
>> one of these refs to be created; now the origin repository rewinds
>> or deletes that branch---do you do anything to the ref in the shared
>> object store at that point?
> 
> Yes, we fetch from them before doing any maintenance in the shared
> repository (like running repack). That's how objects migrate into the
> shared repository, as well.
> 
>> I am wondering if it makes sense to maintain a single ref that
>> reaches all the commits in this shared object store repository,
>> instead of keeping these millions of refs.  When you need to make
>> more objects kept and reachable, create an octopus with the current
>> tip and the tips of all these refs that cause you to wish to make these
>> "more objects kept and reachable".  Obviously that won't work well
>> if the reason why your current scheme uses refs is because you
>> adjust individual refs to prune some objects---hence the first
>> question in this message.
> 
> Exactly. You could do this if you threw away and re-made the octopus
> after each fetch (and then threw away the individual branches that went
> into it). For that matter, if all you really want are the tips for
> reachability, you can basically run "for-each-ref | sort -u"; most of
> these refs are tags that are duplicated between each fork.
> 
> However, having the individual tips does make some things easier. If I
> only keep unique tips and I drop a tip from fork A, I would then need to
> check every other fork to see if any other fork has the same tip. OTOH,
> that means visiting N packed-refs files, each with (let's say) 3000
> refs. As opposed to dealing with a packed-refs file with N*3000 refs. So
> it's really not that different.

I think it would make more sense to have one octopus per fork, rather
than one octopus for all of the forks. The octopus for a fork could be
discarded and re-created every time that fork changed, without having to
worry about which of the old tips is still reachable from another fork.
In fact, if you happen to know that a particular update to a fork is
pure fast-forward, you could update the fork octopus by merging the
changed tips into the old fork octopus without having to re-knit all of
the unchanged tips together again.

The shared repository would need one reference per fork--still much
better than a reference for every reference in every fork. If even that
is too many references, you could knit the fork octopuses into an
overall octopus for all forks. But then updating that octopus for a
change to a fork would become more difficult, because you would have to
read the octopuses for the N-1 unchanged forks from the old octopus and
knit those together with the new octopus for the modified fork.

But all this imagineering doesn't mitigate the other reasons you list
below for not wanting to guarantee reachability using this trick.

> We also use the individual ref tips for packing. They factor into the
> bitmap selection, and we have some patches (which I've been meaning to
> upstream for a while now) to make delta selections in the shared-object
> repository that will have a high chance of reuse in clones of individual
> forks. And it's useful to query them for various reasons (e.g., "who is
> referencing this object?").
> [...]

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-12 Thread Michael Haggerty
On 09/10/2014 06:51 PM, Junio C Hamano wrote:
> [...]
> I am wondering if it makes sense to maintain a single ref that
> reaches all the commits in this shared object store repository,
> instead of keeping these millions of refs.  When you need to make
> more objects kept and reachable, create an octopus with the current
> tip and the tips of all these refs that cause you to wish to make these
> "more objects kept and reachable".  Obviously that won't work well
> if the reason why your current scheme uses refs is because you
> adjust individual refs to prune some objects---hence the first
> question in this message.

This is an interesting idea.

I wouldn't trust git to handle efficiently commits with thousands of
parents, but it would be easy to knit a huge number of tips together
using multiple layers of, say, 16-parent octopus merges.
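
As a back-of-the-envelope sketch (standalone, not git code; the 1.8
million figure is taken from Peff's numbers earlier in the thread), the
layering stays quite shallow even for that many tips:

#include <stdio.h>

int main(void)
{
	long tips = 1800000;	/* roughly the number of refs discussed */
	int fanout = 16;	/* parents per octopus merge */
	long n = tips, merges = 0;
	int layers = 0;

	while (n > 1) {
		n = (n + fanout - 1) / fanout;	/* one merge per group of 16 tips */
		merges += n;
		layers++;
	}
	/* prints: 1800000 tips -> 120003 merge commits in 6 layers */
	printf("%ld tips -> %ld merge commits in %d layers\n",
	       tips, merges, layers);
	return 0;
}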

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu



Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-10 Thread Jeff King
On Wed, Sep 10, 2014 at 09:51:03AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Yes, we don't let normal fetchers see these repos. They're only for
> > holding shared objects and the ref tips to keep them reachable.
> 
> Do these individual refs have any relation to the real world after
> they are created?  To ask it another way, let's say that a branch in
> a repository, which is using this as a shared object store, caused
> one of these refs to be created; now the origin repository rewinds
> or deletes that branch---do you do anything to the ref in the shared
> object store at that point?

Yes, we fetch from them before doing any maintenance in the shared
repository (like running repack). That's how objects migrate into the
shared repository, as well.

> I am wondering if it makes sense to maintain a single ref that
> reaches all the commits in this shared object store repository,
> instead of keeping these millions of refs.  When you need to make
> more objects kept and reachable, create an octopus with the current
> tip and the tips of all these refs that cause you to wish to make these
> "more objects kept and reachable".  Obviously that won't work well
> if the reason why your current scheme uses refs is because you
> adjust individual refs to prune some objects---hence the first
> question in this message.

Exactly. You could do this if you threw away and re-made the octopus
after each fetch (and then threw away the individual branches that went
into it). For that matter, if all you really want are the tips for
reachability, you can basically run "for-each-ref | sort -u"; most of
these refs are tags that are duplicated between each fork.

However, having the individual tips does make some things easier. If I
only keep unique tips and I drop a tip from fork A, I would then need to
check every other fork to see if any other fork has the same tip. OTOH,
that means visiting N packed-refs files, each with (let's say) 3000
refs. As opposed to dealing with a packed-refs file with N*3000 refs. So
it's really not that different.

We also use the individual ref tips for packing. They factor into the
bitmap selection, and we have some patches (which I've been meaning to
upstream for a while now) to make delta selections in the shared-object
repository that will have a high chance of reuse in clones of individual
forks. And it's useful to query them for various reasons (e.g., "who is
referencing this object?").

There are a lot of different ways to do it, and the giant refs file is a
pain (not to mention writing objects to disk in the forks, and then
migrating them separately to the shared storage). But doing it this way
means that the forks and the shared-object repository are all real
first-class git repositories. We follow the usual object reachability
guarantees, and it's safe to run any stock git command on them.

I am leaning towards a system where the shared-object repository is a
pseudo-repository, forks write directly into the shared object store
(probably via a symlinked objects/ directory), and a ref backend
generates a virtual mapping on the fly (e.g., all refs in "../foo.git"
appear as "refs/remotes/foo/*" in the pseudo-repository). I'm watching
the refs-backend work anxiously. :)

-Peff


Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-10 Thread Junio C Hamano
Jeff King  writes:

> Yes, we don't let normal fetchers see these repos. They're only for
> holding shared objects and the ref tips to keep them reachable.

Do these individual refs have any relation to the real world after
they are created?  To ask it another way, let's say that a branch in
a repository, which is using this as a shared object store, caused
one of these refs to be created; now the origin repository rewinds
or deletes that branch---do you do anything to the ref in the shared
object store at that point?

I am wondering if it makes sense to maintain a single ref that
reaches all the commits in this shared object store repository,
instead of keeping these millions of refs.  When you need to make
more objects kept and reachable, create an octopus with the current
tip and the tips of all these refs that cause you to wish to make these
"more objects kept and reachable".  Obviously that won't work well
if the reason why your current scheme uses refs is because you
adjust individual refs to prune some objects---hence the first
question in this message.


Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-10 Thread Jeff King
On Wed, Sep 10, 2014 at 05:25:36PM +0700, Duy Nguyen wrote:

> On Wed, Sep 10, 2014 at 3:13 PM, Jeff King  wrote:
> > I was running pack-refs on a repository with a very large number of
> > loose refs (about 1.8 million). Needless to say, this ran very slowly
> > and thrashed the disk, as that's almost 7G using 4K inodes. But it did
> > eventually generate a packed-refs file, at which point it tried to prune
> > the loose refs.
> 
> Urghh.. ref advertisement for fetch/push would be unbelievably large.
> Gotta look at the "git protocol v2" again soon..

Yes, we don't let normal fetchers see these repos. They're only for
holding shared objects and the ref tips to keep them reachable. So we
never fetch from them, even locally. We only fetch to them from normal
repos (and never push to or from them at all).

It's still rather painful just to do normal things, though. Every git
operation loads the whole packed-refs file into memory. I'm biding my
time on the ref-backend patches that are being worked on. :)

-Peff


Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-10 Thread Duy Nguyen
On Wed, Sep 10, 2014 at 3:13 PM, Jeff King  wrote:
> I was running pack-refs on a repository with a very large number of
> loose refs (about 1.8 million). Needless to say, this ran very slowly
> and thrashed the disk, as that's almost 7G using 4K inodes. But it did
> eventually generate a packed-refs file, at which point it tried to prune
> the loose refs.

Urghh.. ref advertisement for fetch/push would be unbelievably large.
Gotta look at the "git protocol v2" again soon..
-- 
Duy


Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-10 Thread Jeff King
On Sat, Sep 06, 2014 at 09:50:14AM +0200, Michael Haggerty wrote:

> Sorry for the long delay since v3. This version mostly cleans up a
> couple more places where the lockfile object was left in an
> ill-defined state. Thanks to Johannes Sixt and Torsten Bögershausen
> for their review of v3.
> 
> I believe that this series addresses all of the comments from v1 [1],
> v2 [2], and v3 [3].

This looks pretty good to me overall.

I did coincidentally have an interesting experience with our lockfile
code earlier today, which I'd like to relate.

I was running pack-refs on a repository with a very large number of
loose refs (about 1.8 million). Needless to say, this ran very slowly
and thrashed the disk, as that's almost 7G using 4K inodes. But it did
eventually generate a packed-refs file, at which point it tried to prune
the loose refs.

To do so, we have to lock each ref before removing it (to protect
against a simultaneous update). Each call to lock_ref_sha1_basic
allocates a "struct lock_file", which then gets added to the global
lock_file list. Each one contains a fixed PATH_MAX buffer (4K on this
machine). After we're done updating the ref, we leak the lock_file
struct, since there's no way to remove it from the list.

As a result, git tried to allocate 7G of RAM and got OOM-killed (the
machine had only 8G), in addition to thrashing the disk even harder,
since there was no room left for disk cache while we touched millions of
loose refs. :)

Your change in this series to use a strbuf would make this a lot better.
But I still wonder how hard it would be to just remove lock_file structs
from the global list when they are committed or rolled back. That would
presumably also make the "avoid transitory invalid states" patch from your
series a bit easier, too (you could prepare the lockfile in peace, and
then link it in fully formed, and do the opposite when removing it).

I think with your strbuf patch, this leak at least becomes reasonable.
So maybe it's not worth going further. But I'd be interested to hear
your thoughts since you've been touching the area recently.

-Peff


Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-08 Thread Junio C Hamano
Michael Haggerty  writes:

> This series applies to the current "master". There is a trivial
> conflict between these changes and "next", and a few not-too-serious
> conflicts between these changes and Ronnie's reference-related series
> in "pu".

The conflicts weren't very pretty as the rs/transaction* series
progressed, but I think I got something readable queued at the tip
of 'jch' (which is what I usually run myself and sits somewhere
between 'pu^{/match next}' and 'pu').

I'd appreciate if both of you can double check the result.

> I've figured out how to resolve the conflicts locally. Is
> there some form in which I can put the conflict resolution that would
> help you?

A public tree that shows a suggested conflict resolution, so that I
can try it myself without looking at it first and then compare my
result with yours, would probably be the easiest for both of us.  In
the worst case, I could fetch from such a tree, use rerere-train in
contrib/ and figure out any necessary evil-merges that are not
covered by rerere.

Thanks.


Re: [PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-07 Thread Torsten Bögershausen

On 2014-09-06 09.50, Michael Haggerty wrote:
> Sorry for the long delay since v3. This version mostly cleans up a
> couple more places where the lockfile object was left in an
> ill-defined state. 
No problem with the delay.
The most important question is if we do the lk->active handling right.
Set it to false as soon as possible, and to true as late as possible,
then die() cleanly.
So the ->active handling looks right (more or less, please see below) and
deserves another critical review, maybe.
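
To illustrate the ordering I mean (purely a sketch, not the actual
patch; the helper names and exact field layout are assumptions), active
would be set only after everything else is valid, and cleared before
the fields it protects are wiped:

#include <unistd.h>
#include "cache.h"	/* in git's tree, for struct strbuf and struct lock_file */

static void set_up_lock(struct lock_file *lk, const char *lockpath, int fd)
{
	strbuf_addstr(&lk->filename, lockpath);
	lk->fd = fd;
	lk->owner = getpid();
	lk->active = 1;		/* last: everything above is already in place */
}

static void tear_down_lock(struct lock_file *lk)
{
	if (lk->fd >= 0)
		close(lk->fd);
	lk->fd = -1;
	unlink(lk->filename.buf);	/* remove the lockfile itself */
	lk->active = 0;			/* before the filename is invalidated */
	strbuf_reset(&lk->filename);
}

That way a die() or a signal between any two of those statements never
sees an "active" lock whose fields are stale.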

Instead of commenting on each patch, I collected a mixture of small questions
and possible suggestions into a diff file.

diff --git a/lockfile.c b/lockfile.c
index e54d260..7f27ea2 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -18,6 +18,10 @@
  *Usually, if $FILENAME is a symlink, then it is resolved, and the
  *file ultimately pointed to is the one that is locked and later
  *replaced.  However, if LOCK_NODEREF is used, then $FILENAME
+LOCK_NODEREF can be read either as
+LOCK_NODE_REF or LOCK_NO_DEREF
+should we change it ?
+
  *itself is locked and later replaced, even if it is a symlink.
  *
  * 2. Write the new file contents to the lockfile.
@@ -46,9 +50,18 @@
  *   state:
  *   - the lockfile exists
  *   - active is set
- *   - filename holds the filename of the lockfile
+ *   - filename holds the filename of the lockfile in a strbuf
  *   - fd holds a file descriptor open for writing to the lockfile
  *   - owner holds the PID of the process that locked the file
+question: Why do we need the PID here ?
+Do we open a lock file and do a fork() ?
+And if yes, the child gets a new PID, what happens when the
+child gets a signal ?
+Who "owns" the lockfile, the parent, the child, both ?
+The child has access to all data, the fd is open and can be used,
+why do we not allow a rollback, when the child dies ?
+
+
  *
  * - Locked, lockfile closed (after close_lock_file()).  Same as the
  *   previous state, except that the lockfile is closed and fd is -1.
@@ -57,7 +70,7 @@
  *   rollback_lock_file(), or a failed attempt to lock).  In this
  *   state:
  *   - active is unset
- *   - filename is the empty string (usually, though there are
+ *   - filename is an empty string buffer (usually, though there are
  * transitory states in which this condition doesn't hold)
  *   - fd is -1
  *   - the object is left registered in the lock_file_list, and
@@ -68,7 +81,7 @@
 
 static struct lock_file *volatile lock_file_list;
 
-static void remove_lock_file(void)
+static void remove_lock_files(void)
 {
 pid_t me = getpid();
 
@@ -79,9 +92,9 @@ static void remove_lock_file(void)
 }
 }
 
-static void remove_lock_file_on_signal(int signo)
+static void remove_lock_files_on_signal(int signo)
 {
-remove_lock_file();
+remove_lock_files();
 sigchain_pop(signo);
 raise(signo);
 }
@@ -93,7 +106,7 @@ static void remove_lock_file_on_signal(int signo)
  * "/", if any).  If path is empty or the root directory ("/"), set
  * path to the empty string.
  */
-static void trim_last_path_elm(struct strbuf *path)
+static void trim_last_path_elem(struct strbuf *path)
 {
 int i = path->len;
 
@@ -143,7 +156,7 @@ static void resolve_symlink(struct strbuf *path)
  * link is a relative path, so replace the
  * last element of p with it.
  */
-trim_last_path_elm(path);
+trim_last_path_elem(path);
 
 strbuf_addbuf(path, &link);
 }
@@ -153,13 +166,16 @@ static void resolve_symlink(struct strbuf *path)
 /* Make sure errno contains a meaningful value on error */
 static int lock_file(struct lock_file *lk, const char *path, int flags)
 {
+struct stat st;
+int mode = 0666;
 if (!lock_file_list) {
 /* One-time initialization */
-sigchain_push_common(remove_lock_file_on_signal);
-atexit(remove_lock_file);
+sigchain_push_common(remove_lock_files_on_signal);
+atexit(remove_lock_files);
 }
 
-assert(!lk->active);
+  if (lk->active)
+die("lk->active %s", path);
 
 if (!lk->on_list) {
 /* Initialize *lk and add it to lock_file_list: */
@@ -167,16 +183,25 @@ static int lock_file(struct lock_file *lk, const char 
*path, int flags)
 lk->active = 0;
 lk->owner = 0;
 lk->on_list = 1;
-strbuf_init(&lk->filename, 0);
+strbuf_init(&lk->filename, strlen(path) + LOCK_SUFFIX_LEN);
 lk->next = lock_file_list;
 lock_file_list = lk;
 }
 
+strbuf_reset(&lk->filename); /* Better to be safe */
 strbuf_addstr(&lk->filename, path);
 if (!(flags & LOCK_NODEREF))
 resolve_symlink(&lk->filename);
 strbuf_addstr(&lk->filename, LOCK_SUFFIX);
-lk->fd = open(lk->filename.buf, O_RDWR | O_CREAT | O_EXCL, 0666);
+/*
+ * adjust_shared_perm() will widen permissions if needed,
+ * otherwise keep permissions restrictive
+ *
+ */
+if (!stat(path, &st))
+mode = st.st_mode & 0;
+
+lk->fd 

[PATCH v4 00/32] Lockfile correctness and refactoring

2014-09-06 Thread Michael Haggerty
Sorry for the long delay since v3. This version mostly cleans up a
couple more places where the lockfile object was left in an
ill-defined state. Thanks to Johannes Sixt and Torsten Bögershausen
for their review of v3.

I believe that this series addresses all of the comments from v1 [1],
v2 [2], and v3 [3].

This series applies to the current "master". There is a trivial
conflict between these changes and "next", and a few not-too-serious
conflicts between these changes and Ronnie's reference-related series
in "pu". I've figured out how to resolve the conflicts locally. Is
there some form in which I can put the conflict resolution that would
help you?

Changes since v3:

* Rebase to the current master, including adjusting the patch series
  for 93dcaea2 (addition of reopen_lock_file()).

* Perform the internal consistency check right away in
  commit_lock_file(), rather than after possibly having closed the
  file.

* Improve the explanation of the rationale for marking lock_file
  fields volatile.

* Fix comments that still referred to lock_file::filename[0] even
  though it is now a strbuf.

* Change rollback_lock_file() to exit early if the lock is not active
  (rather than nesting the rest of the function in an "if" statement).

* Fix Johannes's email address in the trailers.

* Extract a function commit_lock_file_to(lk, filename) and delegate to
  it from commit_lock_file() and commit_locked_index() so that the
  latter gets the benefit of the improvements in this patch series.

[1] http://thread.gmane.org/gmane.comp.version-control.git/245609
[2] http://thread.gmane.org/gmane.comp.version-control.git/245801
[3] http://thread.gmane.org/gmane.comp.version-control.git/246222

Michael Haggerty (32):
  unable_to_lock_die(): rename function from unable_to_lock_index_die()
  api-lockfile: expand the documentation
  rollback_lock_file(): do not clear filename redundantly
  rollback_lock_file(): exit early if lock is not active
  rollback_lock_file(): set fd to -1
  lockfile: unlock file if lockfile permissions cannot be adjusted
  hold_lock_file_for_append(): release lock on errors
  lock_file(): always add lock_file object to lock_file_list
  lockfile.c: document the various states of lock_file objects
  cache.h: define constants LOCK_SUFFIX and LOCK_SUFFIX_LEN
  delete_ref_loose(): don't muck around in the lock_file's filename
  prepare_index(): declare return value to be (const char *)
  write_packed_entry_fn(): convert cb_data into a (const int *)
  lock_file(): exit early if lockfile cannot be opened
  remove_lock_file(): call rollback_lock_file()
  commit_lock_file(): inline temporary variable
  commit_lock_file(): die() if called for unlocked lockfile object
  commit_lock_file(): if close fails, roll back
  commit_lock_file(): rollback lock file on failure to rename
  api-lockfile: document edge cases
  dump_marks(): remove a redundant call to rollback_lock_file()
  git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
  lockfile: avoid transitory invalid states
  struct lock_file: declare some fields volatile
  try_merge_strategy(): remove redundant lock_file allocation
  try_merge_strategy(): use a statically-allocated lock_file object
  commit_lock_file(): use a strbuf to manage temporary space
  Change lock_file::filename into a strbuf
  resolve_symlink(): use a strbuf for internal scratch space
  resolve_symlink(): take a strbuf parameter
  trim_last_path_elm(): replace last_path_elm()
  Extract a function commit_lock_file_to()

 Documentation/technical/api-lockfile.txt |  67 +--
 builtin/commit.c                          |  16 +-
 builtin/merge.c                           |  15 +-
 builtin/reflog.c                          |   2 +-
 builtin/update-index.c                    |   2 +-
 cache.h                                   |  16 +-
 config.c                                  |  28 +--
 fast-import.c                             |   4 +-
 lockfile.c                                | 299 ++-
 read-cache.c                              |  12 +-
 refs.c                                    |  29 +--
 shallow.c                                 |   6 +-
 12 files changed, 296 insertions(+), 200 deletions(-)

-- 
2.1.0
