Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-09 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 This isn't documented very well. I thought I saw a comment somewhere in
 the code that stated it explicitly, but I can't find it now. In any
 case, my understanding of the locking protocol for reflogs is:

 The reflog for $refname, which is stored at
 $GIT_DIR/logs/$refname, is locked by holding
 $GIT_DIR/refs/$refname.lock, *even if the corresponding
 reference is packed*.

 This implies that readers, who don't pay attention to locks, have to be
 prepared for the possibility that the reflog is in the middle of an
 update and that the last line is incomplete. This is handled by
 show_one_reflog_ent(), which discards incomplete lines.

Interesting, as I think I saw Peff did something around that area
recently.

I have some more thought around the transaction in general, but
it will be in a separate message.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-09 Thread Jeff King
On Tue, Dec 09, 2014 at 10:47:24AM -0800, Junio C Hamano wrote:

 Michael Haggerty mhag...@alum.mit.edu writes:
 
  This isn't documented very well. I thought I saw a comment somewhere in
  the code that stated it explicitly, but I can't find it now. In any
  case, my understanding of the locking protocol for reflogs is:
 
  The reflog for $refname, which is stored at
  $GIT_DIR/logs/$refname, is locked by holding
  $GIT_DIR/refs/$refname.lock, *even if the corresponding
  reference is packed*.
 
  This implies that readers, who don't pay attention to locks, have to be
  prepared for the possibility that the reflog is in the middle of an
  update and that the last line is incomplete. This is handled by
  show_one_reflog_ent(), which discards incomplete lines.
 
 Interesting, as I think I saw Peff did something around that area
 recently.

Yeah, and I had no idea about the truncated-line race which Michael
described at the time. It makes me glad that I took the time to do the
more careful thing[1] (fixing the reverse-reflog parser to properly
include the newline when present) and not the quick-and-easy thing[2]
(teaching show_one_reflog_ent to accept entries without newlines).

What you have queued in jk/for-each-reflog-ent-reverse should be fine,
even in light of what Michael said.

-Peff

[1] http://article.gmane.org/gmane.comp.version-control.git/260849

[2] http://article.gmane.org/gmane.comp.version-control.git/260807
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-08 Thread Michael Haggerty
On 12/05/2014 03:19 AM, Stefan Beller wrote:
 On Thu, Dec 04, 2014 at 04:23:31PM -0800, Jonathan Nieder wrote:
 Michael Haggerty wrote:

 We don't actually need the locking functionality, because we already
 hold the lock on the reference itself, which is how the reflog file is
 locked. But the lock_file code still does some of the bookkeeping for
 us and is more careful than the old code here was.

 As you say, the ref lock takes care of mutual exclusion, so we do not
 have to be too careful about compatibility with other tools that might
 not know to lock the reflog.  And this is not tying our hands for a
 future when I might want to lock logs/refs/heads/topic/1 while
 logs/refs/heads/topic still exists as part of the implementation of
 git mv topic/1 topic.

 Stefan and I had forgotten about that guarantee when looking at that
 kind of operation --- thanks for the reminder.
 
 I did not forget about it, I did not know about that in the first hand.
 We don't seem to have documentation on it?
 
 So sorry for heading in a direction, which would have been avoidable.

This isn't documented very well. I thought I saw a comment somewhere in
the code that stated it explicitly, but I can't find it now. In any
case, my understanding of the locking protocol for reflogs is:

The reflog for $refname, which is stored at
$GIT_DIR/logs/$refname, is locked by holding
$GIT_DIR/refs/$refname.lock, *even if the corresponding
reference is packed*.

This implies that readers, who don't pay attention to locks, have to be
prepared for the possibility that the reflog is in the middle of an
update and that the last line is incomplete. This is handled by
show_one_reflog_ent(), which discards incomplete lines.

This protocol avoids the need to rewrite the reflog from scratch for
each reference update.

Given how poorly-documented this point is, I wonder whether other
implementations of Git (e.g., libgit2, JGit, Dulwich, ...) got it right.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-08 Thread Michael Haggerty
On 12/05/2014 03:59 AM, ronnie sahlberg wrote:
 On Thu, Dec 4, 2014 at 3:08 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 We don't actually need the locking functionality, because we already
 hold the lock on the reference itself,
 
 No. You do need the lock.
 The ref is locked only during transaction_commit()
 
 If you don't want to lock the reflog file and instead rely on the lock
 on the ref itself you will need to
 rework your patches so that the lock on the ref is taken already
 during, for example, transaction_update_ref() instead.
 
 But without doing those changes and moving the ref locking from
 _commit() to _update_ref() you will risk reflog corruption/surprises
 if two operations collide and both rewrite the reflog without any lock held.

Ronnie, I don't understand your comments.

It is a statement of fact (to the best of my knowledge) that reflogs are
supposed to be modified only under a lock on the corresponding
reference, namely $GIT_DIR/refs/$refname.lock. We do not require
reflog writers to hold $GIT_DIR/logs/$refname.lock.

In this function, $GIT_DIR/logs/$refname.lock happens to be the name
of the temporary file being used to stage the new contents of the
reflog. But that is more or less a coincidence; we could call the
temporary file whatever we want because it has no locking implications.
However, what we want to do with the file in this code path (write a new
version then rename the new version on top of the old version, deleting
the temporary file if the program is interrupted) is the same as what we
do with lockfiles, so we use the lockfile code because it is convenient.

This patch series has nothing to do with ref_transaction_commit() or any
of the transaction machinery. It has to do with expire_reflog(), which
is invoked outside of any transaction.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-08 Thread Michael Haggerty
On 12/05/2014 01:23 AM, Jonathan Nieder wrote:
 Michael Haggerty wrote:
 
 We don't actually need the locking functionality, because we already
 hold the lock on the reference itself, which is how the reflog file is
 locked. But the lock_file code still does some of the bookkeeping for
 us and is more careful than the old code here was.
 
 As you say, the ref lock takes care of mutual exclusion, so we do not
 have to be too careful about compatibility with other tools that might
 not know to lock the reflog.  And this is not tying our hands for a
 future when I might want to lock logs/refs/heads/topic/1 while
 logs/refs/heads/topic still exists as part of the implementation of
 git mv topic/1 topic.
 
 Stefan and I had forgotten about that guarantee when looking at that
 kind of operation --- thanks for the reminder.

This reminder is important (and forgettable) enough that I will add a
comment within the function explaining it.

 Should updates to the HEAD reflog acquire HEAD.lock?  (They don't
 currently.)

Yes, they should; good catch. I assume that you are referring to the
code at the bottom of write_ref_sha1()? Or did you find a problem in
this patch series?

If the former, then I propose that we address this bug in a separate
patch series.

 [...]
 --- a/builtin/reflog.c
 +++ b/builtin/reflog.c
 @@ -349,12 +349,14 @@ static int push_tip_to_list(const char *refname, const 
 unsigned char *sha1, int
  return 0;
  }
  
 +static struct lock_file reflog_lock;
 
 If this lockfile is only used in that one function, it can be declared
 inside the function.
 
 If it is meant to be used throughout the 'git reflog' command, then it
 can go near the top of the file.

For now it is only used within this function, so I will move it into the
function as you suggest. (As you know, it does need to remain static,
because of the way the lock_file module takes over ownership of these
objects.)

 +
  static int expire_reflog(const char *refname, const unsigned char *sha1, 
 void *cb_data)
  {
  struct cmd_reflog_expire_cb *cmd = cb_data;
  struct expire_reflog_cb cb;
  struct ref_lock *lock;
 -char *log_file, *newlog_path = NULL;
 +char *log_file;
  struct commit *tip_commit;
  struct commit_list *tips;
  int status = 0;
 @@ -372,10 +374,14 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
  unlock_ref(lock);
  return 0;
  }
 +
  log_file = git_pathdup(logs/%s, refname);
  if (!cmd-dry_run) {
 -newlog_path = git_pathdup(logs/%s.lock, refname);
 -cb.newlog = fopen(newlog_path, w);
 +if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0)
 +goto failure;
 
 hold_lock_file_for_update doesn't print a message.  Code to print one
 looks like
 
   if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0) {
   unable_to_lock_message(log_file, errno, err);
   error(%s, err.buf);
   goto failure;
   }

Thanks; will add.

 (A patch in flight changes that to
 
   if (hold_lock_file_for_update(reflog_lock, log_file, 0, err)  0) {
   error(%s, err.buf);
   goto failure;
   }
 
 )

Thanks for the heads-up. The compiler will complain when the branches
are merged, and hopefully the fix will be obvious.

 +cb.newlog = fdopen_lock_file(reflog_lock, w);
 +if (!cb.newlog)
 +goto failure;
 
 Hm.  lockfile.c::fdopen_lock_file ought to use xfdopen to make this
 case impossible.  And xfdopen should use try_to_free_routine() and
 try again on failure.

That sounds reasonable, but it is not manifestly obvious given that at
least one caller of fdopen_lock_file() (in fast-import.c) tries to
recover if fdopen_lock_file() fails. Let's address this in a separate
patch series if that is OK with you. For now I will add explicit
error-reporting code here before goto failure.

 [...]
 @@ -423,10 +429,9 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
  }
  
  if (cb.newlog) {
 -if (fclose(cb.newlog)) {
 -status |= error(%s: %s, strerror(errno),
 -newlog_path);
 -unlink(newlog_path);
 +if (close_lock_file(reflog_lock)) {
 +status |= error(Couldn't write %s: %s, log_file,
 +strerror(errno));
 
 Style nit: error messages usually start with a lowercase letter
 (though I realize nearby examples are already inconsistent).

Thanks; will fix.

 commit_lock_file() can take care of the close_lock_file automatically.

The existing code is a tiny bit safer: first make sure both files can be
written, *then* rename each of them into place. If either write fails,
then both files will get rolled back. But if we switch to using
commit_lock_file(), then a failure when writing the reference 

Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-05 Thread Stefan Beller
On Thu, Dec 04, 2014 at 06:51:36PM -0800, ronnie sahlberg wrote:
 On Thu, Dec 4, 2014 at 3:08 PM, Michael Haggerty mhag...@alum.mit.edu
 wrote:
 
  We don't actually need the locking functionality, because we already
  hold the lock on the reference itself,
 
 
 No. You do need the lock.
 The ref is locked only during transaction_commit()
 

Michael is saying, that you never want to modify the reflog, if you're not 
holding
the lock on the corresponding ref. Or in other words, you may modify the 
reflog, if
the corresponding ref is held. This used to be this way back then. 

 If you don't want to lock the reflog file and instead rely on the lock on
 the ref itself you will need to
 rework your patches so that the lock on the ref is taken already during,
 for example, transaction_update_ref() instead.
 
 But without doing those changes and moving the ref locking from _commit()
 to _update_ref() you will risk reflog corruption/surprises
 if two operations collide and both rewrite the reflog without any lock held.
 
 
 
 
 
  which is how the reflog file is
  locked. But the lock_file code still does some of the bookkeeping for
  us and is more careful than the old code here was.
 
  For example:
 
  * Correctly handle the case that the reflog lock file already exists
for some reason or cannot be opened.
 
  * Correctly clean up the lockfile if the program dies.
 
  Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
  ---
   builtin/reflog.c | 37 ++---
   1 file changed, 22 insertions(+), 15 deletions(-)
 
  diff --git a/builtin/reflog.c b/builtin/reflog.c
  index a282e60..d344d45 100644
  --- a/builtin/reflog.c
  +++ b/builtin/reflog.c
  @@ -349,12 +349,14 @@ static int push_tip_to_list(const char *refname,
  const unsigned char *sha1, int
  return 0;
   }
 
  +static struct lock_file reflog_lock;
  +
   static int expire_reflog(const char *refname, const unsigned char *sha1,
  void *cb_data)
   {
  struct cmd_reflog_expire_cb *cmd = cb_data;
  struct expire_reflog_cb cb;
  struct ref_lock *lock;
  -   char *log_file, *newlog_path = NULL;
  +   char *log_file;
  struct commit *tip_commit;
  struct commit_list *tips;
  int status = 0;
  @@ -372,10 +374,14 @@ static int expire_reflog(const char *refname, const
  unsigned char *sha1, void *c
  unlock_ref(lock);
  return 0;
  }
  +
  log_file = git_pathdup(logs/%s, refname);
  if (!cmd-dry_run) {
  -   newlog_path = git_pathdup(logs/%s.lock, refname);
  -   cb.newlog = fopen(newlog_path, w);
  +   if (hold_lock_file_for_update(reflog_lock, log_file, 0) 
  0)
  +   goto failure;
  +   cb.newlog = fdopen_lock_file(reflog_lock, w);
  +   if (!cb.newlog)
  +   goto failure;
  }
 
  cb.cmd = cmd;
  @@ -423,10 +429,9 @@ static int expire_reflog(const char *refname, const
  unsigned char *sha1, void *c
  }
 
  if (cb.newlog) {
  -   if (fclose(cb.newlog)) {
  -   status |= error(%s: %s, strerror(errno),
  -   newlog_path);
  -   unlink(newlog_path);
  +   if (close_lock_file(reflog_lock)) {
  +   status |= error(Couldn't write %s: %s, log_file,
  +   strerror(errno));
  } else if (cmd-updateref 
  (write_in_full(lock-lock_fd,
  sha1_to_hex(cb.last_kept_sha1), 40) != 40
  ||
  @@ -434,21 +439,23 @@ static int expire_reflog(const char *refname, const
  unsigned char *sha1, void *c
   close_ref(lock)  0)) {
  status |= error(Couldn't write %s,
  lock-lk-filename.buf);
  -   unlink(newlog_path);
  -   } else if (rename(newlog_path, log_file)) {
  -   status |= error(cannot rename %s to %s,
  -   newlog_path, log_file);
  -   unlink(newlog_path);
  +   rollback_lock_file(reflog_lock);
  +   } else if (commit_lock_file(reflog_lock)) {
  +   status |= error(cannot rename %s.lock to %s,
  +   log_file, log_file);
  } else if (cmd-updateref  commit_ref(lock)) {
  status |= error(Couldn't set %s, lock-ref_name);
  -   } else {
  -   adjust_shared_perm(log_file);
  }
  }
  -   free(newlog_path);
  free(log_file);
  unlock_ref(lock);
  return status;
  +
  + failure:
  +   rollback_lock_file(reflog_lock);
  +   free(log_file);
  +   unlock_ref(lock);
  +   

Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-05 Thread Stefan Beller
On Thu, Dec 04, 2014 at 04:23:31PM -0800, Jonathan Nieder wrote:
 Michael Haggerty wrote:
 
  We don't actually need the locking functionality, because we already
  hold the lock on the reference itself, which is how the reflog file is
  locked. But the lock_file code still does some of the bookkeeping for
  us and is more careful than the old code here was.
 
 As you say, the ref lock takes care of mutual exclusion, so we do not
 have to be too careful about compatibility with other tools that might
 not know to lock the reflog.  And this is not tying our hands for a
 future when I might want to lock logs/refs/heads/topic/1 while
 logs/refs/heads/topic still exists as part of the implementation of
 git mv topic/1 topic.
 
 Stefan and I had forgotten about that guarantee when looking at that
 kind of operation --- thanks for the reminder.
 
 Should updates to the HEAD reflog acquire HEAD.lock?  (They don't
 currently.)
 
 [...]
  --- a/builtin/reflog.c
  +++ b/builtin/reflog.c
  @@ -349,12 +349,14 @@ static int push_tip_to_list(const char *refname, 
  const unsigned char *sha1, int
  return 0;
   }
   
  +static struct lock_file reflog_lock;
 
 If this lockfile is only used in that one function, it can be declared
 inside the function.
 
 If it is meant to be used throughout the 'git reflog' command, then it
 can go near the top of the file.
 

After the series completes, this lock is only used in reflog_expire.
So I'd rather move it inside the function? Then we could run the reflog_expire
function in parallel for different locks in theory?

  +
   static int expire_reflog(const char *refname, const unsigned char *sha1, 
  void *cb_data)
   {
  struct cmd_reflog_expire_cb *cmd = cb_data;
  struct expire_reflog_cb cb;
  struct ref_lock *lock;
  -   char *log_file, *newlog_path = NULL;
  +   char *log_file;
  struct commit *tip_commit;
  struct commit_list *tips;
  int status = 0;
  @@ -372,10 +374,14 @@ static int expire_reflog(const char *refname, const 
  unsigned char *sha1, void *c
  unlock_ref(lock);
  return 0;
  }
  +
  log_file = git_pathdup(logs/%s, refname);
  if (!cmd-dry_run) {
  -   newlog_path = git_pathdup(logs/%s.lock, refname);
  -   cb.newlog = fopen(newlog_path, w);
  +   if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0)
  +   goto failure;
 
 hold_lock_file_for_update doesn't print a message.  Code to print one
 looks like
 
   if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0) {
   unable_to_lock_message(log_file, errno, err);
   error(%s, err.buf);
   goto failure;
   }
 
 (A patch in flight changes that to
 
   if (hold_lock_file_for_update(reflog_lock, log_file, 0, err)  0) {
   error(%s, err.buf);
   goto failure;
   }
 
 )
 
  +   cb.newlog = fdopen_lock_file(reflog_lock, w);
  +   if (!cb.newlog)
  +   goto failure;
 
 Hm.  lockfile.c::fdopen_lock_file ought to use xfdopen to make this
 case impossible.  And xfdopen should use try_to_free_routine() and
 try again on failure.
 
 [...]
  @@ -423,10 +429,9 @@ static int expire_reflog(const char *refname, const 
  unsigned char *sha1, void *c
  }
   
  if (cb.newlog) {
  -   if (fclose(cb.newlog)) {
  -   status |= error(%s: %s, strerror(errno),
  -   newlog_path);
  -   unlink(newlog_path);
  +   if (close_lock_file(reflog_lock)) {
  +   status |= error(Couldn't write %s: %s, log_file,
  +   strerror(errno));
 
 Style nit: error messages usually start with a lowercase letter
 (though I realize nearby examples are already inconsistent).
 
 commit_lock_file() can take care of the close_lock_file automatically.
 
 [...]
  @@ -434,21 +439,23 @@ static int expire_reflog(const char *refname, const 
  unsigned char *sha1, void *c
   close_ref(lock)  0)) {
  status |= error(Couldn't write %s,
  lock-lk-filename.buf);
  -   unlink(newlog_path);
  -   } else if (rename(newlog_path, log_file)) {
  -   status |= error(cannot rename %s to %s,
  -   newlog_path, log_file);
  -   unlink(newlog_path);
  +   rollback_lock_file(reflog_lock);
  +   } else if (commit_lock_file(reflog_lock)) {
  +   status |= error(cannot rename %s.lock to %s,
  +   log_file, log_file);
 
 Most callers say unable to commit reflog '%s', log_file to hedge their
 bets in case the close failed (which may be what you were avoiding
 above.
 
 errno is meaningful when commit_lock_file fails, making a more
 detailed diagnosis from strerror(errno) possible.
 
 Thanks,
 Jonathan
--
To 

Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-05 Thread Junio C Hamano
Stefan Beller sbel...@google.com writes:

  +static struct lock_file reflog_lock;
 
 If this lockfile is only used in that one function, it can be declared
 inside the function.
 
 If it is meant to be used throughout the 'git reflog' command, then it
 can go near the top of the file.
 

 After the series completes, this lock is only used in reflog_expire.
 So I'd rather move it inside the function? Then we could run the reflog_expire
 function in parallel for different locks in theory?

I am not sure about the parallel part, but I would imagine that it
is an essential prerequisite to move this outside the client code
if we want to later replace the backing storage of refs and reflogs
outside the filesystem, so from that point of view,  I think the
suggestion makes sense.

Thanks.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-05 Thread Stefan Beller
On Fri, Dec 5, 2014 at 11:32 AM, Junio C Hamano gits...@pobox.com wrote:
 Stefan Beller sbel...@google.com writes:

  +static struct lock_file reflog_lock;

 If this lockfile is only used in that one function, it can be declared
 inside the function.

 If it is meant to be used throughout the 'git reflog' command, then it
 can go near the top of the file.


 After the series completes, this lock is only used in reflog_expire.
 So I'd rather move it inside the function? Then we could run the 
 reflog_expire
 function in parallel for different locks in theory?

 I am not sure about the parallel part, but I would imagine that it
 is an essential prerequisite to move this outside the client code
 if we want to later replace the backing storage of refs and reflogs
 outside the filesystem, so from that point of view,  I think the
 suggestion makes sense.

 Thanks.


Sorry for the confusion. With parallel I mean, that in theory we could have
multiple threads running reflog expire at the same time in the same program.
Having the static struct lock_file reflog_lock; around, we can only
process one
reflog at a time, as that is holding the lock_file struct.

I am not saying we want to go multi-threading any time soon if at all.
Just that it would
be easier to do, if not having the lock file as a file-global variable
instead of a
variable inside a function.

Thanks,
Stefan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-05 Thread Junio C Hamano
Stefan Beller sbel...@google.com writes:

 After the series completes, this lock is only used in reflog_expire.
 So I'd rather move it inside the function? Then we could run the 
 reflog_expire
 function in parallel for different locks in theory?

 I am not sure about the parallel part, but I would imagine that it
 is an essential prerequisite to move this outside the client code
 if we want to later replace the backing storage of refs and reflogs
 outside the filesystem, so from that point of view,  I think the
 suggestion makes sense.

 Thanks.


 Sorry for the confusion. With parallel I mean,...

There is no confusion.  I understand exactly what you meant.

What I said was not sure was if parallel is a practical enough
possiblity to include into the set of value propositions the
suggested change to move the lock out of the client may give us.

In other words, With this change, doing a parallel will become a
lot easier---Really?  It probably is not one of the harder part of
the problem if you really want to go parallel was the discourse I
had in my mind.

;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-04 Thread Michael Haggerty
We don't actually need the locking functionality, because we already
hold the lock on the reference itself, which is how the reflog file is
locked. But the lock_file code still does some of the bookkeeping for
us and is more careful than the old code here was.

For example:

* Correctly handle the case that the reflog lock file already exists
  for some reason or cannot be opened.

* Correctly clean up the lockfile if the program dies.

Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
---
 builtin/reflog.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/builtin/reflog.c b/builtin/reflog.c
index a282e60..d344d45 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -349,12 +349,14 @@ static int push_tip_to_list(const char *refname, const 
unsigned char *sha1, int
return 0;
 }
 
+static struct lock_file reflog_lock;
+
 static int expire_reflog(const char *refname, const unsigned char *sha1, void 
*cb_data)
 {
struct cmd_reflog_expire_cb *cmd = cb_data;
struct expire_reflog_cb cb;
struct ref_lock *lock;
-   char *log_file, *newlog_path = NULL;
+   char *log_file;
struct commit *tip_commit;
struct commit_list *tips;
int status = 0;
@@ -372,10 +374,14 @@ static int expire_reflog(const char *refname, const 
unsigned char *sha1, void *c
unlock_ref(lock);
return 0;
}
+
log_file = git_pathdup(logs/%s, refname);
if (!cmd-dry_run) {
-   newlog_path = git_pathdup(logs/%s.lock, refname);
-   cb.newlog = fopen(newlog_path, w);
+   if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0)
+   goto failure;
+   cb.newlog = fdopen_lock_file(reflog_lock, w);
+   if (!cb.newlog)
+   goto failure;
}
 
cb.cmd = cmd;
@@ -423,10 +429,9 @@ static int expire_reflog(const char *refname, const 
unsigned char *sha1, void *c
}
 
if (cb.newlog) {
-   if (fclose(cb.newlog)) {
-   status |= error(%s: %s, strerror(errno),
-   newlog_path);
-   unlink(newlog_path);
+   if (close_lock_file(reflog_lock)) {
+   status |= error(Couldn't write %s: %s, log_file,
+   strerror(errno));
} else if (cmd-updateref 
(write_in_full(lock-lock_fd,
sha1_to_hex(cb.last_kept_sha1), 40) != 40 ||
@@ -434,21 +439,23 @@ static int expire_reflog(const char *refname, const 
unsigned char *sha1, void *c
 close_ref(lock)  0)) {
status |= error(Couldn't write %s,
lock-lk-filename.buf);
-   unlink(newlog_path);
-   } else if (rename(newlog_path, log_file)) {
-   status |= error(cannot rename %s to %s,
-   newlog_path, log_file);
-   unlink(newlog_path);
+   rollback_lock_file(reflog_lock);
+   } else if (commit_lock_file(reflog_lock)) {
+   status |= error(cannot rename %s.lock to %s,
+   log_file, log_file);
} else if (cmd-updateref  commit_ref(lock)) {
status |= error(Couldn't set %s, lock-ref_name);
-   } else {
-   adjust_shared_perm(log_file);
}
}
-   free(newlog_path);
free(log_file);
unlock_ref(lock);
return status;
+
+ failure:
+   rollback_lock_file(reflog_lock);
+   free(log_file);
+   unlock_ref(lock);
+   return -1;
 }
 
 static int collect_reflog(const char *ref, const unsigned char *sha1, int 
unused, void *cb_data)
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-04 Thread Jonathan Nieder
Michael Haggerty wrote:

 We don't actually need the locking functionality, because we already
 hold the lock on the reference itself, which is how the reflog file is
 locked. But the lock_file code still does some of the bookkeeping for
 us and is more careful than the old code here was.

As you say, the ref lock takes care of mutual exclusion, so we do not
have to be too careful about compatibility with other tools that might
not know to lock the reflog.  And this is not tying our hands for a
future when I might want to lock logs/refs/heads/topic/1 while
logs/refs/heads/topic still exists as part of the implementation of
git mv topic/1 topic.

Stefan and I had forgotten about that guarantee when looking at that
kind of operation --- thanks for the reminder.

Should updates to the HEAD reflog acquire HEAD.lock?  (They don't
currently.)

[...]
 --- a/builtin/reflog.c
 +++ b/builtin/reflog.c
 @@ -349,12 +349,14 @@ static int push_tip_to_list(const char *refname, const 
 unsigned char *sha1, int
   return 0;
  }
  
 +static struct lock_file reflog_lock;

If this lockfile is only used in that one function, it can be declared
inside the function.

If it is meant to be used throughout the 'git reflog' command, then it
can go near the top of the file.

 +
  static int expire_reflog(const char *refname, const unsigned char *sha1, 
 void *cb_data)
  {
   struct cmd_reflog_expire_cb *cmd = cb_data;
   struct expire_reflog_cb cb;
   struct ref_lock *lock;
 - char *log_file, *newlog_path = NULL;
 + char *log_file;
   struct commit *tip_commit;
   struct commit_list *tips;
   int status = 0;
 @@ -372,10 +374,14 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
   unlock_ref(lock);
   return 0;
   }
 +
   log_file = git_pathdup(logs/%s, refname);
   if (!cmd-dry_run) {
 - newlog_path = git_pathdup(logs/%s.lock, refname);
 - cb.newlog = fopen(newlog_path, w);
 + if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0)
 + goto failure;

hold_lock_file_for_update doesn't print a message.  Code to print one
looks like

if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0) {
unable_to_lock_message(log_file, errno, err);
error(%s, err.buf);
goto failure;
}

(A patch in flight changes that to

if (hold_lock_file_for_update(reflog_lock, log_file, 0, err)  0) {
error(%s, err.buf);
goto failure;
}

)

 + cb.newlog = fdopen_lock_file(reflog_lock, w);
 + if (!cb.newlog)
 + goto failure;

Hm.  lockfile.c::fdopen_lock_file ought to use xfdopen to make this
case impossible.  And xfdopen should use try_to_free_routine() and
try again on failure.

[...]
 @@ -423,10 +429,9 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
   }
  
   if (cb.newlog) {
 - if (fclose(cb.newlog)) {
 - status |= error(%s: %s, strerror(errno),
 - newlog_path);
 - unlink(newlog_path);
 + if (close_lock_file(reflog_lock)) {
 + status |= error(Couldn't write %s: %s, log_file,
 + strerror(errno));

Style nit: error messages usually start with a lowercase letter
(though I realize nearby examples are already inconsistent).

commit_lock_file() can take care of the close_lock_file automatically.

[...]
 @@ -434,21 +439,23 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
close_ref(lock)  0)) {
   status |= error(Couldn't write %s,
   lock-lk-filename.buf);
 - unlink(newlog_path);
 - } else if (rename(newlog_path, log_file)) {
 - status |= error(cannot rename %s to %s,
 - newlog_path, log_file);
 - unlink(newlog_path);
 + rollback_lock_file(reflog_lock);
 + } else if (commit_lock_file(reflog_lock)) {
 + status |= error(cannot rename %s.lock to %s,
 + log_file, log_file);

Most callers say unable to commit reflog '%s', log_file to hedge their
bets in case the close failed (which may be what you were avoiding
above.

errno is meaningful when commit_lock_file fails, making a more
detailed diagnosis from strerror(errno) possible.

Thanks,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-04 Thread Stefan Beller
On Thu, Dec 04, 2014 at 04:23:31PM -0800, Jonathan Nieder wrote:
 Michael Haggerty wrote:
 
  We don't actually need the locking functionality, because we already
  hold the lock on the reference itself, which is how the reflog file is
  locked. But the lock_file code still does some of the bookkeeping for
  us and is more careful than the old code here was.
 
 As you say, the ref lock takes care of mutual exclusion, so we do not
 have to be too careful about compatibility with other tools that might
 not know to lock the reflog.  And this is not tying our hands for a
 future when I might want to lock logs/refs/heads/topic/1 while
 logs/refs/heads/topic still exists as part of the implementation of
 git mv topic/1 topic.
 
 Stefan and I had forgotten about that guarantee when looking at that
 kind of operation --- thanks for the reminder.

I did not forget about it, I did not know about that in the first hand.
We don't seem to have documentation on it?

So sorry for heading in a direction, which would have been avoidable.

Thanks,
Stefan

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/23] expire_reflog(): use a lock_file for rewriting the reflog file

2014-12-04 Thread ronnie sahlberg
On Thu, Dec 4, 2014 at 3:08 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 We don't actually need the locking functionality, because we already
 hold the lock on the reference itself,

No. You do need the lock.
The ref is locked only during transaction_commit()

If you don't want to lock the reflog file and instead rely on the lock
on the ref itself you will need to
rework your patches so that the lock on the ref is taken already
during, for example, transaction_update_ref() instead.

But without doing those changes and moving the ref locking from
_commit() to _update_ref() you will risk reflog corruption/surprises
if two operations collide and both rewrite the reflog without any lock held.


 which is how the reflog file is
 locked. But the lock_file code still does some of the bookkeeping for
 us and is more careful than the old code here was.

 For example:

 * Correctly handle the case that the reflog lock file already exists
   for some reason or cannot be opened.

 * Correctly clean up the lockfile if the program dies.

 Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
 ---
  builtin/reflog.c | 37 ++---
  1 file changed, 22 insertions(+), 15 deletions(-)

 diff --git a/builtin/reflog.c b/builtin/reflog.c
 index a282e60..d344d45 100644
 --- a/builtin/reflog.c
 +++ b/builtin/reflog.c
 @@ -349,12 +349,14 @@ static int push_tip_to_list(const char *refname, const 
 unsigned char *sha1, int
 return 0;
  }

 +static struct lock_file reflog_lock;
 +
  static int expire_reflog(const char *refname, const unsigned char *sha1, 
 void *cb_data)
  {
 struct cmd_reflog_expire_cb *cmd = cb_data;
 struct expire_reflog_cb cb;
 struct ref_lock *lock;
 -   char *log_file, *newlog_path = NULL;
 +   char *log_file;
 struct commit *tip_commit;
 struct commit_list *tips;
 int status = 0;
 @@ -372,10 +374,14 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
 unlock_ref(lock);
 return 0;
 }
 +
 log_file = git_pathdup(logs/%s, refname);
 if (!cmd-dry_run) {
 -   newlog_path = git_pathdup(logs/%s.lock, refname);
 -   cb.newlog = fopen(newlog_path, w);
 +   if (hold_lock_file_for_update(reflog_lock, log_file, 0)  0)
 +   goto failure;
 +   cb.newlog = fdopen_lock_file(reflog_lock, w);
 +   if (!cb.newlog)
 +   goto failure;
 }

 cb.cmd = cmd;
 @@ -423,10 +429,9 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
 }

 if (cb.newlog) {
 -   if (fclose(cb.newlog)) {
 -   status |= error(%s: %s, strerror(errno),
 -   newlog_path);
 -   unlink(newlog_path);
 +   if (close_lock_file(reflog_lock)) {
 +   status |= error(Couldn't write %s: %s, log_file,
 +   strerror(errno));
 } else if (cmd-updateref 
 (write_in_full(lock-lock_fd,
 sha1_to_hex(cb.last_kept_sha1), 40) != 40 ||
 @@ -434,21 +439,23 @@ static int expire_reflog(const char *refname, const 
 unsigned char *sha1, void *c
  close_ref(lock)  0)) {
 status |= error(Couldn't write %s,
 lock-lk-filename.buf);
 -   unlink(newlog_path);
 -   } else if (rename(newlog_path, log_file)) {
 -   status |= error(cannot rename %s to %s,
 -   newlog_path, log_file);
 -   unlink(newlog_path);
 +   rollback_lock_file(reflog_lock);
 +   } else if (commit_lock_file(reflog_lock)) {
 +   status |= error(cannot rename %s.lock to %s,
 +   log_file, log_file);
 } else if (cmd-updateref  commit_ref(lock)) {
 status |= error(Couldn't set %s, lock-ref_name);
 -   } else {
 -   adjust_shared_perm(log_file);
 }
 }
 -   free(newlog_path);
 free(log_file);
 unlock_ref(lock);
 return status;
 +
 + failure:
 +   rollback_lock_file(reflog_lock);
 +   free(log_file);
 +   unlock_ref(lock);
 +   return -1;
  }

  static int collect_reflog(const char *ref, const unsigned char *sha1, int 
 unused, void *cb_data)
 --
 2.1.3

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html