Re: [PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-17 Thread Michael Haggerty
On 09/16/2015 06:00 PM, Junio C Hamano wrote:
> *1* But one "gc" could be foreground gc that does not write to a log
> file and the other "gc" could be background that does write to a
> log file, so this cannot be the primary way to avoid double
> execution.

^^^ This was the point that was confusing me. If this is not one of the
roles of the log file, then things are easier.

If you decide to go the route of tempfile/lockfile modifications, feel
free to CC me for feedback.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-17 Thread Junio C Hamano
Duy Nguyen  writes:

> We do keep another lock before coming to opening this log file. So
> once we get here we already know nobody else will be opening the log
> file. We can simply open it the normal way, then make sure we clean it
> up at atexit().
>
>>> This doesn't seem like a common thing to want (as in, this might be the
>>> only caller), but it probably makes sense to build it into the
>>> tempfile/lockfile API nevertheless, because implementing it externally
>>> would require a lot of other code to be duplicated.
>>>
>>> Another possibility that might work (maybe without requiring changes to
>>> tempfile/lockfile): don't worry about deleting the log file if it is
>>> empty, but make observers treat an empty log file the same as an absent one.
>>
>> Probably your "don't remove and check for emptiness" approach would
>> be the simpler of the two, but I think we can go either way.
>
> People have complained to me about stray files in $GIT_DIR, so it's
> probably best that we delete empty/useless files. Although we could
> delete empty files at the beginning of the next gc instead of at
> atexit(). Let me try it out and see which is simplest.

It would be nicer if we did not have to do an extra and custom
atexit() handler.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-17 Thread Duy Nguyen
On Wed, Sep 16, 2015 at 11:00 PM, Junio C Hamano  wrote:
> Michael Haggerty  writes:
>
>> I'm not sure what behavior you want. At one point you say "puts the file
>> to a final place if it is non-empty" but later you say "leave it if
>> non-empty". Should the file be written directly, or should it be written
>> to a lockfile and renamed into place only when complete?
>
> I do not think we care that deeply either way, as we do not want to
> run multiple auto-gc's at the same time in the first place.  So either
> one of the following is perfectly fine:
>
>  * We open the final destination directly, but with O_CREAT|O_EXCL,
>which as a side effect also detects the simultanous execution
>[*1*].  We do not do any "finalizing rename" if we go this route.
>When we are done, close it and remove it if we did not write
>anything, or leave it there if we did write something.
>
>  * We open a lockfile the usual way.  When we are done, close it and
>and remove it if we did not write anything, or finalize it by
>renaming it if we did write something.
>
> I think Duy's code wants to do the latter.

We do keep another lock before coming to opening this log file. So
once we get here we already know nobody else will be opening the log
file. We can simply open it the normal way, then make sure we clean it
up at atexit().

>> This doesn't seem like a common thing to want (as in, this might be the
>> only caller), but it probably makes sense to build it into the
>> tempfile/lockfile API nevertheless, because implementing it externally
>> would require a lot of other code to be duplicated.
>>
>> Another possibility that might work (maybe without requiring changes to
>> tempfile/lockfile): don't worry about deleting the log file if it is
>> empty, but make observers treat an empty log file the same as an absent one.
>
> Probably your "don't remove and check for emptiness" approach would
> be the simpler of the two, but I think we can go either way.

People have complained to me about stray files in $GIT_DIR, so it's
probably best that we delete empty/useless files. Although we could
delete empty files at the beginning of the next gc instead of at
atexit(). Let me try it out and see which is simplest.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-16 Thread Michael Haggerty
On 09/14/2015 07:37 PM, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
>> Thanks, will queue.
> 
> Ehh, I spoke a bit too early.
> 
>>> diff --git a/builtin/gc.c b/builtin/gc.c
>>> index bcc75d9..2c3aaeb 100644
>>> --- a/builtin/gc.c
>>> +++ b/builtin/gc.c
>>> @@ -43,9 +43,20 @@ static struct argv_array prune_worktrees = 
>>> ARGV_ARRAY_INIT;
>>>  static struct argv_array rerere = ARGV_ARRAY_INIT;
>>>  
>>>  static char *pidfile;
>>> +static struct strbuf log_filename = STRBUF_INIT;
>>> +static int daemonized;
>>>  
>>>  static void remove_pidfile(void)
>>>  {
>>> +   if (daemonized && log_filename.len) {
>>> +   struct stat st;
>>> +
>>> +   close(2);
>>> +   if (stat(log_filename.buf, ) ||
>>> +   !st.st_size ||
>>> +   rename(log_filename.buf, git_path("gc.log")))
>>> +   unlink(log_filename.buf);
>>> +   }
> 
> Unfortuantely we cannot queue this as-is, as we let the tempfile API
> to automatically manage the pidfile since ebebeaea (gc: use tempfile
> module to handle gc.pid file, 2015-08-10), and you cannot piggy-back
> the log file finalization to this function that no longer exists.
> 
> Besides, it is obviously wrong to remove this log file in a function
> whose name is remove_pidfile() ;-)
> 
> Adding a new function to tempfile API that puts the file to a final
> place if it is non-empty and otherwise remove it, and using that to
> create this "gc.log" file, would be the cleanest from the point of
> view of _this_ codepath.  I however do not know if that is too
> specific for the need of this codepath or "leave it if non-empty,
> but otherwise remove as it is uninteresting" is fairly common thing
> we would want and it is a good addition to the API set.
> 
> Michael, what do you think?

I'm not sure what behavior you want. At one point you say "puts the file
to a final place if it is non-empty" but later you say "leave it if
non-empty". Should the file be written directly, or should it be written
to a lockfile and renamed into place only when complete?

Technically I don't see a problem implementing either behavior. POSIX
allows [1] calls to stat() and rename() from a signal handler. There is
a minor technical difficulty that commit_lock_file() allocates memory
via get_locked_file_path(), but this would be surmountable by
pre-allocating the memory for the locked file path and storing it in the
lock_file object.

This doesn't seem like a common thing to want (as in, this might be the
only caller), but it probably makes sense to build it into the
tempfile/lockfile API nevertheless, because implementing it externally
would require a lot of other code to be duplicated.

Another possibility that might work (maybe without requiring changes to
tempfile/lockfile): don't worry about deleting the log file if it is
empty, but make observers treat an empty log file the same as an absent one.

Michael

[1]
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-16 Thread Junio C Hamano
Michael Haggerty  writes:

> I'm not sure what behavior you want. At one point you say "puts the file
> to a final place if it is non-empty" but later you say "leave it if
> non-empty". Should the file be written directly, or should it be written
> to a lockfile and renamed into place only when complete?

I do not think we care that deeply either way, as we do not want to
run multiple auto-gc's at the same time in the first place.  So either
one of the following is perfectly fine:

 * We open the final destination directly, but with O_CREAT|O_EXCL,
   which as a side effect also detects the simultanous execution
   [*1*].  We do not do any "finalizing rename" if we go this route.
   When we are done, close it and remove it if we did not write
   anything, or leave it there if we did write something.

 * We open a lockfile the usual way.  When we are done, close it and
   and remove it if we did not write anything, or finalize it by
   renaming it if we did write something.

I think Duy's code wants to do the latter.

> This doesn't seem like a common thing to want (as in, this might be the
> only caller), but it probably makes sense to build it into the
> tempfile/lockfile API nevertheless, because implementing it externally
> would require a lot of other code to be duplicated.
>
> Another possibility that might work (maybe without requiring changes to
> tempfile/lockfile): don't worry about deleting the log file if it is
> empty, but make observers treat an empty log file the same as an absent one.

Probably your "don't remove and check for emptiness" approach would
be the simpler of the two, but I think we can go either way.

Thanks.

[Footnote]

*1* But one "gc" could be foreground gc that does not write to a log
file and the other "gc" could be background that does write to a
log file, so this cannot be the primary way to avoid double
execution.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-14 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> While commit 9f673f9 (gc: config option for running --auto in
> background - 2014-02-08) helps reduce some complaints about 'gc
> --auto' hogging the terminal, it creates another set of problems.
>
> The latest in this set is, as the result of daemonizing, stderr is
> closed and all warnings are lost. This warning at the end of cmd_gc()
> is particularly important because it tells the user how to avoid "gc
> --auto" running repeatedly. Because stderr is closed, the user does
> not know, naturally they complain about 'gc --auto' wasting CPU.
>
> Daemonized gc now saves stderr to $GIT_DIR/gc.log. Following gc --auto
> will not run and gc.log printed out until the user removes gc.log.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  When gc.log exists, gc --auto now simply exits
>
>  builtin/gc.c | 33 -
>  1 file changed, 32 insertions(+), 1 deletion(-)

Thanks, will queue.

This is sufficient for non-novice users, but I wonder if the error
message is strong enough.

A knee-jerk reaction to "Last run reported an error so we won't run
until the log is removed" may be "Then why don't you remove the log
for me automatically?", which is a total brain-less raction [*1*],
and "then I'll remove the log" is not all that better.  The message
we want to convey is "correct the root cause so that gc can run
without having to give an error message."

I guess it all depends on what is in gc.log; after all, this new
codepath is not in a good position to read the diagnosis left in
there and offer good pieces of advice to resolve them.


[Footnote]

*1* ... which is what a knee-jerk reaction is by definition.



>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index bcc75d9..2c3aaeb 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -43,9 +43,20 @@ static struct argv_array prune_worktrees = ARGV_ARRAY_INIT;
>  static struct argv_array rerere = ARGV_ARRAY_INIT;
>  
>  static char *pidfile;
> +static struct strbuf log_filename = STRBUF_INIT;
> +static int daemonized;
>  
>  static void remove_pidfile(void)
>  {
> + if (daemonized && log_filename.len) {
> + struct stat st;
> +
> + close(2);
> + if (stat(log_filename.buf, ) ||
> + !st.st_size ||
> + rename(log_filename.buf, git_path("gc.log")))
> + unlink(log_filename.buf);
> + }
>   if (pidfile)
>   unlink(pidfile);
>  }
> @@ -330,13 +341,21 @@ int cmd_gc(int argc, const char **argv, const char 
> *prefix)
>   fprintf(stderr, _("See \"git help gc\" for manual 
> housekeeping.\n"));
>   }
>   if (detach_auto) {
> + struct strbuf sb = STRBUF_INIT;
> + if (strbuf_read_file(, git_path("gc.log"), 0) > 0)
> + return error(_("last gc run reported:\n"
> +"%s\n"
> +"not running until %s is 
> removed"),
> +  sb.buf, git_path("gc.log"));
> + strbuf_release();
> +
>   if (gc_before_repack())
>   return -1;
>   /*
>* failure to daemonize is ok, we'll continue
>* in foreground
>*/
> - daemonize();
> + daemonized = !daemonize();
>   }
>   } else
>   add_repack_all_option();
> @@ -349,6 +368,18 @@ int cmd_gc(int argc, const char **argv, const char 
> *prefix)
>   name, (uintmax_t)pid);
>   }
>  
> + if (daemonized) {
> + int fd;
> +
> + strbuf_addstr(_filename, git_path("gc.log_XX"));
> + fd = xmkstemp(log_filename.buf);
> + if (fd >= 0) {
> + dup2(fd, 2);
> + close(fd);
> + } else
> + strbuf_release(_filename);
> + }
> +
>   if (gc_before_repack())
>   return -1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-14 Thread Junio C Hamano
Junio C Hamano  writes:

> Thanks, will queue.

Ehh, I spoke a bit too early.

>> diff --git a/builtin/gc.c b/builtin/gc.c
>> index bcc75d9..2c3aaeb 100644
>> --- a/builtin/gc.c
>> +++ b/builtin/gc.c
>> @@ -43,9 +43,20 @@ static struct argv_array prune_worktrees = 
>> ARGV_ARRAY_INIT;
>>  static struct argv_array rerere = ARGV_ARRAY_INIT;
>>  
>>  static char *pidfile;
>> +static struct strbuf log_filename = STRBUF_INIT;
>> +static int daemonized;
>>  
>>  static void remove_pidfile(void)
>>  {
>> +if (daemonized && log_filename.len) {
>> +struct stat st;
>> +
>> +close(2);
>> +if (stat(log_filename.buf, ) ||
>> +!st.st_size ||
>> +rename(log_filename.buf, git_path("gc.log")))
>> +unlink(log_filename.buf);
>> +}

Unfortuantely we cannot queue this as-is, as we let the tempfile API
to automatically manage the pidfile since ebebeaea (gc: use tempfile
module to handle gc.pid file, 2015-08-10), and you cannot piggy-back
the log file finalization to this function that no longer exists.

Besides, it is obviously wrong to remove this log file in a function
whose name is remove_pidfile() ;-)

Adding a new function to tempfile API that puts the file to a final
place if it is non-empty and otherwise remove it, and using that to
create this "gc.log" file, would be the cleanest from the point of
view of _this_ codepath.  I however do not know if that is too
specific for the need of this codepath or "leave it if non-empty,
but otherwise remove as it is uninteresting" is fairly common thing
we would want and it is a good addition to the API set.

Michael, what do you think?

>> @@ -330,13 +341,21 @@ int cmd_gc(int argc, const char **argv, const char 
>> *prefix)
>>  fprintf(stderr, _("See \"git help gc\" for manual 
>> housekeeping.\n"));
>>  }
>>  if (detach_auto) {
>> +struct strbuf sb = STRBUF_INIT;
>> +if (strbuf_read_file(, git_path("gc.log"), 0) > 0)
>> +return error(_("last gc run reported:\n"
>> +   "%s\n"
>> +   "not running until %s is 
>> removed"),
>> + sb.buf, git_path("gc.log"));
>> +strbuf_release();
>> +
>>  if (gc_before_repack())
>>  return -1;
>>  /*
>>   * failure to daemonize is ok, we'll continue
>>   * in foreground
>>   */
>> -daemonize();
>> +daemonized = !daemonize();
>>  }
>>  } else
>>  add_repack_all_option();
>> @@ -349,6 +368,18 @@ int cmd_gc(int argc, const char **argv, const char 
>> *prefix)
>>  name, (uintmax_t)pid);
>>  }
>>  
>> +if (daemonized) {
>> +int fd;
>> +
>> +strbuf_addstr(_filename, git_path("gc.log_XX"));
>> +fd = xmkstemp(log_filename.buf);
>> +if (fd >= 0) {
>> +dup2(fd, 2);
>> +close(fd);
>> +} else
>> +strbuf_release(_filename);
>> +}
>> +
>>  if (gc_before_repack())
>>  return -1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] gc: save log from daemonized gc --auto and print it next time

2015-09-12 Thread Nguyễn Thái Ngọc Duy
While commit 9f673f9 (gc: config option for running --auto in
background - 2014-02-08) helps reduce some complaints about 'gc
--auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is
closed and all warnings are lost. This warning at the end of cmd_gc()
is particularly important because it tells the user how to avoid "gc
--auto" running repeatedly. Because stderr is closed, the user does
not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log. Following gc --auto
will not run and gc.log printed out until the user removes gc.log.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 When gc.log exists, gc --auto now simply exits

 builtin/gc.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index bcc75d9..2c3aaeb 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -43,9 +43,20 @@ static struct argv_array prune_worktrees = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;
 
 static char *pidfile;
+static struct strbuf log_filename = STRBUF_INIT;
+static int daemonized;
 
 static void remove_pidfile(void)
 {
+   if (daemonized && log_filename.len) {
+   struct stat st;
+
+   close(2);
+   if (stat(log_filename.buf, ) ||
+   !st.st_size ||
+   rename(log_filename.buf, git_path("gc.log")))
+   unlink(log_filename.buf);
+   }
if (pidfile)
unlink(pidfile);
 }
@@ -330,13 +341,21 @@ int cmd_gc(int argc, const char **argv, const char 
*prefix)
fprintf(stderr, _("See \"git help gc\" for manual 
housekeeping.\n"));
}
if (detach_auto) {
+   struct strbuf sb = STRBUF_INIT;
+   if (strbuf_read_file(, git_path("gc.log"), 0) > 0)
+   return error(_("last gc run reported:\n"
+  "%s\n"
+  "not running until %s is 
removed"),
+sb.buf, git_path("gc.log"));
+   strbuf_release();
+
if (gc_before_repack())
return -1;
/*
 * failure to daemonize is ok, we'll continue
 * in foreground
 */
-   daemonize();
+   daemonized = !daemonize();
}
} else
add_repack_all_option();
@@ -349,6 +368,18 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
name, (uintmax_t)pid);
}
 
+   if (daemonized) {
+   int fd;
+
+   strbuf_addstr(_filename, git_path("gc.log_XX"));
+   fd = xmkstemp(log_filename.buf);
+   if (fd >= 0) {
+   dup2(fd, 2);
+   close(fd);
+   } else
+   strbuf_release(_filename);
+   }
+
if (gc_before_repack())
return -1;
 
-- 
2.3.0.rc1.137.g477eb31

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html