Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Mon, Jul 16, 2018 at 03:56:39PM -0700, Jonathan Nieder wrote:

>> The calling command in the motivating example is Android's "repo" tool:
>> 
>> bare_git.gc('--auto')
>> 
>> from https://gerrit-review.googlesource.com/c/git-repo/+/10598/.  I
>> think it's reasonable that it expects a status code of 0 in the normal
>> case.  So life is less simple than I hoped.
>
> IMHO it should ignore the return code, since that's what Git does
> itself. And at any rate, you'd miss errors from daemonized gc's (at
> least until the _next_ one runs and propagates the error code).

That suggests a possible improvement.  If all callers should be
ignoring the exit code, can we make the exit code in daemonize mode
unconditionally zero in this kind of case?

That doesn't really solve the problem:

 1. "gc --auto" would produce noise *every time it runs* until gc.log
    is removed, for example via expiry

 2. "gc --auto" would not do any garbage collection until gc.log is
    removed, for example by expiry

 3. "gc --auto" would still not ratelimit itself, for example when
    there is a large number of unreachable loose objects that is not
    large enough to hit the loose object threshold.

but maybe it's better than the present state.

To solve (1) and (2), we could introduce a gc.warnings file that
behaves like gc.log except that it only gets printed once and then
self-destructs, allowing gc --auto to proceed.  To solve (3), we could
introduce a gc.lastrun file that is touched whenever "gc --auto" runs
successfully and make "gc --auto" a no-op for a while after each run.
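For concreteness, the gc.lastrun idea could look roughly like this (an illustrative Python sketch, not git code; the marker file name and the one-hour window are hypothetical, not existing git behavior):

```python
import os
import time

RATELIMIT_SECONDS = 3600  # hypothetical rate-limit window, not a real git knob

def should_run_auto_gc(lastrun_path):
    """Sketch of the proposed gc.lastrun check: skip collection if the
    marker file was touched within the rate-limit window."""
    try:
        age = time.time() - os.path.getmtime(lastrun_path)
    except OSError:
        return True  # marker missing: gc never ran (or marker was cleaned up)
    return age >= RATELIMIT_SECONDS

def record_successful_run(lastrun_path):
    """Touch the marker after a successful 'gc --auto' run."""
    with open(lastrun_path, "a"):
        os.utime(lastrun_path, None)
```

The point is only that the no-op decision becomes time-based rather than tied to the presence of gc.log.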

-- >8 --
Subject: gc: do not return error for prior errors in daemonized mode

Some build machines started failing to fetch updated source using
"repo sync", with error

  error: The last gc run reported the following. Please correct the root cause
  and remove /build/.repo/projects/tools/git.git/gc.log.
  Automatic cleanup will not be performed until the file is removed.

  warning: There are too many unreachable loose objects; run 'git prune' to remove them.

The cause takes some time to describe.

In v2.0.0-rc0~145^2 (gc: config option for running --auto in
background, 2014-02-08), "git gc --auto" learned to run in the
background instead of blocking the invoking command.  In this mode, it
closed stderr to avoid interleaving output with any subsequent
commands, causing warnings like the above to be swallowed; v2.6.3~24^2
(gc: save log from daemonized gc --auto and print it next time,
2015-09-19) addressed this by storing any diagnostic output in
.git/gc.log and allowing the next "git gc --auto" run to print it.

To avoid wasteful repeated fruitless gcs, when gc.log is present, the
subsequent "gc --auto" would die after printing its contents.  Most git
commands, such as "git fetch", ignore the exit status from "git gc
--auto" so all is well at this point: the user gets to see the error
message, and the fetch succeeds, without a wasteful additional attempt
at an automatic gc.

External tools like repo[1], though, do care about the exit status
from "git gc --auto".  In non-daemonized mode, the exit status is
straightforward: if there is an error, it is nonzero, but after a
warning like the above, the status is zero.  The daemonized mode, as a
side effect of the other properties provided, offers a very strange
exit code convention:

 - if no housekeeping was required, the exit status is 0

 - the first real run, after forking into the background, returns exit
   status 0 unconditionally.  The parent process has no way to know
   whether gc will succeed.

 - if there is any diagnostic output in gc.log, subsequent runs return
   a nonzero exit status to indicate that gc was not triggered.

There's nothing for the calling program to act on in response to that
error.  Use status 0 consistently instead, to indicate that we decided
not to run a gc (just like if no housekeeping was required).  This
way, repo and similar tools can get the benefit of the same behavior
as tools like "git fetch" that ignore the exit status from gc --auto.
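From the caller's side, the recommended behavior (ignore the exit status, as git's own commands do) could be sketched like this; hypothetical Python for illustration, not repo's actual code:

```python
import subprocess

def auto_gc(gc_cmd=("git", "gc", "--auto")):
    """Run gc --auto and deliberately ignore its exit status, mirroring
    what "git fetch" does.  The command tuple is a parameter only so the
    sketch is easy to exercise; a real caller would hardcode the git call."""
    proc = subprocess.run(gc_cmd, check=False)  # never raise on a nonzero exit
    return proc.returncode  # informational only; not something to act on
```

A daemonized gc reports real failures on a later run anyway, so the caller loses nothing by not checking.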

Once the period of time described by gc.pruneExpire elapses, the
unreachable loose objects will be removed by "git gc --auto"
automatically.

[1] https://gerrit-review.googlesource.com/c/git-repo/+/10598/

Reported-by: Andrii Dehtiarov 
Helped-by: Jeff King 
Signed-off-by: Jonathan Nieder 
---
 Documentation/config.txt |  3 ++-
 builtin/gc.c | 16 +++-
 t/t6500-gc.sh|  6 +++---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1cc18a828c..5eaa4aaa7d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1649,7 +1649,8 @@ will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 
 gc.logExpiry::
-   If the file gc.log exists, then `git gc --auto` won't run
+   If the file gc.log exists, 

Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-07-16 Thread Jonathan Tan
> The first step includes using a depth-first-search (DFS) from each from
> commit, sorted by ascending generation number. We do not walk beyond the
> minimum generation number or the minimum commit date. This DFS is likely
> to be faster than the existing reachable() method because we expect
> previous ref values to be along the first-parent history.
> 
> If we find a target commit, then we mark everything in the DFS stack as
> a RESULT. This expands the set of targets for the other from commits. We
> also mark the visited commits using 'assign_flag' to prevent re-walking
> the same code.

Thanks for this - it was very helpful in understanding the code.

The function itself uses a DFS stack that contains only the trail
leading up to the currently processed node, and not the one that I'm
more familiar with, which also contains the siblings of processed nodes.
I'll annotate the function with my thought process in the hope that it
will aid future reviewers. (The diff as seen in the e-mail is confusing
so I'm reproducing the function itself, not any + or -.)

> int can_all_from_reach_with_flag(struct object_array *from,
>int with_flag, int assign_flag,
>time_t min_commit_date,
>uint32_t min_generation)
> {
>   struct commit **list = NULL;
>   int i;
>   int result = 1;
> 
>   ALLOC_ARRAY(list, from->nr);
>   for (i = 0; i < from->nr; i++) {
>   list[i] = (struct commit *)from->objects[i].item;
> 
>   parse_commit(list[i]);
> 
>   if (list[i]->generation < min_generation)
>   return 0;
>   }
> 
>   QSORT(list, from->nr, compare_commits_by_gen);
> 
>   for (i = 0; i < from->nr; i++) {
>   /* DFS from list[i] */
>   struct commit_list *stack = NULL;
> 
>   list[i]->object.flags |= assign_flag;
>   commit_list_insert(list[i], &stack);
> 
>   while (stack) {
>   struct commit_list *parent;
> 
>   if (stack->item->object.flags & with_flag) {
>   pop_commit(&stack);
>   continue;
>   }

I wish that the code would refrain from pushing such an object instead
of popping it at the first opportunity, but I guess that doing so would
require the equivalent of a labeled break/continue. I have no qualms
with using "goto" in this case, but I know that some people don't like
it :-P

>   for (parent = stack->item->parents; parent; parent = parent->next) {
>   if (parent->item->object.flags & (with_flag | RESULT))
>   stack->item->object.flags |= RESULT;

Straightforward, and also produces the bubbling up that we want. An
object is never popped unless it has the "with_flag" flag (see above) or
all its parents have been processed. The object can encounter the "if"
statement multiple times; the last one is when all its parents have been
processed (and thus have the RESULT flag set if necessary).

>   if (!(parent->item->object.flags & assign_flag)) {
>   parent->item->object.flags |= assign_flag;
> 
>   parse_commit(parent->item);
> 
>   if (parent->item->date < min_commit_date ||
>   parent->item->generation < min_generation)
>   continue;
> 
>   commit_list_insert(parent->item, &stack);
>   break;
>   }

If not yet processed, push it onto the stack and break. The child commit
is still left on the stack. The next time the child commit is processed
(in an iteration of the "while" loop), the "for" loop will iterate until
the next unprocessed parent.

In the DFS that I'm used to, all parents would be pushed here, but
perhaps the fact that the iteration is postorder confuses things.
Anyway, if someone comes up with a better algorithm, replacing it
shouldn't be too difficult - the algorithm is contained within this
function, and there are tests to check the correctness of the algorithm
update.
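For reference, the trail-stack traversal under review can be modeled in a few lines of Python (an illustrative sketch of the control flow only; it omits the generation/date cutoffs and uses sets in place of object flags):

```python
def can_reach(start, parents, targets):
    """Postorder DFS that keeps only the trail on the stack: each visit
    re-scans the top node's parent list for the next unvisited parent,
    and a node is popped only once all of its parents are done.
    `parents` maps a node to its list of parent nodes."""
    visited = {start}
    result = set()        # plays the role of the RESULT flag
    stack = [start]
    while stack:
        node = stack[-1]
        if node in targets:           # the with_flag check at the top
            stack.pop()
            continue
        pushed = False
        for p in parents.get(node, []):
            if p in targets or p in result:
                result.add(node)      # bubble the flag up to the child
            if p not in visited:
                visited.add(p)        # the assign_flag marking
                stack.append(p)       # descend; `node` stays on the trail
                pushed = True
                break
        if not pushed:
            stack.pop()               # all parents done: postorder pop
    return start in targets or start in result
```

This makes the re-scan visible: the `for` loop restarts from the first parent on every revisit, which is why the RESULT marking happens before the push check.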

>   }
> 
>   if (!parent)
>   pop_commit(&stack);

Only when we have no parents left are we completely done with the
current object.

>   }
> 
>   if (!(list[i]->object.flags & (with_flag | RESULT))) {
>   result = 0;
>   goto cleanup;
>   }

And after the DFS, if the original object did not have an appropriate
flag set, we do not bother with the other "want" objects.

>   }
> 
> cleanup:
>   for (i = 0; i < from->nr; i++) {
>   

Re: [PATCH v3 1/7] gpg-interface: add new config to select how to sign a commit

2018-07-16 Thread Jeff King
On Tue, Jul 17, 2018 at 12:03:11AM +, brian m. carlson wrote:

> > +gpg.format::
> > +   Specifies which key format to use when signing with `--gpg-sign`.
> > +   Default is "openpgp", that is also the only supported value.
> 
> I think, as discussed in the other thread, perhaps a different prefix
> for these options is in order if we'd like to plan for the future.
> Maybe this could be "signature.format", "sign.format", "signing.format",
> or "signingtool.format" (I tend to prefer the former, but not too
> strongly).
> 
> I anticipate that some projects will prefer other formats, and it makes
> it easier if we don't have to maintain two sets of legacy names.

Heh. This is slowly morphing into the original signingtool series.

For the record (since I think my response is what you meant by the
"other thread"), I'm OK with going down this gpg.* road for now, and
dealing with other tools if and when they appear (via the extra level of
indirection).

-Peff


[PATCH v2 0/6] git-submodule.sh: convert part of cmd_update to C

2018-07-16 Thread Stefan Beller
v2:
addressed review comments, renaming the struct, improving the commit message.

v1:
https://public-inbox.org/git/20180712194754.71979-1-sbel...@google.com/
I thought about writing it all in one go, but the series got too large,
so let's chew one bite at a time.

Thanks,
Stefan

Stefan Beller (6):
  git-submodule.sh: align error reporting for update mode to use path
  git-submodule.sh: rename unused variables
  builtin/submodule--helper: factor out submodule updating
  builtin/submodule--helper: store update_clone information in a struct
  builtin/submodule--helper: factor out method to update a single submodule
  submodule--helper: introduce new update-module-mode helper

 builtin/submodule--helper.c | 152 
 git-submodule.sh|  22 +-
 2 files changed, 122 insertions(+), 52 deletions(-)

-- 
2.18.0.203.gfac676dfb9-goog

1:  d4e1ec45740 ! 1:  bbc8697a8ca git-submodule.sh: align error reporting for update mode to use path
@@ -6,7 +6,6 @@
 on its path, so let's do that for invalid update modes, too.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
 diff --git a/git-submodule.sh b/git-submodule.sh
 --- a/git-submodule.sh
2:  9c5ec3fccea ! 2:  7e26af17578 git-submodule.sh: rename unused variables
@@ -14,8 +14,12 @@
 using its own function starting in 48308681b07 (git submodule update:
 have a dedicated helper for cloning, 2016-02-29), its removal was missed.
 
+A later patch in this series also touches the communication between
+the submodule helper and git-submodule.sh, but let's have this as
+a preparatory patch, as it eases the next patch, which stores the
+raw data instead of the line printed for this communication.
+
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
 diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
 --- a/builtin/submodule--helper.c
3:  a3fb4e5539f ! 3:  3e8d22b0c70 builtin/submodule--helper: factor out submodule updating
@@ -7,7 +7,6 @@
 most of it is still in git-submodule.sh.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
 diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
 --- a/builtin/submodule--helper.c
4:  e680684139d ! 4:  5e0a39015df builtin/submodule--helper: store update_clone information in a struct
@@ -11,7 +11,6 @@
 struct.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
 diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
 --- a/builtin/submodule--helper.c
@@ -20,7 +19,7 @@
return 0;
  }
  
-+struct submodule_update_clone_information {
++struct update_clone_data {
 +  const struct submodule *sub;
 +  struct object_id oid;
 +  unsigned just_cloned;
@@ -36,8 +35,8 @@
 -  /* Machine-readable status lines to be consumed by git-submodule.sh */
 -  struct string_list projectlines;
 +  /* to be consumed by git-submodule.sh */
-+  struct submodule_update_clone_information *submodule_lines;
-+  int submodule_lines_nr; int submodule_lines_alloc;
++  struct update_clone_data *update_clone;
++  int update_clone_nr; int update_clone_alloc;
  
/* If we want to stop as fast as possible and return an error */
unsigned quickstop : 1;
@@ -58,12 +57,12 @@
-  strbuf_addf(&sb, "dummy %s %d\t%s\n",
-  oid_to_hex(&ce->oid), needs_cloning, ce->name);
-  string_list_append(&suc->projectlines, sb.buf);
-+  ALLOC_GROW(suc->submodule_lines, suc->submodule_lines_nr + 1,
-+   suc->submodule_lines_alloc);
-+  oidcpy(&suc->submodule_lines[suc->submodule_lines_nr].oid, &ce->oid);
-+  suc->submodule_lines[suc->submodule_lines_nr].just_cloned = needs_cloning;
-+  suc->submodule_lines[suc->submodule_lines_nr].sub = sub;
-+  suc->submodule_lines_nr++;
++  ALLOC_GROW(suc->update_clone, suc->update_clone_nr + 1,
++ suc->update_clone_alloc);
++  oidcpy(&suc->update_clone[suc->update_clone_nr].oid, &ce->oid);
++  suc->update_clone[suc->update_clone_nr].just_cloned = needs_cloning;
++  suc->update_clone[suc->update_clone_nr].sub = sub;
++  suc->update_clone_nr++;
  
if (!needs_cloning)
goto cleanup;
@@ -83,11 +82,11 @@
  
-  for_each_string_list_item(item, &suc->projectlines)
-  fprintf(stdout, "%s", item->string);
-+  for (i = 0; i < suc->submodule_lines_nr; i++) {
++  for (i = 0; i < suc->update_clone_nr; i++) {
 +  strbuf_addf(&sb, "dummy %s %d\t%s\n",
-+  oid_to_hex(&suc->submodule_lines[i].oid),
-+  suc->submodule_lines[i].just_cloned,
-+  suc->submodule_lines[i].sub->path);
++  

[PATCH v2 3/6] builtin/submodule--helper: factor out submodule updating

2018-07-16 Thread Stefan Beller
Separate the command line parsing from the actual execution of the command
within the repository. For now there is not a lot of execution as
most of it is still in git-submodule.sh.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 59 +
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ebcfe85bfa9..96929ba1096 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1472,6 +1472,8 @@ struct submodule_update_clone {
/* failed clones to be retried again */
const struct cache_entry **failed_clones;
int failed_clones_nr, failed_clones_alloc;
+
+   int max_jobs;
 };
 #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, \
@@ -1714,11 +1716,36 @@ static int gitmodules_update_clone_config(const char *var, const char *value,
return 0;
 }
 
+static int update_submodules(struct submodule_update_clone *suc)
+{
+   struct string_list_item *item;
+
+   run_processes_parallel(suc->max_jobs,
+  update_clone_get_next_task,
+  update_clone_start_failure,
+  update_clone_task_finished,
+  suc);
+
+   /*
+* We saved the output and put it out all at once now.
+* That means:
+* - the listener does not have to interleave their (checkout)
+*   work with our fetching.  The writes involved in a
+*   checkout involve more straightforward sequential I/O.
+* - the listener can avoid doing any work if fetching failed.
+*/
+   if (suc->quickstop)
+   return 1;
+
+   for_each_string_list_item(item, &suc->projectlines)
+   fprintf(stdout, "%s", item->string);
+
+   return 0;
+}
+
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
const char *update = NULL;
-   int max_jobs = 1;
-   struct string_list_item *item;
struct pathspec pathspec;
struct submodule_update_clone suc = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -1740,7 +1767,7 @@ static int update_clone(int argc, const char **argv, const char *prefix)
OPT_STRING(0, "depth", &suc.depth, "<depth>",
   N_("Create a shallow clone truncated to the "
  "specified number of revisions")),
-   OPT_INTEGER('j', "jobs", &max_jobs,
+   OPT_INTEGER('j', "jobs", &suc.max_jobs,
N_("parallel jobs")),
OPT_BOOL(0, "recommend-shallow", &suc.recommend_shallow,
N_("whether the initial clone should follow the shallow recommendation")),
@@ -1756,8 +1783,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
};
suc.prefix = prefix;
 
-   config_from_gitmodules(gitmodules_update_clone_config, &max_jobs);
-   git_config(gitmodules_update_clone_config, &max_jobs);
+   config_from_gitmodules(gitmodules_update_clone_config, &suc.max_jobs);
+   git_config(gitmodules_update_clone_config, &suc.max_jobs);
 
argc = parse_options(argc, argv, prefix, module_update_clone_options,
 git_submodule_helper_usage, 0);
@@ -1772,27 +1799,7 @@ static int update_clone(int argc, const char **argv, const char *prefix)
if (pathspec.nr)
suc.warn_if_uninitialized = 1;
 
-   run_processes_parallel(max_jobs,
-  update_clone_get_next_task,
-  update_clone_start_failure,
-  update_clone_task_finished,
-  &suc);
-
-   /*
-* We saved the output and put it out all at once now.
-* That means:
-* - the listener does not have to interleave their (checkout)
-*   work with our fetching.  The writes involved in a
-*   checkout involve more straightforward sequential I/O.
-* - the listener can avoid doing any work if fetching failed.
-*/
-   if (suc.quickstop)
-   return 1;
-
-   for_each_string_list_item(item, &suc.projectlines)
-   fprintf(stdout, "%s", item->string);
-
-   return 0;
+   return update_submodules(&suc);
 }
 
static int resolve_relative_path(int argc, const char **argv, const char *prefix)
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH v2 6/6] submodule--helper: introduce new update-module-mode helper

2018-07-16 Thread Stefan Beller
This chews off a bit of the shell part of the update command in
git-submodule.sh. When writing the C code, keep in mind that the
submodule--helper part will go away eventually and we want to have
a C function that is able to determine the submodule update strategy.
As a nicety, make determine_submodule_update_strategy accessible
for arbitrary repositories.
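The precedence that determine_submodule_update_strategy implements (command line, then .git/config, then .gitmodules, then the checkout default, with a forced checkout right after cloning) can be modeled like this; an illustrative Python sketch, not the actual C code:

```python
def resolve_update_mode(cli_update, config_update, gitmodules_update, just_cloned):
    """Model of the update-mode fallback chain: the command-line argument
    wins, then submodule.<name>.update from .git/config, then the
    .gitmodules entry, then "checkout".  A submodule that was just
    cloned is always checked out, never merged/rebased/skipped."""
    mode = cli_update or config_update or gitmodules_update or "checkout"
    if just_cloned and mode in ("merge", "rebase", "none"):
        mode = "checkout"
    return mode
```

This mirrors the if/else-if chain in the C function plus the just_cloned override at its end.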

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 61 +
 git-submodule.sh| 16 +-
 2 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 034ba1bb2e0..d4cb7c72e33 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1444,6 +1444,66 @@ static int module_clone(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+static void determine_submodule_update_strategy(struct repository *r,
+   int just_cloned,
+   const char *path,
+   const char *update,
+   struct submodule_update_strategy *out)
+{
+   const struct submodule *sub = submodule_from_path(r, &null_oid, path);
+   char *key;
+   const char *val;
+
+   key = xstrfmt("submodule.%s.update", sub->name);
+
+   if (update) {
+   trace_printf("parsing update");
+   if (parse_submodule_update_strategy(update, out) < 0)
+   die(_("Invalid update mode '%s' for submodule path '%s'"),
+   update, path);
+   } else if (!repo_config_get_string_const(r, key, &val)) {
+   if (parse_submodule_update_strategy(val, out) < 0)
+   die(_("Invalid update mode '%s' configured for submodule path '%s'"),
+   val, path);
+   } else if (sub->update_strategy.type != SM_UPDATE_UNSPECIFIED) {
+   trace_printf("loaded thing");
+   out->type = sub->update_strategy.type;
+   out->command = sub->update_strategy.command;
+   } else
+   out->type = SM_UPDATE_CHECKOUT;
+
+   if (just_cloned &&
+   (out->type == SM_UPDATE_MERGE ||
+out->type == SM_UPDATE_REBASE ||
+out->type == SM_UPDATE_NONE))
+   out->type = SM_UPDATE_CHECKOUT;
+
+   free(key);
+}
+
+static int module_update_module_mode(int argc, const char **argv, const char *prefix)
+{
+   const char *path, *update = NULL;
+   int just_cloned;
+   struct submodule_update_strategy update_strategy = { .type = SM_UPDATE_CHECKOUT };
+
+   if (argc < 3 || argc > 4)
+   die("submodule--helper update-module-clone expects <just-cloned> <path> [<update>]");
+
+   just_cloned = git_config_int("just_cloned", argv[1]);
+   path = argv[2];
+
+   if (argc == 4)
+   update = argv[3];
+
+   determine_submodule_update_strategy(the_repository,
+   just_cloned, path, update,
+   &update_strategy);
+   fprintf(stdout, submodule_strategy_to_string(&update_strategy));
+
+   return 0;
+}
+
 struct update_clone_data {
const struct submodule *sub;
struct object_id oid;
@@ -2039,6 +2099,7 @@ static struct cmd_struct commands[] = {
{"list", module_list, 0},
{"name", module_name, 0},
{"clone", module_clone, 0},
+   {"update-module-mode", module_update_module_mode, 0},
{"update-clone", update_clone, 0},
{"relative-path", resolve_relative_path, 0},
{"resolve-relative-url", resolve_relative_url, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index 56588aa304d..215760898ce 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -535,27 +535,13 @@ cmd_update()
do
die_if_unmatched "$quickabort" "$sha1"
 
-   name=$(git submodule--helper name "$sm_path") || exit
-   if ! test -z "$update"
-   then
-   update_module=$update
-   else
-   update_module=$(git config submodule."$name".update)
-   if test -z "$update_module"
-   then
-   update_module="checkout"
-   fi
-   fi
+   update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update)
 
displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix")
 
if test $just_cloned -eq 1
then
subsha1=
-   case "$update_module" in
-   merge | rebase | none)
-   update_module=checkout ;;
-   esac
else

[PATCH v2 4/6] builtin/submodule--helper: store update_clone information in a struct

2018-07-16 Thread Stefan Beller
The information that is printed for update_submodules in
'submodule--helper update-clone' and consumed by 'git submodule update'
is stored as a string per submodule. This made sense at the time of
48308681b07 (git submodule update: have a dedicated helper for cloning,
2016-02-29), but as we want to migrate the rest of the submodule update
into C, we're better off having access to the raw information in a helper
struct.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 96929ba1096..fb54936efc7 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1444,6 +1444,12 @@ static int module_clone(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+struct update_clone_data {
+   const struct submodule *sub;
+   struct object_id oid;
+   unsigned just_cloned;
+};
+
 struct submodule_update_clone {
/* index into 'list', the list of submodules to look into for cloning */
int current;
@@ -1463,8 +1469,9 @@ struct submodule_update_clone {
const char *recursive_prefix;
const char *prefix;
 
-   /* Machine-readable status lines to be consumed by git-submodule.sh */
-   struct string_list projectlines;
+   /* to be consumed by git-submodule.sh */
+   struct update_clone_data *update_clone;
+   int update_clone_nr; int update_clone_alloc;
 
/* If we want to stop as fast as possible and return an error */
unsigned quickstop : 1;
@@ -1478,7 +1485,7 @@ struct submodule_update_clone {
 #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, \
NULL, NULL, NULL, \
-   STRING_LIST_INIT_DUP, 0, NULL, 0, 0}
+   NULL, 0, 0, 0, NULL, 0, 0, 0}
 
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
@@ -1572,10 +1579,12 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
strbuf_addf(&sb, "%s/.git", ce->name);
needs_cloning = !file_exists(sb.buf);
 
-   strbuf_reset(&sb);
-   strbuf_addf(&sb, "dummy %s %d\t%s\n",
-   oid_to_hex(&ce->oid), needs_cloning, ce->name);
-   string_list_append(&suc->projectlines, sb.buf);
+   ALLOC_GROW(suc->update_clone, suc->update_clone_nr + 1,
+  suc->update_clone_alloc);
+   oidcpy(&suc->update_clone[suc->update_clone_nr].oid, &ce->oid);
+   suc->update_clone[suc->update_clone_nr].just_cloned = needs_cloning;
+   suc->update_clone[suc->update_clone_nr].sub = sub;
+   suc->update_clone_nr++;
 
if (!needs_cloning)
goto cleanup;
@@ -1718,7 +1727,8 @@ static int gitmodules_update_clone_config(const char *var, const char *value,
 
 static int update_submodules(struct submodule_update_clone *suc)
 {
-   struct string_list_item *item;
+   int i;
+   struct strbuf sb = STRBUF_INIT;
 
run_processes_parallel(suc->max_jobs,
   update_clone_get_next_task,
@@ -1737,9 +1747,16 @@ static int update_submodules(struct submodule_update_clone *suc)
if (suc->quickstop)
return 1;
 
-   for_each_string_list_item(item, &suc->projectlines)
-   fprintf(stdout, "%s", item->string);
+   for (i = 0; i < suc->update_clone_nr; i++) {
+   strbuf_addf(&sb, "dummy %s %d\t%s\n",
+   oid_to_hex(&suc->update_clone[i].oid),
+   suc->update_clone[i].just_cloned,
+   suc->update_clone[i].sub->path);
+   fprintf(stdout, "%s", sb.buf);
+   strbuf_reset(&sb);
+   }
 
+   strbuf_release(&sb);
return 0;
 }
 
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH v2 2/6] git-submodule.sh: rename unused variables

2018-07-16 Thread Stefan Beller
The 'mode' variable is not used in cmd_update for its original purpose;
rename it to 'dummy', as it only serves the purpose of aborting quickly,
and the new name documents this knowledge.

The variable 'stage' is also not used any more in cmd_update, so remove it.

This went unnoticed as first each function used the commonly used
submodule listing, which was converted in 74703a1e4df (submodule: rewrite
`module_list` shell function in C, 2015-09-02). When cmd_update was
using its own function starting in 48308681b07 (git submodule update:
have a dedicated helper for cloning, 2016-02-29), its removal was missed.

A later patch in this series also touches the communication between
the submodule helper and git-submodule.sh, but let's have this as
a preparatory patch, as it eases the next patch, which stores the
raw data instead of the line printed for this communication.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 5 ++---
 git-submodule.sh| 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 20ae9191ca3..ebcfe85bfa9 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1571,9 +1571,8 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
needs_cloning = !file_exists(sb.buf);
 
strbuf_reset(&sb);
-   strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
-   oid_to_hex(&ce->oid), ce_stage(ce),
-   needs_cloning, ce->name);
+   strbuf_addf(&sb, "dummy %s %d\t%s\n",
+   oid_to_hex(&ce->oid), needs_cloning, ce->name);
string_list_append(&suc->projectlines, sb.buf);
 
if (!needs_cloning)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8a611865397..56588aa304d 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -531,9 +531,9 @@ cmd_update()
"$@" || echo "#unmatched" $?
} | {
err=
-   while read -r mode sha1 stage just_cloned sm_path
+   while read -r quickabort sha1 just_cloned sm_path
do
-   die_if_unmatched "$mode" "$sha1"
+   die_if_unmatched "$quickabort" "$sha1"
 
name=$(git submodule--helper name "$sm_path") || exit
if ! test -z "$update"
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH v2 5/6] builtin/submodule--helper: factor out method to update a single submodule

2018-07-16 Thread Stefan Beller
In a later patch we'll find this method handy.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fb54936efc7..034ba1bb2e0 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1725,10 +1725,17 @@ static int gitmodules_update_clone_config(const char *var, const char *value,
return 0;
 }
 
+static void update_submodule(struct update_clone_data *ucd)
+{
+   fprintf(stdout, "dummy %s %d\t%s\n",
+   oid_to_hex(&ucd->oid),
+   ucd->just_cloned,
+   ucd->sub->path);
+}
+
 static int update_submodules(struct submodule_update_clone *suc)
 {
int i;
-   struct strbuf sb = STRBUF_INIT;
 
run_processes_parallel(suc->max_jobs,
   update_clone_get_next_task,
@@ -1747,16 +1754,9 @@ static int update_submodules(struct submodule_update_clone *suc)
if (suc->quickstop)
return 1;
 
-   for (i = 0; i < suc->update_clone_nr; i++) {
-   strbuf_addf(&sb, "dummy %s %d\t%s\n",
-   oid_to_hex(&suc->update_clone[i].oid),
-   suc->update_clone[i].just_cloned,
-   suc->update_clone[i].sub->path);
-   fprintf(stdout, "%s", sb.buf);
-   strbuf_reset(&sb);
-   }
+   for (i = 0; i < suc->update_clone_nr; i++)
+   update_submodule(&suc->update_clone[i]);
 
-   strbuf_release(&sb);
return 0;
 }
 
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH v2 1/6] git-submodule.sh: align error reporting for update mode to use path

2018-07-16 Thread Stefan Beller
All other error messages in cmd_update are reporting the submodule based
on its path, so let's do that for invalid update modes, too.

Signed-off-by: Stefan Beller 
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 5f9d9f6ea37..8a611865397 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -627,7 +627,7 @@ cmd_update()
must_die_on_failure=yes
;;
*)
-   die "$(eval_gettext "Invalid update mode '$update_module' for submodule '$name'")"
+   die "$(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
esac
 
if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
-- 
2.18.0.203.gfac676dfb9-goog



Re: [PATCH 13/16] test-reach: test can_all_from_reach_with_flags

2018-07-16 Thread Jonathan Tan
The subject should be can_all_from_reach_with_flag (without the "s" at
the end). Likewise in the commit message.

> To make this method testable, add a new can_all_from_reach method that
> does the initial setup and final tear-down. Call the method from
> 'test-tool reach'.

This description leads me to believe that can_all_from_reach() is (1)
trivial, and (2) will not be used in production code. But (1) the
function itself is non-trivial and the function signature contains a
"cutoff_by_min_date" parameter not found in
can_all_from_reach_with_flag():

> +int can_all_from_reach(struct commit_list *from, struct commit_list *to,
> +int cutoff_by_min_date)

and (2) this function will be used in production code subsequently in
the "commit-reach: use can_all_from_reach" commit. It would be clearer,
maybe, if there were some rearrangement - maybe a commit introducing
this function, especially documenting what cutoff_by_min_date does, and
then a test (which just tests can_all_from_reach), and then the
"commit-reach: use can_all_from_reach" commit.


Re: [PATCH v3 1/7] gpg-interface: add new config to select how to sign a commit

2018-07-16 Thread brian m. carlson
On Fri, Jul 13, 2018 at 10:41:23AM +0200, Henning Schild wrote:
> Add "gpg.format" where the user can specify which type of signature to
> use for commits. At the moment only "openpgp" is supported and the value is
> not even used. This commit prepares for new types of signatures.
> 
> Signed-off-by: Henning Schild 
> ---
>  Documentation/config.txt | 4 
>  gpg-interface.c  | 7 +++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1cc18a828..ac373e3f4 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1828,6 +1828,10 @@ gpg.program::
>   signed, and the program is expected to send the result to its
>   standard output.
>  
> +gpg.format::
> + Specifies which key format to use when signing with `--gpg-sign`.
> + Default is "openpgp", that is also the only supported value.

I think, as discussed in the other thread, perhaps a different prefix
for these options is in order if we'd like to plan for the future.
Maybe this could be "signature.format", "sign.format", "signing.format",
or "signingtool.format" (I tend to prefer the former, but not too
strongly).

I anticipate that some projects will prefer other formats, and it makes
it easier if we don't have to maintain two sets of legacy names.
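For illustration only, the tri-level naming being discussed might end up looking like this in a config file (purely hypothetical keys; at this point only `gpg.program` and the proposed `gpg.format` exist):

```ini
# hypothetical sketch of a signingtool.<format>.<variable> scheme
[signingtool]
	format = x509
[signingtool "x509"]
	program = gpgsm
```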
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 4/6] builtin/submodule--helper: store update_clone information in a struct

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 12:37 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > The information that is printed for update_submodules in
> > 'submodule--helper update-clone' and consumed by 'git submodule update'
> > is stored as a string per submodule. This made sense at the time of
> > 48308681b07 (git submodule update: have a dedicated helper for cloning,
> > 2016-02-29), but as we want to migrate the rest of the submodule update
> > into C, we're better off having access to the raw information in a helper
> > struct.
>
> The reasoning above makes sense, but I have a few niggles on the
> naming.
>
>  * Anything you'd place to keep track of the state is "information",
>so a whole "_information" suffix to the structure name does not
>add much value.  We've seen our fair share of (meaningless)
>"_data" suffix used in many places but I think the overly long
>"_information" that is equally meaningless trumps them.  I think
>we also have "_info", but if we are not going to have a better
>name, let's not be original and stick to "_data" like other
>existing codepath.  An alternative with more meaningful name is
>of course better, though, but it is not all that much worth to
>spend too many braincycles on it.

agreed.

>  * Is the fact that these parameters necessary to help populating
>the submodule repository are sent on a line to external process
>the most important aspect of the submodule_lines[] array?  As

In terms of what the submodule--helper does, it is.
It is not the most important aspect in the big picture, or once
we finish the migration to not have shell interpret its output.

>this step is a preparation to migrate out of that line/text
>oriented IPC, I think line-ness is the least important and
>interesting thing to name the variable with.

ok.

> If I were writing this patch, perhaps I'd do
>
> struct update_clone_data *update_clone;
> int update_clone_nr, update_clone_alloc;
>
> following my gut, but since you've been thinking about submodule
> longer than I have, perhaps you can come up with a better name.

That makes sense. We do not need to mention 'submodule', as that
ought to be obvious from the file name already, and 'update_clone'
is just enough to describe what we are doing.
However, there is already a struct submodule_update_clone, which
would be the better candidate for the name 'update_clone'; this new
struct only serves as a helper within it, so 'update_clone_data' fits.

I'll think about adding a patch to rename the existing struct
(submodule_update_clone -> update_clone) and renaming this new
one to update_clone_data.
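As a side note, the `update_clone / update_clone_nr / update_clone_alloc` triple Junio sketches follows git's usual growable-array idiom, normally spelled with the ALLOC_GROW() macro. A minimal self-contained sketch of the idiom, with an illustrative struct rather than the real one:

```c
#include <stdlib.h>

/* Illustrative stand-in; the real entry would carry the submodule,
 * object id, and so on. */
struct update_clone_data {
	int just_cloned;
};

/* Append one entry, growing the array as needed; this is what
 * ALLOC_GROW(update_clone, update_clone_nr + 1, update_clone_alloc)
 * does in spirit (the real macro uses a different growth formula). */
static void append_entry(struct update_clone_data **arr,
			 int *nr, int *alloc, int just_cloned)
{
	if (*nr >= *alloc) {
		*alloc = *alloc ? 2 * *alloc : 16;
		*arr = realloc(*arr, *alloc * sizeof(**arr));
	}
	(*arr)[(*nr)++].just_cloned = just_cloned;
}
```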



>
> > @@ -1463,8 +1469,9 @@ struct submodule_update_clone {
> >   const char *recursive_prefix;
> >   const char *prefix;
> >
> > - /* Machine-readable status lines to be consumed by git-submodule.sh */
> > - struct string_list projectlines;
> > + /* to be consumed by git-submodule.sh */
> > + struct submodule_update_clone_information *submodule_lines;
> > + int submodule_lines_nr; int submodule_lines_alloc;
> >
> >   /* If we want to stop as fast as possible and return an error */
> >   unsigned quickstop : 1;
> > @@ -1478,7 +1485,7 @@ struct submodule_update_clone {
> >  #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
> >   SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, \
> >   NULL, NULL, NULL, \
> > - STRING_LIST_INIT_DUP, 0, NULL, 0, 0}
> > + NULL, 0, 0, 0, NULL, 0, 0, 0}
>
> The structure definition and this macro definition are nearby, so it
> is not crucial, but its probably not a bad idea to switch to C99
> initializers for a thing like this to make it more readable, once
> the code around this area stabilizes back again sufficiently (IOW,
> let's not distract ourselves in the middle of adding a new feature
> like this one).

Are we still in the phase of "test balloon" or do we now accept
C99 initializers all over the code base?

But I agree to defer the conversion for now.
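For reference, a designated-initializer version of such an INIT macro could look like this (a minimal sketch with made-up fields, not the actual struct from the patch):

```c
/* Simplified stand-in for the struct under discussion; the real
 * submodule_update_clone has many more fields. */
struct update_clone_opts {
	int max_jobs;
	const char *prefix;
	unsigned quickstop : 1;
};

/* With C99 designated initializers only the non-zero fields need to
 * be spelled out, so adding a field no longer requires editing the
 * macro (unnamed fields are zero-initialized). */
#define UPDATE_CLONE_OPTS_INIT { .max_jobs = 1 }
```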

Thanks,
Stefan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 03:56:39PM -0700, Jonathan Nieder wrote:

> The calling command in the motivating example is Android's "repo" tool:
> 
> bare_git.gc('--auto')
> 
> from https://gerrit-review.googlesource.com/c/git-repo/+/10598/.  I
> think it's reasonable that it expects a status code of 0 in the normal
> case.  So life is less simple than I hoped.

IMHO it should ignore the return code, since that's what Git does
itself. And at any rate, you'd miss errors from daemonized gc's (at
least until the _next_ one runs and propagates the error code).

> Interesting!  It looks like that thread anticipated the problems we've
> seen here.  Three years without having to have fixed it is a good run,
> I suppose.
> 
> The discussion of stopping there appears to be primarily about
> stopping in the error case, not rate-limiting in the success or
> warning case.

I think the two are essentially the same. The point is that we cannot
make further progress by re-running the gc again, so we should not.

Amusingly, the warning we're talking about is the exact reason that the
logging in that thread was added.  329e6e8794 (gc: save log from
daemonized gc --auto and print it next time, 2015-09-19) says:

  The latest in this set is, as the result of daemonizing, stderr is
  closed and all warnings are lost. This warning at the end of cmd_gc()
  is particularly important because it tells the user how to avoid "gc
  --auto" running repeatedly. Because stderr is closed, the user does
  not know, naturally they complain about 'gc --auto' wasting CPU.

> -- >8 --
> Subject: gc: exit with status 128 on failure
> 
> A value of -1 returned from cmd_gc gets propagated to exit(),
> resulting in an exit status of 255.  Use die instead for a clearer
> error message and a controlled exit.

I agree it's better to not pass -1 to exit(). But I thought the original
motivation was the fact that we were returning non-zero in this case at
all?  (And I thought you and I both agreed with that motivation).

So is this meant to be a preparatory fix until somebody is interested in
fixing that?

> -static int gc_before_repack(void)
> +static void gc_before_repack(void)
>  {
>   if (pack_refs && run_command_v_opt(pack_refs_cmd.argv, RUN_GIT_CMD))
> - return error(FAILED_RUN, pack_refs_cmd.argv[0]);
> + die(FAILED_RUN, pack_refs_cmd.argv[0]);
>  
>   if (prune_reflogs && run_command_v_opt(reflog.argv, RUN_GIT_CMD))
> - return error(FAILED_RUN, reflog.argv[0]);
> + die(FAILED_RUN, reflog.argv[0]);

Dscho is going to yell at you about replacing error returns with die().  ;)

I wonder if it would be simpler to just reinterpret the "-1" into "128"
in cmd_gc(). I thought we had talked about having run_builtin() do that
at one point, but I don't think we ever did. I dunno. I'm fine with it
either way.
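For illustration, the reinterpretation described above could be a small mapping applied to the builtin's result before exiting (a hypothetical sketch, not what git currently does):

```c
/* Map the -1 that error()-returning code paths bubble up to the
 * conventional "fatal" exit status 128, instead of letting exit(-1)
 * truncate it to 255. */
static int exit_status_from_result(int res)
{
	return res < 0 ? 128 : res;
}
```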

-Peff


Re: [PATCH v2 7/9] gpg-interface: introduce new config to select per gpg format program

2018-07-16 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 16, 2018 at 02:56:34PM -0700, Junio C Hamano wrote:
>
>> >> I'm okay with us forcing "openpgp".  That seems sane enough for now, and
>> >> if people scream loudly, we can loosen it.
>> >
>> > Well, I specifically meant "are you sure subsections like this are a
>> > good idea". But it seems like people think so?
>> 
>> I admit that I did not even consider that there may be better tool
>> than using subsections to record this information.  What are the
>> possibilities you have in mind (if you have one)?
>
> I don't think there is another tool except two-level options, like
> "gpg.openpgpprogram" and "gpg.x509program".
>
> Although those are a bit ugly, I just wondered if they might make things
> simpler, since AFAIK we are not planning to add more config options
> here. Like gpg.x509.someotherflag, nor gpg.someothertool.program.
>
> Of course one reason _for_ the tri-level is that we might one day add
> gpg.x509.someotherflag, and this gives us room to do it with less
> awkwardness (i.e., a proliferation of gpg.x509someflag options).

Yes, and signingtool.<format>.<variable> is probably a good (ultra-)long
term direction.  Preparing the code may be quite a lot of work that
nobody may be interested in, and nothing other than the GPG family
might materialize for a long time, but if we can cheaply prepare an
external interface less dependent on GPG/PGP, that by itself would
be a good thing to have, I would guess.

Thanks.


Re: [PATCH 11/16] test-reach: test get_merge_bases_many

2018-07-16 Thread Jonathan Tan
> @@ -71,6 +78,14 @@ int cmd__reach(int ac, const char **av)
>   printf("%s(A,B):%d\n", av[1], in_merge_bases(A, B));
>   else if (!strcmp(av[1], "is_descendant_of"))
>   printf("%s(A,X):%d\n", av[1], is_descendant_of(A, X));
> + else if (!strcmp(av[1], "get_merge_bases_many")) {
> + struct commit_list *list = get_merge_bases_many(A, X_nr, X_array);
> + printf("%s(A,X):\n", av[1]);
> + while (list) {
> + printf("%s\n", oid_to_hex(&list->item->object.oid));
> + list = list->next;
> + }

I don't think get_merge_bases_many defines a sort order on its output?
It might be better to sort the resulting commit list here, so that the
output is more well-defined. (And omit the informational printf so that
it's slightly easier to generate the "expect" file.)
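To make the suggestion concrete: sorting the hex object names before printing makes the helper's output independent of traversal order, so the "expect" file can simply be a sorted list. A self-contained sketch (plain strings stand in for what oid_to_hex() would produce from the returned commit_list):

```c
#include <stdlib.h>
#include <string.h>

/* qsort() callback comparing two hex object names lexically. */
static int cmp_hex(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort in place so the printed order is well-defined regardless of
 * the order get_merge_bases_many() happened to return commits in. */
static void sort_hex_names(const char **names, size_t nr)
{
	qsort(names, nr, sizeof(*names), cmp_hex);
}
```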


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Elijah Newren
On Mon, Jul 16, 2018 at 3:55 PM, Jeff King  wrote:
> On Mon, Jul 16, 2018 at 03:07:34PM -0700, Elijah Newren wrote:
>
>> > If we were to delete those objects, wouldn't it be exactly the same as
>> > running "git prune"? Or setting your gc.pruneExpire to "now" or even "5
>> > minutes"?  Or are you concerned with taking other objects along for the
>> > ride that weren't part of old reflogs? I think that's a valid concern,
>>
>> Yes, I was worried about taking other objects along for the ride that
>> weren't part of old reflogs.
>>
>> > but it's also an issue for objects which were previously referenced in
>> > a reflog, but are part of another current operation.
>>
>> I'm not certain what you're referring to here.
>
> I mean that an ongoing operation could refer to a blob that just
> recently became unreachable via reflog pruning. And then we would delete
> it, leaving the repository corrupt.
>
> One of the protections we have against that is that if I ask to write
> blob XYZ and we find that we already have the object, Git will freshen
> the mtime of the loose object or pack that contains it, protecting it
> from pruning. But with your suggestion, we'd delete it immediately,
> regardless of the mtime of the containing pack.
>
> Another way to think of it is this: a reflog mentioning an object does
> not mean it is the exclusive user of that object. So when we expire it,
> that does not mean that it is OK to delete it immediately; there may be
> other users.
>
>> > Also, what do you do if there weren't reflogs in the repo? Or the
>> > reflogs were deleted (e.g., because branch deletion drops the associated
>> > reflog entirely)?
>>
>> Yes, there are issues this rule won't help with, but in my experience
>> it was a primary (if not sole) actual cause in practice.  (I believe I
>> even said elsewhere in this thread that I knew there were unreachable
>> objects for other reasons and they might also become large in number).
>> At $DAYJOB we've had multiple people including myself hit the "too
>> many unreachable loose objects" nasty loop issue (some of us multiple
>> different times), and as far as I can tell, most (perhaps even all) of
>> them would have been avoided by just "properly" deleting garbage as
>> per my object-age-is-reflog-age-if-not-otherwise-referenced rule.
>
> I agree with you that this is a frequent cause, and probably even the
> primary one. But my concern is that your loosening increases the risk of
> corruption for other cases.
>
>> > I assume by "these objects" you mean ones which used to be reachable
>> > from a reflog, but that reflog entry just expired.  I think you'd be
>> > sacrificing some race-safety in that case.
>>
>> Is that inherently any more race unsafe than 'git prune
>> --expire=2.weeks.ago'?  I thought it'd be racy in the same ways, and
>> thus a tradeoff folks are already accepting (at least implicitly) when
>> running git-gc.  Since these objects are actually 90 days old rather
>> than a mere two weeks, it actually seemed more safe to me.  But maybe
>> I'm overlooking something with the pack->loose transition that makes
>> it more racy?
>
> I think it's worse in terms of race-safety because you're losing one of
> the signals that users of the objects can provide to git-prune to tell
> it the object is useful: updating the mtime. So yes, you think of the
> objects as "90 days old", but we don't know if there are other users.
> Has somebody else been accessing them in the meantime?
>
> To be honest, I'm not sure how meaningful that distinction is in
> practice. The current scheme is still racy, even if the windows are
> shorter in some circumstances. But it seems like cruft packfiles are
> a similar amount of work to your scheme, cover more cases, and are
> slightly safer. And possibly even give us a long-term route to true
> race-free pruning (via the "garbage pack" mark that Jonathan mentioned).
>
> Assuming you buy into the idea that objects in a cruft-pack are not
> hurting anything aside from a little wasted storage for up to 2 weeks
> (which it sounds like you're at least partially on board with ;) ).

Ah, I see now.  Thanks (to you and Jonathan) for the thorough
explanations.  I'm totally on board now.


[PATCH 5/9] diff.c: adjust hash function signature to match hashmap expectation

2018-07-16 Thread Stefan Beller
This makes the follow-up patch easier.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index ce7bedc1b92..d1bae900cdc 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-  const struct moved_entry *a,
-  const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+  const void *entry,
+  const void *entry_or_key,
   const void *keydata)
 {
+   const struct diff_options *diffopt = hashmap_cmp_fn_data;
+   const struct moved_entry *a = entry;
+   const struct moved_entry *b = entry_or_key;
+
return !xdiff_compare_lines(a->es->line, a->es->len,
b->es->line, b->es->len,
diffopt->xdl_opts);
@@ -5534,10 +5538,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
if (o->color_moved) {
struct hashmap add_lines, del_lines;
 
-   hashmap_init(&add_lines,
-(hashmap_cmp_fn)moved_entry_cmp, o, 0);
-   hashmap_init(&del_lines,
-(hashmap_cmp_fn)moved_entry_cmp, o, 0);
+   hashmap_init(&add_lines, moved_entry_cmp, o, 0);
+   hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 
add_lines_to_move_detection(o, &add_lines, &del_lines);
mark_color_as_moved(o, &add_lines, &del_lines);
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 6/9] diff.c: add a blocks mode for moved code detection

2018-07-16 Thread Stefan Beller
The new "blocks" mode provides a middle ground between plain and zebra.
It is as intuitive (few colors) as plain, but still has the requirement
for a minimum of lines/characters to count a block as moved.

Suggested-by: Ævar Arnfjörð Bjarmason 
 (https://public-inbox.org/git/87o9j0uljo@evledraar.gmail.com/)
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 Documentation/diff-options.txt |  8 --
 diff.c |  6 +++--
 diff.h |  5 ++--
 t/t4015-diff-whitespace.sh | 49 --
 4 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e3a44f03cdc..ba56169de31 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -276,10 +276,14 @@ plain::
that are added somewhere else in the diff. This mode picks up any
moved line, but it is not very useful in a review to determine
if a block of code was moved without permutation.
-zebra::
+blocks::
Blocks of moved text of at least 20 alphanumeric characters
are detected greedily. The detected blocks are
-   painted using either the 'color.diff.{old,new}Moved' color or
+   painted using either the 'color.diff.{old,new}Moved' color.
+   Adjacent blocks cannot be told apart.
+zebra::
+   Blocks of moved text are detected as in 'blocks' mode. The blocks
+   are painted using either the 'color.diff.{old,new}Moved' color or
'color.diff.{old,new}MovedAlternative'. The change between
the two colors indicates that a new block was detected.
 dimmed_zebra::
diff --git a/diff.c b/diff.c
index d1bae900cdc..95c51c0b7df 100644
--- a/diff.c
+++ b/diff.c
@@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
return COLOR_MOVED_NO;
else if (!strcmp(arg, "plain"))
return COLOR_MOVED_PLAIN;
+   else if (!strcmp(arg, "blocks"))
+   return COLOR_MOVED_BLOCKS;
else if (!strcmp(arg, "zebra"))
return COLOR_MOVED_ZEBRA;
else if (!strcmp(arg, "default"))
@@ -278,7 +280,7 @@ static int parse_color_moved(const char *arg)
else if (!strcmp(arg, "dimmed_zebra"))
return COLOR_MOVED_ZEBRA_DIM;
else
-   return error(_("color moved setting must be one of 'no', 'default', 'zebra', 'dimmed_zebra', 'plain'"));
+   return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 
 int git_diff_ui_config(const char *var, const char *value, void *cb)
@@ -903,7 +905,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
block_length++;
 
-   if (flipped_block)
+   if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
}
adjust_last_block(o, n, block_length);
diff --git a/diff.h b/diff.h
index d29560f822c..7bd4f182c33 100644
--- a/diff.h
+++ b/diff.h
@@ -208,8 +208,9 @@ struct diff_options {
enum {
COLOR_MOVED_NO = 0,
COLOR_MOVED_PLAIN = 1,
-   COLOR_MOVED_ZEBRA = 2,
-   COLOR_MOVED_ZEBRA_DIM = 3,
+   COLOR_MOVED_BLOCKS = 2,
+   COLOR_MOVED_ZEBRA = 3,
+   COLOR_MOVED_ZEBRA_DIM = 4,
} color_moved;
#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
#define COLOR_MOVED_MIN_ALNUM_COUNT 20
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index ddbc3901385..e54529f026d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1223,7 +1223,7 @@ test_expect_success 'plain moved code, inside file' '
test_cmp expected actual
 '
 
-test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+test_expect_success 'detect blocks of moved code' '
git reset --hard &&
cat <<-\EOF >lines.txt &&
long line 1
@@ -1271,6 +1271,50 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
test_config color.diff.newMovedDimmed "normal cyan" &&
test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
+   git diff HEAD --no-renames --color-moved=blocks --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
+   cat <<-\EOF >expected &&
+   diff --git a/lines.txt b/lines.txt
+   --- a/lines.txt
+   +++ b/lines.txt
+   @@ -1,16 +1,16 @@
+   -long line 1
+   -long line 2
+   -long line 3
+line 4
+line 5
+line 6
+line 7
+line 8
+line 9
+   +long line 1
+   +long line 2
+   +long line 3
+   +long line 14
+   +long line 15
+   +long line 16
+line 10
+  

[PATCH 8/9] diff.c: factor advance_or_nullify out of mark_color_as_moved

2018-07-16 Thread Stefan Beller
This moves the part of code that checks if we're still in a block
into its own function.  We'll need a different approach on advancing
the blocks in a later patch, so having it as a separate function will
prove useful.

While at it, rename the variable `p` to `prev` to indicate that it refers
to the previous line, as pmb[i] was assigned in the last iteration of the
outermost for loop.

Further rename `pnext` to `cur` to indicate that it should match up with
the current line of the outermost for loop.

Also make the advancement of pmb[i] reuse `cur` instead of `p->next`
(which is how the old name pnext could be explained).

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/diff.c b/diff.c
index 70eeb40c5fd..4963819e530 100644
--- a/diff.c
+++ b/diff.c
@@ -801,6 +801,25 @@ static void add_lines_to_move_detection(struct diff_options *o,
}
 }
 
+static void pmb_advance_or_null(struct diff_options *o,
+   struct moved_entry *match,
+   struct hashmap *hm,
+   struct moved_entry **pmb,
+   int pmb_nr)
+{
+   int i;
+   for (i = 0; i < pmb_nr; i++) {
+   struct moved_entry *prev = pmb[i];
+   struct moved_entry *cur = (prev && prev->next_line) ?
+   prev->next_line : NULL;
+   if (cur && !hm->cmpfn(o, cur, match, NULL)) {
+   pmb[i] = cur;
+   } else {
+   pmb[i] = NULL;
+   }
+   }
+}
+
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 int pmb_nr)
 {
@@ -875,7 +894,6 @@ static void mark_color_as_moved(struct diff_options *o,
struct moved_entry *key;
struct moved_entry *match = NULL;
struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-   int i;
 
switch (l->s) {
case DIFF_SYMBOL_PLUS:
@@ -906,17 +924,7 @@ static void mark_color_as_moved(struct diff_options *o,
if (o->color_moved == COLOR_MOVED_PLAIN)
continue;
 
-   /* Check any potential block runs, advance each or nullify */
-   for (i = 0; i < pmb_nr; i++) {
-   struct moved_entry *p = pmb[i];
-   struct moved_entry *pnext = (p && p->next_line) ?
-   p->next_line : NULL;
-   if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-   pmb[i] = p->next_line;
-   } else {
-   pmb[i] = NULL;
-   }
-   }
+   pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 9/9] diff.c: add white space mode to move detection that allows indent changes

2018-07-16 Thread Stefan Beller
The option of --color-moved has proven to be useful as observed on the
mailing list. However, when refactoring, sometimes the indentation changes,
for example when partitioning a function into smaller helper functions:
the code is usually mostly moved around except for a decrease in indentation.

To just review the moved code ignoring the change in indentation, a mode
to ignore spaces in the move detection as implemented in a previous patch
would be enough.  However the whole move coloring as motivated in commit
2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
up the notion of the reviewer being able to trust the move of a "block".

As there are languages such as python, which depend on proper relative
indentation for the control flow of the program, ignoring any white space
change in a block would not uphold the promises of 2e2d5ac that allows
reviewers to pay less attention to the inside of a block, as inside
the reviewer wants to assume the same program flow.

This new white space mode takes this into account and only allows the
same white space change per line in each block; in fact it restricts this
further to the same change at the beginning of each line.

As this is a white space mode, it is made exclusive to other white space
modes in the move detection.

This patch brings some challenges, related to the detection of blocks.
We need a wide net to catch the possible moved lines, but then need to
narrow it down to check whether the blocks are still intact. Consider this
example (ignoring block sizes):

 - A
 - B
 - C
 +A
 +B
 +C

At the beginning of a block when checking if there is a counterpart
for A, we have to ignore all space changes. However at the following
lines we have to check if the indent change stayed the same.

Checking whether the indentation change stayed the same is done by
computing the indentation change from the difference in line length, and
then assuming the change is only at the beginning of the longer line; the
common tail is the same. That is why the test contains lines like:

 -  A
 ...
 + A 
 ...

As the first line starting a block is caught using a compare function that
ignores white spaces unlike the rest of the block, where the white space
delta is taken into account for the comparison, we also have to think about
the following situation:

 - A
 - B
 -   A
 -   B
 +A
 +B
 +  A
 +  B

When checking if the first A (both in the + and - lines) is a start of
a block, we have to check all 'A' and record all the white space deltas
such that we can find the example above to be just one block that is
indented.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 Documentation/diff-options.txt |   5 ++
 diff.c | 158 -
 diff.h |   3 +
 t/t4015-diff-whitespace.sh |  88 ++
 4 files changed, 252 insertions(+), 2 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 80e29e39854..143acd9417e 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -307,6 +307,11 @@ ignore-space-change::
 ignore-all-space::
Ignore whitespace when comparing lines. This ignores differences
even if one line has whitespace where the other line has none.
+allow-indentation-change::
+   Initially ignore any white spaces in the move detection, then
+   group the moved code blocks only into a block if the change in
+   whitespace is the same per line. This is incompatible with the
+   other modes.
 --
 
 --word-diff[=<mode>]::
diff --git a/diff.c b/diff.c
index 4963819e530..f51f0ac32f4 100644
--- a/diff.c
+++ b/diff.c
@@ -302,12 +302,18 @@ static int parse_color_moved_ws(const char *arg)
ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
else if (!strcmp(sb.buf, "ignore-all-space"))
ret |= XDF_IGNORE_WHITESPACE;
+   else if (!strcmp(sb.buf, "allow-indentation-change"))
+   ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
else
error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
 
strbuf_release(&sb);
}
 
+   if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
+   (ret & XDF_WHITESPACE_FLAGS))
+   die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+
string_list_clear(&l, 0);
 
return ret;
@@ -737,7 +743,91 @@ struct moved_entry {
struct hashmap_entry ent;
const struct emitted_diff_symbol *es;
struct moved_entry *next_line;
+   struct ws_delta *wsd;
+};
+
+/**
+ * The struct ws_delta holds white space differences between moved lines, i.e.
+ * between '+' and '-' lines that have been detected to be a move.
+ * The string contains the difference in leading white spaces, before the
+ * rest of 

[PATCH 7/9] diff.c: decouple white space treatment from move detection algorithm

2018-07-16 Thread Stefan Beller
In the original implementation of the move detection logic the choice for
ignoring white space changes is the same for the move detection as it is
for the regular diff.  Some cases came up where different treatment would
have been nice.

Allow the user to specify that white space should be ignored differently
during detection of moved lines than during generation of added and removed
lines. This is done by providing analogs to the --ignore-space-at-eol,
-b, and -w options by introducing the option --color-moved-ws=<modes>
with the modes named "ignore-space-at-eol", "ignore-space-change" and
"ignore-all-space", which is used only during the move detection phase.

As we change the default, we'll adjust the tests.

For now we do not infer any options to treat white spaces in the move
detection from the generic white space options given to diff.
This can be tuned later to a reasonable default.

As we plan on adding a white space related option in a later patch that
interferes with the current white space options, use a flag field and
clamp it down to XDF_WHITESPACE_FLAGS, as that (a) allows us to easily
check at parse time if we are given invalid combinations and (b) lets
the later patch reuse parts of this one.

By having the white space treatment in its own option, we'll also
make it easier for a later patch to add a config option for
spaces in the move detection.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 Documentation/diff-options.txt | 17 ++
 diff.c | 39 +++--
 diff.h |  1 +
 t/t4015-diff-whitespace.sh | 62 +++---
 4 files changed, 113 insertions(+), 6 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index ba56169de31..80e29e39854 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -292,6 +292,23 @@ dimmed_zebra::
blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-ws=<modes>::
+   This configures how white spaces are ignored when performing the
+   move detection for `--color-moved`. These modes can be given
+   as a comma separated list:
++
+--
+ignore-space-at-eol::
+   Ignore changes in whitespace at EOL.
+ignore-space-change::
+   Ignore changes in amount of whitespace.  This ignores whitespace
+   at line end, and considers all other sequences of one or
+   more whitespace characters to be equivalent.
+ignore-all-space::
+   Ignore whitespace when comparing lines. This ignores differences
+   even if one line has whitespace where the other line has none.
+--
+
 --word-diff[=<mode>]::
	Show a word diff, using the <mode> to delimit changed words.
By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 95c51c0b7df..70eeb40c5fd 100644
--- a/diff.c
+++ b/diff.c
@@ -283,6 +283,36 @@ static int parse_color_moved(const char *arg)
	return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 }
 
+static int parse_color_moved_ws(const char *arg)
+{
+   int ret = 0;
+   struct string_list l = STRING_LIST_INIT_DUP;
+   struct string_list_item *i;
+
+   string_list_split(&l, arg, ',', -1);
+
+   for_each_string_list_item(i, &l) {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addstr(&sb, i->string);
+   strbuf_trim(&sb);
+
+   if (!strcmp(sb.buf, "ignore-space-change"))
+   ret |= XDF_IGNORE_WHITESPACE_CHANGE;
+   else if (!strcmp(sb.buf, "ignore-space-at-eol"))
+   ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
+   else if (!strcmp(sb.buf, "ignore-all-space"))
+   ret |= XDF_IGNORE_WHITESPACE;
+   else
+   error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
+
+   strbuf_release(&sb);
+   }
+
+   string_list_clear(&l, 0);
+
+   return ret;
+}
+
 int git_diff_ui_config(const char *var, const char *value, void *cb)
 {
if (!strcmp(var, "diff.color") || !strcmp(var, "color.diff")) {
@@ -717,10 +747,12 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
const struct diff_options *diffopt = hashmap_cmp_fn_data;
const struct moved_entry *a = entry;
const struct moved_entry *b = entry_or_key;
+   unsigned flags = diffopt->color_moved_ws_handling
+& XDF_WHITESPACE_FLAGS;
 
return !xdiff_compare_lines(a->es->line, a->es->len,
b->es->line, b->es->len,
-   diffopt->xdl_opts);
+   flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -728,8 +760,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 {
struct moved_entry *ret = xmalloc(sizeof(*ret));

[PATCHv5 0/9] Reroll of sb/diff-color-move-more

2018-07-16 Thread Stefan Beller
This is a resend of sb/diff-color-move-more
https://public-inbox.org/git/20180629001958.85143-1-sbel...@google.com/
that fixes an erroneous squashing within the series; the end result is
the same. A range-diff is below. (As the latest cooking email said
this series is going to land in 'next' soon, I hope this is not too late
for a resend; otherwise just ignore it, as the end result is the same.)

Thanks,
Stefan

Stefan Beller (9):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations
  t4015: avoid git as a pipe input
  diff.c: do not pass diff options as keydata to hashmap
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: add a blocks mode for moved code detection
  diff.c: decouple white space treatment from move detection algorithm
  diff.c: factor advance_or_nullify out of mark_color_as_moved
  diff.c: add white space mode to move detection that allows indent
changes

 Documentation/diff-options.txt |  30 +++-
 diff.c | 253 +
 diff.h |   9 +-
 t/t4015-diff-whitespace.sh | 243 ++-
 xdiff/xdiff.h  |   8 --
 xdiff/xdiffi.c |  17 ---
 6 files changed, 472 insertions(+), 88 deletions(-)

-- 
2.18.0.203.gfac676dfb9-goog

1:  7199e9b5608 ! 1:  7d58ad461cb diff.c: decouple white space treatment from 
move detection algorithm
@@ -207,9 +207,8 @@
 +  EOF
 +
 +  # Make sure we get a different diff using -w
-+  git diff --color --color-moved -w |
-+  grep -v "index" |
-+  test_decode_color >actual &&
++  git diff --color --color-moved -w >actual.raw &&
++  grep -v "index" actual.raw | test_decode_color >actual &&
 +  q_to_tab <<-\EOF >expected &&
 +  diff --git a/text.txt b/text.txt
 +  --- a/text.txt
@@ -224,9 +223,8 @@
 +
 +  # And now ignoring white space only in the move detection
 +  git diff --color --color-moved \
-+  
--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
-+  grep -v "index" |
-+  test_decode_color >actual &&
++  
--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol 
>actual.raw &&
++  grep -v "index" actual.raw | test_decode_color >actual &&
 +  q_to_tab <<-\EOF >expected &&
 +  diff --git a/text.txt b/text.txt
 +  --- a/text.txt
2:  5626d523b70 = 2:  f08353f2a02 diff.c: factor advance_or_nullify out of 
mark_color_as_moved
3:  e2f1e573699 ! 3:  3fde7cf2194 diff.c: add white space mode to move 
detection that allows indent changes
@@ -339,30 +339,6 @@
 diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
 --- a/t/t4015-diff-whitespace.sh
 +++ b/t/t4015-diff-whitespace.sh
-@@
-   EOF
- 
-   # Make sure we get a different diff using -w
--  git diff --color --color-moved -w |
--  grep -v "index" |
--  test_decode_color >actual &&
-+  git diff --color --color-moved -w >actual.raw &&
-+  grep -v "index" actual.raw | test_decode_color >actual &&
-   q_to_tab <<-\EOF >expected &&
-   diff --git a/text.txt b/text.txt
-   --- a/text.txt
-@@
- 
-   # And now ignoring white space only in the move detection
-   git diff --color --color-moved \
--  
--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
--  grep -v "index" |
--  test_decode_color >actual &&
-+  
--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol 
>actual.raw &&
-+  grep -v "index" actual.raw | test_decode_color >actual &&
-   q_to_tab <<-\EOF >expected &&
-   diff --git a/text.txt b/text.txt
-   --- a/text.txt
 @@
test_cmp expected actual
  '


[PATCH 2/9] xdiff/xdiffi.c: remove unneeded function declarations

2018-07-16 Thread Stefan Beller
There is no need to forward-declare these functions, as they are only
used after their implementation.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 xdiff/xdiffi.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..3e8aff92bc4 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
long i1, i2;
int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
- unsigned long const *ha2, long off2, long lim2,
- long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
- xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long 
chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.18.0.203.gfac676dfb9-goog



Re: [PATCH 11/25] t7400: fix broken "submodule add/reconfigure --force" test

2018-07-16 Thread Eric Sunshine
On Mon, Jul 16, 2018 at 02:37:32PM -0700, Junio C Hamano wrote:
> Eric Sunshine  writes:
> > On Mon, Jul 16, 2018 at 11:51 AM Johannes Schindelin
> >  wrote:
> >> On Mon, 16 Jul 2018, Johannes Schindelin wrote:
> >> > > -   git submodule add --force bogus-url submod &&
> >> > > +   git submodule add --force /bogus-url submod &&
> >> > This breaks on Windows because of the absolute Unix-y path having to be
> >> > translated to a Windows path:
> >> > I could imagine that using "$(pwd)/bogus-url" (which will generate a
> >> > Windows-y absolute path on Windows) would work, though.
> >>
> >> Yes, this works indeed, see the patch below. Could you please squash it in
> >> if you send another iteration of your patch series? Junio, could you
> >> please add this as a SQUASH??? commit so that `pu` is fixed on Windows?
> >
> > So, this SQUASH looks like the correct way forward. Hopefully, Junio
> > can just squash it so I don't have to flood the list again with this
> > long series.
> 
> The topic already has another dependent topic so rerolling or
> squashing would be a bit cumbersome.  I'll see what I could do but
> it may not be until tomorrow's pushout.

No problem. Here's Dscho's fix in the form of a proper patch if
that makes it easier for you (it just needs his sign-off):

--- >8 ---
From: Johannes Schindelin 
Subject: [PATCH] t7400: make "submodule add/reconfigure --force" work on
 Windows

On Windows, the "Unixy" path /bogus-url gets translated automatically to
a Windows-style path (such as C:/git-sdk/bogus_url). This is normally
not a problem, since it is still a valid and usable path in that form;
however, this test wants to compare against the original path.
$(pwd) was invented exactly for this case, so use it to make the path
comparison work.

[es: commit message]

Signed-off-by: Eric Sunshine 
---
 t/t7400-submodule-basic.sh | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 76cf522a08..bfb09dd566 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -171,10 +171,11 @@ test_expect_success 'submodule add to .gitignored path with --force' '
 test_expect_success 'submodule add to reconfigure existing submodule with --force' '
(
cd addtest-ignore &&
-   git submodule add --force /bogus-url submod &&
+   bogus_url="$(pwd)/bogus-url" &&
+   git submodule add --force "$bogus_url" submod &&
git submodule add --force -b initial "$submodurl" submod-branch &&
-   test "/bogus-url" = "$(git config -f .gitmodules submodule.submod.url)" &&
-   test "/bogus-url" = "$(git config submodule.submod.url)" &&
+   test "$bogus_url" = "$(git config -f .gitmodules submodule.submod.url)" &&
+   test "$bogus_url" = "$(git config submodule.submod.url)" &&
# Restore the url
git submodule add --force "$submodurl" submod &&
test "$submodurl" = "$(git config -f .gitmodules submodule.submod.url)" &&
-- 
2.18.0.233.g985f88cf7e


[PATCH 1/9] xdiff/xdiff.h: remove unused flags

2018-07-16 Thread Stefan Beller
These flags have been there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24)), but were never used. Remove them.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 xdiff/xdiff.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c1937a29112..2356da5f784 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -52,14 +52,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 3/9] t4015: avoid git as a pipe input

2018-07-16 Thread Stefan Beller
In t4015 we have a pattern of

git diff [] |
grep -v "index" |
test_decode_color >actual &&

to produce output that we want to test against. This pattern was introduced
in 86b452e2769 (diff.c: add dimming to moved line detection, 2017-06-30),
as the focus then was on getting the colors right. However the pattern
is not best practice, as we do care about the exit code of Git. So let's
not have Git as the upstream of a pipe. Piping the output of grep to
some function is fine, as we assume grep to be unflawed in our test suite.
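The exit-code problem this patch avoids is easy to reproduce outside the test suite; here is a small illustrative sketch (not part of the patch itself):

```python
import subprocess

# A shell pipeline reports only the exit status of its *last* command,
# so a failure upstream of the pipe is silently swallowed -- exactly
# why "git ... | grep | test_decode_color" hid failures of git.
direct = subprocess.run(["sh", "-c", "exit 1"])
piped = subprocess.run(["sh", "-c", "exit 1 | cat"])

print(direct.returncode)  # 1 -- the failure is visible
print(piped.returncode)   # 0 -- the failure is lost behind the pipe
```

Redirecting to a file first, as the patch does, keeps Git's exit status observable by the test's `&&`-chain.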

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 t/t4015-diff-whitespace.sh | 50 +++---
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 17df491a3ab..ddbc3901385 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1271,9 +1271,8 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
test_config color.diff.newMovedDimmed "normal cyan" &&
test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
-   git diff HEAD --no-renames --color-moved=dimmed_zebra --color |
-   grep -v "index" |
-   test_decode_color >actual &&
+   git diff HEAD --no-renames --color-moved=dimmed_zebra --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
cat <<-\EOF >expected &&
diff --git a/lines.txt b/lines.txt
--- a/lines.txt
@@ -1315,9 +1314,8 @@ test_expect_success 'cmd option assumes configured colored-moved' '
test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
test_config diff.colorMoved zebra &&
-   git diff HEAD --no-renames --color-moved --color |
-   grep -v "index" |
-   test_decode_color >actual &&
+   git diff HEAD --no-renames --color-moved --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
cat <<-\EOF >expected &&
diff --git a/lines.txt b/lines.txt
--- a/lines.txt
@@ -1395,9 +1393,8 @@ test_expect_success 'move detection ignoring whitespace ' '
line 4
line 5
EOF
-   git diff HEAD --no-renames --color-moved --color |
-   grep -v "index" |
-   test_decode_color >actual &&
+   git diff HEAD --no-renames --color-moved --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
cat <<-\EOF >expected &&
diff --git a/lines.txt b/lines.txt
--- a/lines.txt
@@ -1419,9 +1416,8 @@ test_expect_success 'move detection ignoring whitespace ' '
EOF
test_cmp expected actual &&
 
-   git diff HEAD --no-renames -w --color-moved --color |
-   grep -v "index" |
-   test_decode_color >actual &&
+   git diff HEAD --no-renames -w --color-moved --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
cat <<-\EOF >expected &&
diff --git a/lines.txt b/lines.txt
--- a/lines.txt
@@ -1459,9 +1455,8 @@ test_expect_success 'move detection ignoring whitespace changes' '
line 5
EOF
 
-   git diff HEAD --no-renames --color-moved --color |
-   grep -v "index" |
-   test_decode_color >actual &&
+   git diff HEAD --no-renames --color-moved --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
cat <<-\EOF >expected &&
diff --git a/lines.txt b/lines.txt
--- a/lines.txt
@@ -1483,9 +1478,8 @@ test_expect_success 'move detection ignoring whitespace changes' '
EOF
test_cmp expected actual &&
 
-   git diff HEAD --no-renames -b --color-moved --color |
-   grep -v "index" |
-   test_decode_color >actual &&
+   git diff HEAD --no-renames -b --color-moved --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
cat <<-\EOF >expected &&
diff --git a/lines.txt b/lines.txt
--- a/lines.txt
@@ -1526,9 +1520,8 @@ test_expect_success 'move detection ignoring whitespace at eol' '
# avoid cluttering the output with complaints about our eol whitespace
test_config core.whitespace -blank-at-eol &&
 
-   git diff HEAD --no-renames --color-moved --color |
-   grep -v "index" |
-   test_decode_color >actual &&
+   git diff HEAD --no-renames --color-moved --color >actual.raw &&
+   grep -v "index" actual.raw | test_decode_color >actual &&
cat <<-\EOF >expected &&
diff --git a/lines.txt b/lines.txt
--- a/lines.txt
@@ -1550,9 +1543,8 @@ test_expect_success 'move detection ignoring 

[PATCH 4/9] diff.c: do not pass diff options as keydata to hashmap

2018-07-16 Thread Stefan Beller
When we initialize the hashmap, we give it a pointer to the
diff_options, which it then passes along to each call of the
hashmap_cmp_fn function. There's no need to pass it a second time as
the "keydata" parameter, and our comparison functions never look at
keydata.

This was a mistake left over from an earlier round of 2e2d5ac184
(diff.c: color moved lines differently, 2017-06-30), before hashmap
learned to pass the data pointer for us.

Explanation-by: Jeff King 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1289df4b1f9..ce7bedc1b92 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
case DIFF_SYMBOL_PLUS:
hm = del_lines;
key = prepare_entry(o, n);
-   match = hashmap_get(hm, key, o);
+   match = hashmap_get(hm, key, NULL);
free(key);
break;
case DIFF_SYMBOL_MINUS:
hm = add_lines;
key = prepare_entry(o, n);
-   match = hashmap_get(hm, key, o);
+   match = hashmap_get(hm, key, NULL);
free(key);
break;
default:
-- 
2.18.0.203.gfac676dfb9-goog



Re: [PATCH] negotiator/skipping: skip commits during fetch

2018-07-16 Thread Junio C Hamano
Jonathan Tan  writes:

> Introduce a new negotiation algorithm used during fetch that skips
> commits in an effort to find common ancestors faster. The skips grow
> similarly to the Fibonacci sequence as the commit walk proceeds further
> away from the tips. The skips may cause unnecessary commits to be
> included in the packfile, but the negotiation step typically ends more
> quickly.
>
> Usage of this algorithm is guarded behind the configuration flag
> fetch.negotiationAlgorithm.
>
> Signed-off-by: Jonathan Tan 
> ---
> This is on jt/fetch-pack-negotiator, but also applies cleanly on
> jt/fetch-nego-tip.
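As an aside, the Fibonacci-like growth of the skip sizes mentioned in the message can be sketched as follows (an illustration of the growth pattern only, not the actual negotiator code):

```python
def skip_steps(n):
    """Yield the first n skip sizes; each skip is the sum of the two
    previous ones, so the walk backs off like 1, 1, 2, 3, 5, 8, ..."""
    a, b = 1, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

print(list(skip_steps(8)))  # [1, 1, 2, 3, 5, 8, 13, 21]
```

Larger and larger skips mean fewer "have" lines exchanged during negotiation, at the cost of possibly overshooting a common ancestor and sending some unnecessary objects.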

Sounds sensible.

Unfortunately, this one is among many others that get hurt by
needless semantic conflicts caused by reusing the old function name
and changing the function signature to pass the_repository thru some
codepaths, without adding transition macros.  I am running out of
time today as I need some post-office-move clean-ups before getting
organized enough, so I expect I won't be able to clean it up and
push it out on 'pu' by the end of day today.

Thanks.



Re: [PATCH 08/16] test-reach: create new test tool for ref_newer

2018-07-16 Thread Jonathan Tan
> To use the new test-tool, use 'test-tool reach ' and provide
> input to stdin that describes the inputs to the method. Currently, we
> only implement the ref_newer method, which requires two commits. Use
> lines "A:" and "B:" for the two inputs. We will
> expand this input later to accommodate methods that take lists of
> commits.

It would be nice if "A" and "B" were "ancestor" and "descendant" (or
something like that) instead, so that I don't have to check which
direction the reach is calculated in.

> +int cmd__reach(int ac, const char **av)
> +{
> + struct object_id oid_A, oid_B;
> + struct strbuf buf = STRBUF_INIT;
> + struct repository *r = the_repository;
> +
> + setup_git_directory();
> +
> + if (ac < 2)
> + exit(1);
> +
> +
> + while (strbuf_getline(&buf, stdin) != EOF) {
> + struct object_id oid;
> + struct object *o;
> + struct commit *c;
> + if (buf.len < 3)
> + continue;
> +
> + if (get_oid_committish(buf.buf + 2, &oid))
> + die("failed to resolve %s", buf.buf + 2);

You can also use skip_prefix() instead of using arithmetic to determine
the start of the OID.

> +# Construct a grid-like commit graph with points (x,y)
> +# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has
> +# parents (x-1, y) and (x, y-1), keeping in mind that
> +# we drop a parent if a coordinate is nonpositive.
> +#
> +# (10,10)
> +#/   \
> +# (10,9)(9,10)
> +#/ \   /  \
> +#(10,8)(9,9)  (8,10)
> +#   / \/   \  /\
> +# ( continued...)
> +#   \ /\   /  \/
> +#(3,1) (2,2)  (1,3)
> +#\ /\ /
> +# (2,1)  (1,2)
> +#  \/
> +#  (1,1)

This is quite a good design, thanks.

> +# We use branch 'comit-x-y' to refer to (x,y).

s/comit/commit/

> + git show-ref -s commit-7-7 | git commit-graph write --stdin-commits &&
> + mv .git/objects/info/commit-graph commit-graph-half &&

My understanding is that this writes for 7-7 and all its ancestors,
but...

> +test_expect_success 'ref_newer:hit' '
> + cat >input <<-\EOF &&
> + A:commit-5-7
> + B:commit-2-3
> + EOF
> + printf "ref_newer:1\n" >expect &&
> + test_three_modes ref_newer
> +'
> +
> +test_done

...both 5-7 and 2-3 are ancestors of 7-7, right? Which means that you
don't test the "half" commit graph here. (It's probably sufficient to
just adjust the numbers.)
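In this grid DAG, reachability has a simple closed form, which makes it easy to sanity-check proposed test inputs (an illustrative sketch, not part of the test tool):

```python
def is_ancestor(ax, ay, bx, by):
    """In the grid where (x,y) has parents (x-1,y) and (x,y-1),
    (ax,ay) is an ancestor of (bx,by) exactly when ax <= bx and ay <= by."""
    return ax <= bx and ay <= by

# Both inputs of the 'ref_newer:hit' test are ancestors of commit-7-7,
# so both fall inside the "half" commit-graph written from it:
print(is_ancestor(5, 7, 7, 7), is_ancestor(2, 3, 7, 7))  # True True
```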


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Jul 16, 2018 at 03:03:06PM -0700, Jonathan Nieder wrote:

>> Oh, good point.  In non-daemon mode, we don't let "gc --auto" failure
>> cause the invoking command to fail, but in daemon mode we do.  That
>> should be a straightforward fix; patch coming in a moment.
>
> OK, that definitely sounds like a bug. I'm still confused how that could
> happen, though, since from the caller's perspective they ignore git-gc's
> exit code either way. I guess I'll see in your patch. :)

Alas, I just misremembered.  What I was remembering is that gc --auto
does

	if (auto_gc && too_many_loose_objects())
		warning(_("There are too many unreachable loose objects; "
			"run 'git prune' to remove them."));

	return 0;

which means that too_many_loose_objects is not an error in undaemonized
mode, while it is in daemonized mode.  But we've already discussed that.

The calling command in the motivating example is Android's "repo" tool:

bare_git.gc('--auto')

from https://gerrit-review.googlesource.com/c/git-repo/+/10598/.  I
think it's reasonable that it expects a status code of 0 in the normal
case.  So life is less simple than I hoped.

[...]
>> Can you point me to some discussion about building that rate-limiting?
>> The commit message for v2.12.2~17^2 (gc: ignore old gc.log files,
>> 2017-02-10) definitely doesn't describe that as its intent.
>
> I think that commit is a loosening of the rate-limiting (because we'd
> refuse to progress for something that was actually time-based). But the
> original stopping comes from this discussion, I think:
>
>   https://public-inbox.org/git/xmqqlhijznpm@gitster.dls.corp.google.com/

Interesting!  It looks like that thread anticipated the problems we've
seen here.  Three years without having to have fixed it is a good run,
I suppose.

The discussion of stopping there appears to be primarily about
stopping in the error case, not rate-limiting in the success or
warning case.

Here's a patch for the 'return -1' thing.

-- >8 --
Subject: gc: exit with status 128 on failure

A value of -1 returned from cmd_gc gets propagated to exit(),
resulting in an exit status of 255.  Use die instead for a clearer
error message and a controlled exit.
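The -1 → 255 translation happens because exit() reports only the low 8 bits of its argument to the parent process; a quick illustration (not Git code):

```python
import subprocess
import sys

# A child process that exits with -1 is observed by its parent as
# status 255, i.e. -1 truncated to an unsigned byte.
child = subprocess.run([sys.executable, "-c", "raise SystemExit(-1)"])
print(child.returncode)  # 255
print(-1 & 0xff)         # 255
```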

Signed-off-by: Jonathan Nieder 
---
 builtin/gc.c  | 36 
 t/t6500-gc.sh |  2 +-
 2 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..2bebc52bda 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -438,10 +438,10 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
return NULL;
 }
 
-static int report_last_gc_error(void)
+static void report_last_gc_error(void)
 {
struct strbuf sb = STRBUF_INIT;
-   int ret = 0;
+   ssize_t ret;
struct stat st;
char *gc_log_path = git_pathdup("gc.log");
 
@@ -449,16 +449,17 @@ static int report_last_gc_error(void)
if (errno == ENOENT)
goto done;
 
-   ret = error_errno(_("Can't stat %s"), gc_log_path);
-   goto done;
+   die_errno(_("cannot stat '%s'"), gc_log_path);
}
 
if (st.st_mtime < gc_log_expire_time)
goto done;
 
ret = strbuf_read_file(&sb, gc_log_path, 0);
+   if (ret < 0)
+   die_errno(_("cannot read '%s'"), gc_log_path);
if (ret > 0)
-   ret = error(_("The last gc run reported the following. "
+   die(_("The last gc run reported the following. "
   "Please correct the root cause\n"
   "and remove %s.\n"
   "Automatic cleanup will not be performed "
@@ -468,20 +469,18 @@ static int report_last_gc_error(void)
strbuf_release(&sb);
 done:
free(gc_log_path);
-   return ret;
 }
 
-static int gc_before_repack(void)
+static void gc_before_repack(void)
 {
if (pack_refs && run_command_v_opt(pack_refs_cmd.argv, RUN_GIT_CMD))
-   return error(FAILED_RUN, pack_refs_cmd.argv[0]);
+   die(FAILED_RUN, pack_refs_cmd.argv[0]);
 
if (prune_reflogs && run_command_v_opt(reflog.argv, RUN_GIT_CMD))
-   return error(FAILED_RUN, reflog.argv[0]);
+   die(FAILED_RUN, reflog.argv[0]);
 
pack_refs = 0;
prune_reflogs = 0;
-   return 0;
 }
 
 int cmd_gc(int argc, const char **argv, const char *prefix)
@@ -562,13 +561,11 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n"));
}
if (detach_auto) {
-   if (report_last_gc_error())
-   return -1;
+   report_last_gc_error(); /* dies on error */
 
if (lock_repo_for_gc(force, &pid))

Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 03:07:34PM -0700, Elijah Newren wrote:

> > If we were to delete those objects, wouldn't it be exactly the same as
> > running "git prune"? Or setting your gc.pruneExpire to "now" or even "5
> > minutes"?  Or are you concerned with taking other objects along for the
> > ride that weren't part of old reflogs? I think that's a valid concern,
> 
> Yes, I was worried about taking other objects along for the ride that
> weren't part of old reflogs.
> 
> > but it's also an issue for objects which were previously referenced in
> > a reflog, but are part of another current operation.
> 
> I'm not certain what you're referring to here.

I mean that an ongoing operation could refer to a blob that just
recently became unreachable via reflog pruning. And then we would delete
it, leaving the repository corrupt.

One of the protections we have against that is that if I ask to write
blob XYZ and we find that we already have the object, Git will freshen
the mtime of the loose object or pack that contains it, protecting it
from pruning. But with your suggestion, we'd delete it immediately,
regardless of the mtime of the containing pack.

Another way to think of it is this: a reflog mentioning an object does
not mean it is the exclusive user of that object. So when we expire it,
that does not mean that it is OK to delete it immediately; there may be
other users.
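The freshening protection described above can be modeled with a small sketch (a simplified illustration, not Git's actual loose-object code; `write_object` and `prunable` are hypothetical helpers):

```python
import os
import time

def write_object(path, data):
    """If the object already exists, only freshen its mtime; an
    expiry-based prune then treats it as recently used and keeps it."""
    if os.path.exists(path):
        now = time.time()
        os.utime(path, (now, now))  # freshen instead of rewriting
    else:
        with open(path, "wb") as f:
            f.write(data)

def prunable(path, expire_seconds):
    """An object may be pruned once its mtime is older than the expiry."""
    return time.time() - os.path.getmtime(path) > expire_seconds
```

Deleting an object as soon as its last reflog entry expires would bypass this mtime check entirely, which is the race described above.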

> > Also, what do you do if there weren't reflogs in the repo? Or the
> > reflogs were deleted (e.g., because branch deletion drops the associated
> > reflog entirely)?
> 
> Yes, there are issues this rule won't help with, but in my experience
> it was a primary (if not sole) actual cause in practice.  (I believe I
> even said elsewhere in this thread that I knew there were unreachable
> objects for other reasons and they might also become large in number).
> At $DAYJOB we've had multiple people including myself hit the "too
> many unreachable loose objects" nasty loop issue (some of us multiple
> different times), and as far as I can tell, most (perhaps even all) of
> them would have been avoided by just "properly" deleting garbage as
> per my object-age-is-reflog-age-if-not-otherwise-referenced rule.

I agree with you that this is a frequent cause, and probably even the
primary one. But my concern is that your loosening increases the risk of
corruption for other cases.

> > I assume by "these objects" you mean ones which used to be reachable
> > from a reflog, but that reflog entry just expired.  I think you'd be
> > sacrificing some race-safety in that case.
> 
> Is that inherently any more race unsafe than 'git prune
> --expire=2.weeks.ago'?  I thought it'd be racy in the same ways, and
> thus a tradeoff folks are already accepting (at least implicitly) when
> running git-gc.  Since these objects are actually 90 days old rather
> than a mere two weeks, it actually seemed more safe to me.  But maybe
> I'm overlooking something with the pack->loose transition that makes
> it more racy?

I think it's worse in terms of race-safety because you're losing one of
the signals that users of the objects can provide to git-prune to tell
it the object is useful: updating the mtime. So yes, you think of the
objects as "90 days old", but we don't know if there are other users.
Has somebody else been accessing them in the meantime?

To be honest, I'm not sure how meaningful that distinction is in
practice. The current scheme is still racy, even if the windows are
shorter in some circumstances. But it seems like cruft packfiles are
a similar amount of work to your scheme, cover more cases, and are
slightly safer. And possibly even give us a long-term route to true
race-free pruning (via the "garbage pack" mark that Jonathan mentioned).

Assuming you buy into the idea that objects in a cruft-pack are not
hurting anything aside from a little wasted storage for up to 2 weeks
(which it sounds like you're at least partially on board with ;) ).

-Peff


Re: [PATCH 16/16] commit-reach: use can_all_from_reach

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The is_descendant_of method previously used in_merge_bases() to check if
> the commit can reach any of the commits in the provided list. This had
> two performance problems:
>
> 1. The performance is quadratic in worst-case.
>
> 2. A single in_merge_bases() call requires walking beyond the target
>commit in order to find the full set of boundary commits that may be
>merge-bases.
>
> The can_all_from_reach method avoids this quadratic behavior and can
> limit the search beyond the target commits using generation numbers. It
> requires a small prototype adjustment to stop using commit-date as a
> cutoff, as that optimization is no longer appropriate here.
>
> Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
> naturally found cutoffs to avoid walking the entire commit graph. Since
> we want to always return the correct result, we cannot use the
> min_commit_date cutoff in can_all_from_reach. We then rely on generation
> numbers to provide the cutoff.
>
> Since not all repos will have a commit-graph file, nor will we always
> have generation numbers computed for a commit-graph file, create a new
> method, generation_numbers_enabled(), that checks for a commit-graph
> file and sees if the first commit in the file has a non-zero generation
> number. In the case that we do not have generation numbers, use the old
> logic for is_descendant_of().
>
> Performance was measured on a copy of the Linux repository using the
> 'test-tool reach is_descendant_of' command using this input:
>
> A:v4.9
> X:v4.10
> X:v4.11
> X:v4.12
> X:v4.13
> X:v4.14
> X:v4.15
> X:v4.16
> X:v4.17
> X:v3.0
>
> Note that this input is tailored to demonstrate the quadratic nature of
> the previous method, as it will compute merge-bases for v4.9 versus all
> of the later versions before checking against v4.1.
>
> Before: 0.26 s
>  After: 0.21 s
>
> Since we previously used the is_descendant_of method in the ref_newer
> method, we also measured performance there using
> 'test-tool reach ref_newer' with this input:
>
> A:v4.9
> B:v3.19
>
> Before: 0.10 s
>  After: 0.08 s
>
> By adding a new commit with parent v3.19, we test the non-reachable case
> of ref_newer:
>
> Before: 0.09 s
>  After: 0.08 s
>
> Signed-off-by: Derrick Stolee 
> ---
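The generation-number cutoff the message relies on can be illustrated with a toy DAG (a sketch of the idea only, not the actual can_all_from_reach code; the commit names are made up):

```python
# Hypothetical DAG: child -> list of parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
}

def generation(c):
    """gen(root) = 1; gen(c) = 1 + max(gen(p)) over parents p.
    Every ancestor has a strictly smaller generation number."""
    ps = PARENTS[c]
    return 1 if not ps else 1 + max(generation(p) for p in ps)

def can_reach(start, target):
    """Walk ancestors of start, pruning commits whose generation is
    below gen(target): such commits cannot have target as an ancestor."""
    cutoff = generation(target)
    stack, seen = [start], set()
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c in seen or generation(c) < cutoff:
            continue
        seen.add(c)
        stack.extend(PARENTS[c])
    return False

print(can_reach("d", "a"), can_reach("b", "c"))  # True False
```

Unlike a commit-date cutoff, this bound is always correct, which is why the patch can stop using min_commit_date without risking wrong answers.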

Thanks for the commit message. The code itself looks good!

I think this series is nearly done, I have only commented on
style issues so far, which are easier to address than fundamental
design issues or naming things.

Thanks,
Stefan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 03:03:06PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > I don't think any command should report failure of its _own_ operation
> > if "gc --auto" failed. And grepping around the source code shows that we
> > typically ignore it.
> 
> Oh, good point.  In non-daemon mode, we don't let "gc --auto" failure
> cause the invoking command to fail, but in daemon mode we do.  That
> should be a straightforward fix; patch coming in a moment.

OK, that definitely sounds like a bug. I'm still confused how that could
happen, though, since from the caller's perspective they ignore git-gc's
exit code either way. I guess I'll see in your patch. :)

> > What I was trying to say earlier is that we _did_ build this
> > rate-limiting, and I think it is a bug that the non-daemon case does not
> > rate-limit (but nobody noticed, because the default is daemonizing).
> >
> > So the fix is not "rip out the rate-limiting in daemon mode", but rather
> > "extend it to the non-daemon case".
> 
> Can you point me to some discussion about building that rate-limiting?
> The commit message for v2.12.2~17^2 (gc: ignore old gc.log files,
> 2017-02-10) definitely doesn't describe that as its intent.

I think that commit is a loosening of the rate-limiting (because we'd
refuse to progress for something that was actually time-based). But the
original stopping comes from this discussion, I think:

  https://public-inbox.org/git/xmqqlhijznpm@gitster.dls.corp.google.com/

(I didn't read the whole thread, but that was what I hit by blaming the
log code and then tracing that back to the list).

> This is the kind of review that Dscho often complains about, where
> someone tries to fix something small but significant to users and gets
> told to build something larger that was not their itch instead.

I don't know how to say more emphatically that I am not asking anyone to
build something larger (like cruft packfiles). I'm just trying to bring
up an impact that the author didn't consider (and that IMHO would be a
regression). Isn't that what reviews are for?

I only mention packfiles because as the discussion turns to "well, all
of these solutions are mediocre hacks" (because they absolutely are),
it's important to realize that there _is_ a right solution, and we even
already know about it. Even if we don't work on it now, knowing that
it's there makes it easier to decide about the various hacks.

> The comments about the "Why is 'git commit' so slow?" experience and
> how having the warning helps with that are well taken.  I think we
> should be able to find a way to keep the warning in a v2 of this
> patch.  But the rest about rate-limiting and putting unreachable
> objects in packs etc as a blocker for this are demoralizing, since
> they give the feeling that even if I handle the cases that are
> handled today well, it will never be enough for the project unless I
> solve the larger problems that were already there.

I really don't know why we are having such trouble communicating. I've
tried to make it clear several times that the pack thing is not
something I expect you or Jonathan Tan to work on, but obviously I
failed. I'd be _delighted_ if you wanted to work on it, but AFAICT this
patch is purely motivated by:

  1. there's a funny exit code thing going on (0 on the first run, -1 on
 the second)

  2. the warning is not actionable by users

I disagree with the second, and I think we've discussed easy solutions
for the first.

-Peff


Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> Note how the time increases between the two cases in the two versions.
> The new code increases relative to the number of commits that need to be
> walked, but not directly relative to the number of 'from' commits.

Cool!

>  int can_all_from_reach_with_flag(struct object_array *from,
>  int with_flag, int assign_flag,
> -time_t min_commit_date)
> +time_t min_commit_date,
> +uint32_t min_generation)
>  {

> for (i = 0; i < from->nr; i++) {
[...]
> +   parse_commit(list[i]);

parse_commit_or_die or handle the return code?
(or a comment why we specifically are allowed to ignore
the return code here)

[...]
> +   for (parent = stack->item->parents; parent; parent = parent->next) {
[...]
> +   parse_commit(parent->item);

same here.


Re: [PATCH 07/16] commit-reach: move can_all_from_reach_with_flags

2018-07-16 Thread Jonathan Tan
>  /* Remember to update object flag allocation in object.h */
> +#define REACHABLE   (1u<<15)
>  #define PARENT1  (1u<<16)
>  #define PARENT2  (1u<<17)
>  #define STALE(1u<<18)

Update the object flag allocation in object.h.

> +int reachable(struct commit *from, int with_flag, int assign_flag,
> +   time_t min_commit_date)

In this and previous patches: I think it's better to use "unsigned int"
as the data type for a flag, just like in clear_commit_marks().

Other than that, this and all previous patches look good.


Re: [PATCH v2 7/9] gpg-interface: introduce new config to select per gpg format program

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 02:56:34PM -0700, Junio C Hamano wrote:

> >> I'm okay with us forcing "openpgp".  That seems sane enough for now, and
> >> if people scream loudly, we can loosen it.
> >
> > Well, I specifically meant "are you sure subsections like this are a
> > good idea". But it seems like people think so?
> 
> I admit that I did not even consider that there may be better tool
> than using subsections to record this information.  What are the
> possibilities you have in mind (if you have one)?

I don't think there is another tool except two-level options, like
"gpg.openpgpprogram" and "gpg.x509program".

Although those are a bit ugly, I just wondered if they might make things
simpler, since AFAIK we are not planning to add more config options
here, like gpg.x509.someotherflag or gpg.someothertool.program.

Of course one reason _for_ the tri-level is that we might one day add
gpg.x509.someotherflag, and this gives us room to do it with less
awkwardness (i.e., without a proliferation of gpg.x509someflag options).

-Peff


Re: [PATCH v2] send-email: Fix tls AUTH when sending batch

2018-07-16 Thread Junio C Hamano
Jules Maselbas  writes:

> The variable smtp_encryption must keep its value between two batches.
> Otherwise the authentication is skipped after the first batch.
>
> Signed-off-by: Jules Maselbas 
> ---
>  git-send-email.perl | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 8ec70e58e..1f9a73f74 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -1479,7 +1479,7 @@ sub send_message {
>SSL => 1);
>   }
>   }
> - else {
> + elsif (!$smtp) {
>   $smtp_server_port ||= 25;
>   $smtp ||= Net::SMTP->new($smtp_server,
>Hello => $smtp_domain,

Puzzled...  The code is prepared to deal with the case where $smtp
is still active at this point by using "$smtp ||=", but this hunk
prevents any caller that still has $smtp active from entering this
block.

So the conditional assignment "$smtp ||= Net::SMTP->new()" is
meaningless after this patch---we always do "new" if we reach this
statement.

This hunk did not exist in your v1, yet the proposed log message has
not changed since v1.  With this hunk, it appears that the problem
and the solution are not about $smtp_encryption but about not
calling ->starttls() or ->command('STARTTLS') when we reuse $smtp
that has been established earlier, and the description in the log
message feels a bit off.  Some explanation of why this hunk is a
good thing may be missing, I guess.

Dropping the assignment to $smtp_encryption in this block (i.e. hunk
below) makes sense, as we do not call read_config() to re-read its
value after the batching code quits and unsets $smtp.

> @@ -1501,7 +1501,6 @@ sub send_message {
>   $smtp->starttls(ssl_verify_params())
>   or die sprintf(__("STARTTLS failed! %s"), IO::Socket::SSL::errstr());
>   }
> - $smtp_encryption = '';
>   # Send again to receive fresh

By the way, the patch claims that it is against 8ec70e58e, but has
an edit on this context line and does not apply.  Did you hand-edit
your patch after generating it?  Please don't.


>   # supported commands
>   $smtp->hello($smtp_domain);


Re: [PATCH 14/16] commit-reach: replace ref_newer logic

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The ref_newer method is used by 'git push' to check if a force-push is
> required. This method does not use any kind of cutoff when walking, so
> in the case of a force-push will walk all reachable commits.
>
> The is_descendant_of method already uses paint_down_to_common along with
> cutoffs. By translating the ref_newer arguments into the commit and
> commit_list required by is_descendant_of, we can have one fewer commit
> walk and also improve our performance!
>
> For a copy of the Linux repository, 'test-tool reach ref_newer' presents
> the following improvements with the specified input. In the case that
> ref_newer returns 1, there is no improvement. The improvement is in the
> second case where ref_newer returns 0.
>
> Input
> -

I fetched the series as advertised in the cover letter; however, Junio
applies the patches manually, and there is a potential problem here in
the patch format. Three dashes indicate the end of a commit message,
and below them you usually have some ephemeral information such as the
stats, followed by the diffs starting with "diff --git"; at least that
was the case so far.

I just tested, and this patch applies cleanly, keeping the information
below the three dashes intact. Cool!

> A:v4.9
> B:v3.19
>
> Before: 0.09 s
>  After: 0.09 s
>
> To test the negative case, add a new commit with parent v3.19,
> regenerate the commit-graph, and then run with B pointing at that
> commit.
>
> Before: 0.43 s
>  After: 0.09 s

Nice! The code looks good, too.
Thanks,
Stefan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Elijah Newren
On Mon, Jul 16, 2018 at 2:21 PM, Jeff King  wrote:
> On Mon, Jul 16, 2018 at 02:09:43PM -0700, Elijah Newren wrote:

>> The point of gc is to: expire reflogs, repack referenced objects, and
>> delete loose objects that (1) are no longer referenced and (2) are
>> "old enough".
>>
>> The "old enough" bit is the special part here.  Before the gc, those
>> objects were referenced (only by old reflog entries) and were found in
>> a pack.  git currently writes those objects out to disk and gives them
>> the age of the packfile they are contained in, which I think is the
>> source of the bug.  We have a reference for those objects from the
>> reflogs and know they aren't referenced anywhere else, so to me those
>> objects are the age of those reflog entries: 90 days.  As such, they are
>> "old enough" and should be deleted.
>
> OK, I see what you're saying, but...
>
>> I never got around to fixing it properly, though, because 'git prune'
>> is such a handy workaround that for now.  Having people nuke all their
>> loose objects is a bit risky, but the only other workaround people had
>> was to re-clone (waiting the requisite 20+ minutes for the repo to
>> download) and throw away their old clone.  (Though some people even
>> went that route, IIRC.)
>
> If we were to delete those objects, wouldn't it be exactly the same as
> running "git prune"? Or setting your gc.pruneExpire to "now" or even "5
> minutes"?  Or are you concerned with taking other objects along for the
> ride that weren't part of old reflogs? I think that's a valid concern,

Yes, I was worried about taking other objects along for the ride that
weren't part of old reflogs.

> but it's also an issue for objects which were previously referenced in
> a reflog, but are part of another current operation.

I'm not certain what you're referring to here.

> Also, what do you do if there weren't reflogs in the repo? Or the
> reflogs were deleted (e.g., because branch deletion drops the associated
> reflog entirely)?

Yes, there are issues this rule won't help with, but in my experience
it was a primary (if not sole) actual cause in practice.  (I believe I
even said elsewhere in this thread that I knew there were unreachable
objects for other reasons and they might also become large in number).
At $DAYJOB we've had multiple people including myself hit the "too
many unreachable loose objects" nasty loop issue (some of us multiple
different times), and as far as I can tell, most (perhaps even all) of
them would have been avoided by just "properly" deleting garbage as
per my object-age-is-reflog-age-if-not-otherwise-referenced rule.

>> With big repos, it's easy to get into situations where there are well
>> more than 1 objects satisfying these conditions.  In fact, it's
>> not all that rare that the repo has far more loose objects after a git
>> gc than before.
>
> Yes, this is definitely a wart and I think is worth addressing.
>
>> I totally agree with your general plan to put unreferenced loose
>> objects into a pack.  However, I don't think these objects should be
>> part of that pack; they should just be deleted instead.
>
> I assume by "these objects" you mean ones which used to be reachable
> from a reflog, but that reflog entry just expired.  I think you'd be
> sacrificing some race-safety in that case.

Is that inherently any more race unsafe than 'git prune
--expire=2.weeks.ago'?  I thought it'd be racy in the same ways, and
thus a tradeoff folks are already accepting (at least implicitly) when
running git-gc.  Since these objects are actually 90 days old rather
than a mere two weeks, it actually seemed more safe to me.  But maybe
I'm overlooking something with the pack->loose transition that makes
it more racy?

> If the objects went into a pack under a race-proof scheme, would that
> actually bother you? Is it the 10,000 objects that's a problem, or is it
> the horrible I/O from exploding them coupled with the annoying auto-gc
> behavior?

Yeah, good point.  It's mostly the annoying auto-gc behavior and the
horrible I/O of future git operations from having lots of loose
objects.  They've already been paying the cost of storing those
objects in packed form for 90 days; a few more won't hurt much.  I'd
be slightly annoyed knowing that we're storing garbage we don't need
to be, but I don't think it's a real issue.


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Jeff King wrote:

> I don't think any command should report failure of its _own_ operation
> if "gc --auto" failed. And grepping around the source code shows that we
> typically ignore it.

Oh, good point.  In non-daemon mode, we don't let "gc --auto" failure
cause the invoking command to fail, but in daemon mode we do.  That
should be a straightforward fix; patch coming in a moment.

> On Mon, Jul 16, 2018 at 02:40:03PM -0700, Jonathan Nieder wrote:

>> For comparison, in non-daemon mode, there is nothing enforcing the
>> kind of ratelimiting you are talking about.  It is an accident of
>> history.  If we want this kind of ratelimiting, I'd rather we build it
>> deliberately instead of relying on this accident.
>
> What I was trying to say earlier is that we _did_ build this
> rate-limiting, and I think it is a bug that the non-daemon case does not
> rate-limit (but nobody noticed, because the default is daemonizing).
>
> So the fix is not "rip out the rate-limiting in daemon mode", but rather
> "extend it to the non-daemon case".

Can you point me to some discussion about building that rate-limiting?
The commit message for v2.12.2~17^2 (gc: ignore old gc.log files,
2017-02-10) definitely doesn't describe that as its intent.

This is the kind of review that Dscho often complains about, where
someone tries to fix something small but significant to users and gets
told to build something larger that was not their itch instead.

The comments about the "Why is 'git commit' so slow?" experience and
how having the warning helps with that are well taken.  I think we
should be able to find a way to keep the warning in a v2 of this
patch.  But the rest about rate-limiting and putting unreachable
objects in packs etc as a blocker for this are demoralizing, since
they give the feeling that even if I handle the cases that are
handled today well, it will never be enough for the project unless I
solve the larger problems that were already there.

Jonathan


Re: [PATCH 12/16] test-reach: test reduce_heads

2018-07-16 Thread Eric Sunshine
On Mon, Jul 16, 2018 at 5:30 PM Stefan Beller  wrote:
> > +test_expect_success 'reduce_heads' '
> > +   cat >input <<-\EOF &&
> > +   X:commit-1-10
> > +   X:commit-2-8
> > +   X:commit-3-6
> > +   X:commit-4-4
> > +   X:commit-1-7
> > +   X:commit-2-5
> > +   X:commit-3-3
> > +   X:commit-5-1
> > +   EOF
> > +   {
> > +   printf "reduce_heads(X):\n" &&
> > +   git rev-parse commit-5-1 &&
> > +   git rev-parse commit-4-4 &&
> > +   git rev-parse commit-3-6 &&
> > +   git rev-parse commit-2-8 &&
> > +   git rev-parse commit-1-10
> > +  } >expect &&
>
> Please use rev-parse only once.
>
> I am not sure about the usage of { braces } in the test suite,
> +cc Eric who sent a test suite linting series recently.
> Do we need to em-'brace' the statements that describe the
> expected behavior? (Or is it supposed to be easier to read
> for the reviewers? I found these very readable so far... but
> this question just came up)

Grouping the commands for redirection via a "{...}>expect" block is
less noisy than redirecting each command separately, thus more
reviewer-friendly. And, {...} blocks are used regularly in the test
suite, so no issue there.

I do agree that a single git-rev-parse with all 5 arguments makes more
sense (and would be appreciated by Windows folk). Also, the 'printf'
could be replaced by a simple 'echo' if we want to get nit-picky.

Finally, a style nit: We don't normally indent the content of a
here-doc like that. Instead, the content is normally aligned with the
closing EOF, not indented beyond it.
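Taken together, the suggestions would reshape that part of the test
roughly like this (a sketch that stands alone: 'git rev-parse' is
stubbed out here since this snippet has no repository, and the commit
names are the ones from the t6600 setup):

```shell
# Stand-in for 'git rev-parse'; the real test would call git once with
# all five names, which prints one object ID per line in argument order.
rev_parse() { printf 'oid-of-%s\n' "$@"; }

{
	echo "reduce_heads(X):" &&
	rev_parse commit-5-1 commit-4-4 commit-3-6 commit-2-8 commit-1-10
} >expect
cat expect
```

The grouped "{ ... } >expect" keeps a single redirection, and a single
rev-parse invocation avoids four extra process spawns, which matters
most on Windows.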


Re: [PATCH v2 7/9] gpg-interface: introduce new config to select per gpg format program

2018-07-16 Thread Junio C Hamano
Jeff King  writes:

> On Sat, Jul 14, 2018 at 06:13:47PM +, brian m. carlson wrote:
>
>> On Tue, Jul 10, 2018 at 12:56:38PM -0400, Jeff King wrote:
>> > On Tue, Jul 10, 2018 at 12:54:13PM -0400, Jeff King wrote:
>> > 
>> > > Should we allow:
>> > > 
>> > >   [gpg "OpenPGP"]
>> > >   program = whatever
>> > > 
>> > > given that we allow:
>> > > 
>> > >   [gpg]
>> > >   format = OpenPGP
>> > > 
>> > > ? I think just using strcasecmp() here would be sufficient. But I wonder
>> > > if it is a symptom of using the wrong tool (subsections) when we don't
>> > > need it.
>> > 
>> > I did just read the discussion in response to v1, where everybody told
>> > you the opposite. ;)
>> > 
>> > So I guess my question/points are more for brian and Junio.
>> 
>> I'm okay with us forcing "openpgp".  That seems sane enough for now, and
>> if people scream loudly, we can loosen it.
>
> Well, I specifically meant "are you sure subsections like this are a
> good idea". But it seems like people think so?

I admit that I did not even consider that there may be better tool
than using subsections to record this information.  What are the
possibilities you have in mind (if you have one)?

>
> If so, then I think the best route is just dictating that yes,
> gpg.format is case-sensitive because it is referencing
> gpg.<format>.program, which is itself case-sensitive (and "openpgp" is
> the right spelling).
>
> -Peff


Re: [PATCH 13/16] test-reach: test can_all_from_reach_with_flags

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The can_all_from_reach_with_flags method is used by ok_to_give_up in
> upload-pack.c to see if we have done enough negotiation during a fetch.
> This method is intentionally created to preserve state between calls to
> assist with stateful negotiation, such as over SSH.
>
> To make this method testable, add a new can_all_from_reach method that
> does the initial setup and final tear-down. Call the method from
> 'test-tool reach'.
>
> Since this is a many-to-many reachability query, add a new type of input
> to the 'test-tool reach' input format. Lines "Y:<committish>" create a
> list of commits to be the reachability targets from the commits in the
> 'X' list. In the context of fetch negotiation, the 'X' commits are the
> 'want' commits and the 'Y' commits are the 'have' commits.

Makes sense. I briefly wondered if we want to s/Y/Z/ as I find X and Z
more distinguishable than X/Y for reading/skimming.

Thanks,
Stefan

> +++ b/commit-reach.c
> @@ -593,3 +593,50 @@ int can_all_from_reach_with_flag(struct object_array *from,
> }
> return 1;
>  }
> +
> +int can_all_from_reach(struct commit_list *from, struct commit_list *to,
> +  int cutoff_by_min_date)

We'll put this method (which is only used by tests so far) here to
avoid cluttering the test-tool code too much, or do we see more
benefit from having the code here? If so, docs would be nice.

> +++ b/t/t6600-test-reach.sh

> +test_expect_success 'can_all_from_reach:hit' '
  [...]
> +   Y:commit-7-3
> +   Y:commit-8-1

> +test_expect_success 'can_all_from_reach:miss' '
[...]
> +   Y:commit-8-5

It would be nice if the difference in the list could be easier
to spot as a reader. (There is a lot of repetition).

Maybe we can teach "test-tool reach" to ignore input lines
starting with '#' such that we can annotate the last line in
the miss case?

Why do we omit 7-3 in the miss case? (might be nice
for symmetry to keep around)
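Until test-tool learns to skip comments itself, the same effect can be
had on the shell side (a sketch; the actual 'test-tool reach'
invocation is omitted, only the comment-stripping step is shown):

```shell
# Annotate the input, then strip '#' lines before feeding it to the tool.
cat >input <<\EOF
X:commit-2-10
# the next line is the only difference from the 'hit' case:
Y:commit-8-5
EOF
grep -v '^#' input >filtered
cat filtered
```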

Thanks,
Stefan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 02:40:03PM -0700, Jonathan Nieder wrote:

> > The key thing is that the presence of the warning/log still suppresses
> > the actual gc for the gc.logExpiry period.
> 
> Ah, now I think I see the source of the misunderstanding.
> 
> When I initially read the docs, I also assumed that
> 
>   If the file gc.log exists, then git gc --auto won’t run unless
>   that file is more than gc.logExpiry old.
> 
> meant "git gc --auto" would be skipped (and thus silently succeed) when
> the file is less than gc.logExpiry old.  But that assumption was wrong:
> it errors out!

Right. That's the mysterious "-1" error code, and I agree that part is
confusing.

> This makes scripting difficult, since arbitrary commands that
> incidentally run "git gc --auto" will fail in this state, until gc.log
> expires (but at that point they'll fail again anyway).

I don't think any command should report failure of its _own_ operation
if "gc --auto" failed. And grepping around the source code shows that we
typically ignore it.

> For comparison, in non-daemon mode, there is nothing enforcing the
> kind of ratelimiting you are talking about.  It is an accident of
> history.  If we want this kind of ratelimiting, I'd rather we build it
> deliberately instead of relying on this accident.

What I was trying to say earlier is that we _did_ build this
rate-limiting, and I think it is a bug that the non-daemon case does not
rate-limit (but nobody noticed, because the default is daemonizing).

So the fix is not "rip out the rate-limiting in daemon mode", but rather
"extend it to the non-daemon case".

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Jul 16, 2018 at 01:37:53PM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> With the current code, that produces a bunch of annoying warnings for
>>> every commit ("I couldn't gc because the last gc reported a warning").
>>> But after Jonathan's patch, every single commit will do a full gc of the
>>> repository. In this tiny repository that's relatively quick, but in a
>>> real repo it's a ton of CPU and I/O, all for nothing.
>>
>> I see.  Do I understand correctly that if we find a way to print the
>> warning but not error out, that would be good enough for you?
>
> Yes. I thought that's what I proposed originally. ;)
>
> The key thing is that the presence of the warning/log still suppresses
> the actual gc for the gc.logExpiry period.

Ah, now I think I see the source of the misunderstanding.

When I initially read the docs, I also assumed that

If the file gc.log exists, then git gc --auto won’t run unless
that file is more than gc.logExpiry old.

meant "git gc --auto" would be skipped (and thus silently succeed) when
the file is less than gc.logExpiry old.  But that assumption was wrong:
it errors out!

This makes scripting difficult, since arbitrary commands that
incidentally run "git gc --auto" will fail in this state, until gc.log
expires (but at that point they'll fail again anyway).
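The failure mode can be modeled without git at all (a sketch, not
git's actual C code; the gc.logExpiry check is reduced to a plain
file-age test on gc.log):

```shell
# Sketch of daemon-mode 'gc --auto' gating: a gc.log younger than
# gc.logExpiry makes every later 'gc --auto' error out, which is what
# breaks scripts that run it incidentally.
log=gc.log
expiry_minutes=$((24 * 60))   # models the gc.logExpiry default of "1 day"

echo "warning: There are too many unreachable loose objects" >"$log"

if [ -n "$(find . -maxdepth 1 -name "$log" -mmin -"$expiry_minutes")" ]; then
	status=error    # the log is fresh: print it and exit non-zero
else
	status=proceed  # the log expired: remove it and collect
fi
echo "gc --auto would: $status"
```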

For comparison, in non-daemon mode, there is nothing enforcing the
kind of ratelimiting you are talking about.  It is an accident of
history.  If we want this kind of ratelimiting, I'd rather we build it
deliberately instead of relying on this accident.

Jonathan


Re: [PATCH v3 1/7] gpg-interface: add new config to select how to sign a commit

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 01:14:34PM -0700, Junio C Hamano wrote:

> >  #define PGP_SIGNATURE "-BEGIN PGP SIGNATURE-"
> > @@ -138,6 +139,12 @@ int git_gpg_config(const char *var, const char *value, void *cb)
> > return 0;
> > }
> >  
> > +   if (!strcmp(var, "gpg.format")) {
> > +   if (value && strcasecmp(value, "openpgp"))
> > +   return error("malformed value for %s: %s", var, value);
> > +   return git_config_string(&gpg_format, var, value);
> 
> I may be mistaken (sorry, I read so many topics X-<) but I think the
> consensus was to accept only "openpgp" so that we can ensure that
> 
>   [gpg "openpgp"] program = foo
> 
> etc. can be parsed more naturally without having to manually special
> case the subsection name to be case insensitive.  In other words,
> strcasecmp() should just be strcmp().  Otherwise, people would get a
> wrong expectation that
> 
>   [gpg] format = OpenPGP
>   [gpg "openpgp"] program = foo
> 
> should refer to the same and single OpenPGP, but that would violate
> the usual configuration syntax rules.

I was the one who argued the other way. But unless we are going to move
to a two-level config for all of it (i.e., gpg.openPGPProgram), I think
what you suggest here is the sanest way forward.

> The structure of checking the error looks quite suboptimal even when
> we initially support the single "openpgp" at this step.  I would
> have expected you to be writing the above like so:
> 
>   if (!value)
>   return config_error_nonbool(var);
>   if (strcmp(value, "openpgp"))
>   return error("unsupported value for %s: %s", var, value);
>   return git_config_string(...);
> 
> That would make it more clear that the variable is "nonbool", which
> does not change across additional support for new formats in later
> steps.

Agreed.

-Peff


Re: [PATCH 11/25] t7400: fix broken "submodule add/reconfigure --force" test

2018-07-16 Thread Junio C Hamano
Eric Sunshine  writes:

> On Mon, Jul 16, 2018 at 11:51 AM Johannes Schindelin
>  wrote:
>> On Mon, 16 Jul 2018, Johannes Schindelin wrote:
>> > > -   git submodule add --force bogus-url submod &&
>> > > +   git submodule add --force /bogus-url submod &&
>> >
>> > This breaks on Windows because of the absolute Unix-y path having to be
>> > translated to a Windows path:
>> >
>> >   
>> > https://git-for-windows.visualstudio.com/git/git%20Team/_build/results?buildId=12365=logs
>> >
>> > I could imagine that using "$(pwd)/bogus-url" (which will generate a
>> > Windows-y absolute path on Windows) would work, though.
>>
>> Yes, this works indeed, see the patch below. Could you please squash it in
>> if you send another iteration of your patch series? Junio, could you
>> please add this as a SQUASH??? commit so that `pu` is fixed on Windows?
>
> Thanks for reporting and diagnosing. I wondered briefly if we could
> get by with simply "./bogus-url" instead of having to use $(pwd),
> however, "./bogus-url" gets normalized internally into an absolute
> path, so $(pwd) is needed anyhow to make the test '"$bogus_url" =
> "$(git config ...)"' check work.
>
> So, this SQUASH looks like the correct way forward. Hopefully, Junio
> can just squash it so I don't have to flood the list again with this
> long series.

The topic already has another dependent topic so rerolling or
squashing would be a bit cumbersome.  I'll see what I could do but
it may not be until tomorrow's pushout.


Re: [PATCH v2 7/9] gpg-interface: introduce new config to select per gpg format program

2018-07-16 Thread Jeff King
On Sat, Jul 14, 2018 at 06:13:47PM +, brian m. carlson wrote:

> On Tue, Jul 10, 2018 at 12:56:38PM -0400, Jeff King wrote:
> > On Tue, Jul 10, 2018 at 12:54:13PM -0400, Jeff King wrote:
> > 
> > > Should we allow:
> > > 
> > >   [gpg "OpenPGP"]
> > >   program = whatever
> > > 
> > > given that we allow:
> > > 
> > >   [gpg]
> > >   format = OpenPGP
> > > 
> > > ? I think just using strcasecmp() here would be sufficient. But I wonder
> > > if it is a symptom of using the wrong tool (subsections) when we don't
> > > need it.
> > 
> > I did just read the discussion in response to v1, where everybody told
> > you the opposite. ;)
> > 
> > So I guess my question/points are more for brian and Junio.
> 
> I'm okay with us forcing "openpgp".  That seems sane enough for now, and
> if people scream loudly, we can loosen it.

Well, I specifically meant "are you sure subsections like this are a
good idea". But it seems like people think so?

If so, then I think the best route is just dictating that yes,
gpg.format is case-sensitive because it is referencing
gpg.<format>.program, which is itself case-sensitive (and "openpgp" is
the right spelling).
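Concretely, the subsection form being discussed would look like this
in a user's config (a sketch; gpgsm as the x509 program is just an
illustration):

```
[gpg]
	format = x509
[gpg "x509"]
	program = gpgsm
```

where the subsection name has to match gpg.format exactly, since
config subsection names are case-sensitive.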

-Peff


Re: [PATCH 01/16] commit-reach: move walk methods from commit.c

2018-07-16 Thread Jonathan Tan
> +/* Remember to update object flag allocation in object.h */
> +#define PARENT1  (1u<<16)
> +#define PARENT2  (1u<<17)
> +#define STALE(1u<<18)
> +#define RESULT   (1u<<19)

Update object.h to point to commit-reach.c instead of commit.c also.

> diff --git a/commit-reach.h b/commit-reach.h
> new file mode 100644
> index 0..244f48c5f
> --- /dev/null
> +++ b/commit-reach.h
> @@ -0,0 +1,41 @@
> +#ifndef __COMMIT_REACH_H__
> +#define __COMMIT_REACH_H__
> +
> +#include "commit.h"
> +
> +struct commit_list *get_merge_bases_many(struct commit *one,
> +  int n,
> +  struct commit **twos);



Should the declarations in commit.h be deleted also?

Thanks for copying it over verbatim - it makes it much easier to see
what's going on with --color-moved.


Re: [PATCH v2 0/9] X509 (gpgsm) commit signing support

2018-07-16 Thread Jeff King
On Sat, Jul 14, 2018 at 06:33:12PM +, brian m. carlson wrote:

> > This series is a fine replacement for that earlier work. It's flexible
> > enough to allow what we really wanted out of that series (gpgsm support,
> > or another drop-in tool that uses the same interface). It doesn't lay
> > any groundwork for further tools (like signify), but I think the
> > consensus on the list was to punt on that until somebody had more
> > concrete plans for adding such a tool.
> 
> I actually think this moves in a nice direction for adding support for
> minisign/signify and other schemes.  There's a way to look up what
> algorithm is in use in a particular context based on the first line and
> a general interface for deciding what format to write.  Granted, it
> currently still is very specific to gpg-style tools, but I think this is
> an improvement in that regard.

My issue with this for helping with signify is that it creates a new
gpg.<format>.* hierarchy with two slots (openpgp and x509). But we would
not want gpg.signify.program, would we?  That makes no sense, as neither
the signature-matching nor the program invocation are gpg-like.

But if we later moved to "signingtool.<format>.*", now we have an extra
layer of compatibility to deal with. E.g., signingtool.openpgp.program
is the same as gpg.openpgp.program which is the same as gpg.program.

I think we can do that, but it means more historical baggage.

I'm OK with that since signify support is purely hypothetical at this
point.  But that's why I say that this doesn't lay the groundwork in the
way that the other series did.

> As an OpenPGP user, I have no interest in adding support for other
> tools, but I think this should make it easier if someone else wants to
> do that.

I don't plan to work on signify (or other tools) anytime soon either. My
interest here is in x509, since that's what enterprises would use over
pgp.

I actually dislike pgp for this application, too, because I find the key
management kind of complicated and tedious. But at least it's a standard
among open source folks.

-Peff


Re: [PATCH 12/16] test-reach: test reduce_heads

2018-07-16 Thread Stefan Beller
> +test_expect_success 'reduce_heads' '
> +   cat >input <<-\EOF &&
> +   X:commit-1-10
> +   X:commit-2-8
> +   X:commit-3-6
> +   X:commit-4-4
> +   X:commit-1-7
> +   X:commit-2-5
> +   X:commit-3-3
> +   X:commit-5-1
> +   EOF
> +   {
> +   printf "reduce_heads(X):\n" &&
> +   git rev-parse commit-5-1 &&
> +   git rev-parse commit-4-4 &&
> +   git rev-parse commit-3-6 &&
> +   git rev-parse commit-2-8 &&
> +   git rev-parse commit-1-10

Please use rev-parse only once.

I am not sure about the usage of { braces } in the test suite,
+cc Eric who sent a test suite linting series recently.
Do we need to em-'brace' the statements that describe the
expected behavior? (Or is it supposed to be easier to read
for the reviewers? I found these very readable so far... but
this question just came up)

Thanks,
Stefan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Elijah Newren wrote:

> I totally agree with your general plan to put unreferenced loose
> objects into a pack.  However, I don't think these objects should be
> part of that pack; they should just be deleted instead.

This might be the wrong thread to discuss it, but did you follow the
reference/prune race that Peff mentioned?  The simplest cure I'm aware
of to it does involve writing those objects to a pack.  The idea is to
enforce a straightforward contract:

There are two kinds of packs: GC and UNREACHABLE_GARBAGE.

Every object in a GC pack has a minimum lifetime (let's say "3 days")
from the time it is read.  If you start making use of an object from a
GC pack (e.g. by creating a new object referencing it), you have three
days to ensure it's referenced.

Each UNREACHABLE_GARBAGE pack has a ttl (let's say "3 days") from
the time it is created.  Objects in an UNREACHABLE_GARBAGE pack have no
minimum ttl from the time they are read.  If you want to start making
use of an object from an UNREACHABLE_GARBAGE pack (e.g. by creating a
new object referencing it), then copy it and everything it references
to a GC pack.

To avoid a proliferation of UNREACHABLE_GARBAGE packs, there's a rule
for coalescing them, but that's not relevant here.

It is perfectly possible for an object in a GC pack to reference an
object in an UNREACHABLE_GARBAGE pack via writes racing with gc, but
that's fine --- on the next gc run, the unreachable garbage objects
get copied to a GC pack.
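In outline, that contract can be modeled in a few lines (a toy sketch
with invented names and constants, not JGit's actual DfsRepository
implementation):

```python
import time

# Toy model of the GC / UNREACHABLE_GARBAGE contract described above.
# The function names and the 3-day ttl are illustrative assumptions.
GC = "GC"
UNREACHABLE_GARBAGE = "UNREACHABLE_GARBAGE"
GARBAGE_TTL_SECONDS = 3 * 24 * 3600  # "3 days" from pack creation

def can_delete_pack(kind, created_at, now):
    """An UNREACHABLE_GARBAGE pack may be dropped once its ttl from
    creation elapses; GC packs are only ever rewritten by gc itself."""
    return kind == UNREACHABLE_GARBAGE and now - created_at >= GARBAGE_TTL_SECONDS

def start_using(kind, obj, referenced, gc_pack):
    """Using an object from an UNREACHABLE_GARBAGE pack means copying it
    and everything it references into a GC pack first."""
    if kind == UNREACHABLE_GARBAGE:
        gc_pack.update([obj], referenced)
    return obj

now = time.time()
print(can_delete_pack(UNREACHABLE_GARBAGE, now - 4 * 24 * 3600, now))  # True
print(can_delete_pack(GC, now - 4 * 24 * 3600, now))                   # False
```

The point of the copy-out rule in start_using() is that deleting a whole
garbage pack can then never invalidate an object anyone has started
building on.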

We've been using this on a JGit DfsRepository based server for > 2
years now and it's been working well.  More details are in the "Loose
objects and unreachable objects" section in Documentation/technical/
mentioned before.

Thanks,
Jonathan


Re: [PATCH 11/16] test-reach: test get_merge_bases_many

2018-07-16 Thread Stefan Beller
> +test_expect_success 'get_merge_bases_many' '
> +   cat >input <<-\EOF &&
> +   A:commit-5-7
> +   X:commit-4-8
> +   X:commit-6-6
> +   X:commit-8-3
> +   EOF
> +   {
> +   printf "get_merge_bases_many(A,X):\n" &&
> +   git rev-parse commit-5-6 &&
> +   git rev-parse commit-4-7

Please call rev-parse only once, giving both tips as argument, i.e.

   printf "get_merge_bases_many(A,X):\n" &&
   git rev-parse commit-5-6 \
  commit-4-7

ought to produce the same output

Thanks,
Stefan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 02:09:43PM -0700, Elijah Newren wrote:

> >> Um, except it doesn't actually do that.  The testcase I provided shows
> >> that it leaves around 10,000 objects that are totally deletable and
> >> were only previously referenced by reflog entries -- entries that gc
> >> removed without removing the corresponding objects.
> >
> > What's your definition of "totally deletable"?
> 
> The point of gc is to: expire reflogs, repack referenced objects, and
> delete loose objects that (1) are no longer referenced and (2) are
> "old enough".
> 
> The "old enough" bit is the special part here.  Before the gc, those
> objects were referenced (only by old reflog entries) and were found in
> a pack.  git currently writes those objects out to disk and gives them
> the age of the packfile they are contained in, which I think is the
> source of the bug.  We have a reference for those objects from the
> reflogs, know they aren't referenced anywhere else, so those objects
> to me are the age of those reflog entries: 90 days.  As such, they are
> "old enough" and should be deleted.

OK, I see what you're saying, but...

> I never got around to fixing it properly, though, because 'git prune'
> is such a handy workaround for now.  Having people nuke all their
> loose objects is a bit risky, but the only other workaround people had
> was to re-clone (waiting the requisite 20+ minutes for the repo to
> download) and throw away their old clone.  (Though some people even
> went that route, IIRC.)

If we were to delete those objects, wouldn't it be exactly the same as
running "git prune"? Or setting your gc.pruneExpire to "now" or even "5
minutes"?  Or are you concerned with taking other objects along for the
ride that weren't part of old reflogs? I think that's a valid concern,
but it's also an issue for objects which were previously referenced in
a reflog, but are part of another current operation.

Also, what do you do if there weren't reflogs in the repo? Or the
reflogs were deleted (e.g., because branch deletion drops the associated
reflog entirely)?

> With big repos, it's easy to get into situations where there are well
> more than 10,000 objects satisfying these conditions.  In fact, it's
> not all that rare that the repo has far more loose objects after a git
> gc than before.

Yes, this is definitely a wart and I think is worth addressing.

> I totally agree with your general plan to put unreferenced loose
> objects into a pack.  However, I don't think these objects should be
> part of that pack; they should just be deleted instead.

I assume by "these objects" you mean ones which used to be reachable
from a reflog, but that reflog entry just expired.  I think you'd be
sacrificing some race-safety in that case.

If the objects went into a pack under a race-proof scheme, would that
actually bother you? Is it the 10,000 objects that's a problem, or is it
the horrible I/O from exploding them coupled with the annoying auto-gc
behavior?

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 01:56:46PM -0700, Jonathan Nieder wrote:

> I don't believe we are very good at transitively propagating freshness
> today, by the way.

Perhaps I don't understand what you mean by transitive here. But we
should be traversing from any fresh object and marking any non-fresh
ones that are reachable from it to be saved. If you know of a case where
we're not, there's a bug, and I'd be happy to look into it.
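That traversal can be sketched as a simple mark-and-keep pass (toy data
and invented names, not git's actual prune code):

```python
from collections import deque

def kept_objects(refs, fresh):
    """Keep every fresh object plus everything reachable from a fresh
    object, even when the reachable objects are themselves old.
    `refs` maps object name -> names it references; `fresh` maps object
    name -> whether its mtime is newer than the expiry cutoff."""
    keep = set()
    queue = deque(name for name, is_fresh in fresh.items() if is_fresh)
    while queue:
        name = queue.popleft()
        if name in keep:
            continue
        keep.add(name)
        queue.extend(refs.get(name, []))
    return keep

# A fresh commit transitively keeps the old tree and blob it points at;
# an old object referenced by nothing fresh is eligible for pruning.
refs = {"commit": ["tree"], "tree": ["old-blob"]}
fresh = {"commit": True, "tree": False, "old-blob": False, "stale": False}
print(sorted(kept_objects(refs, fresh)))  # ['commit', 'old-blob', 'tree']
```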

The only problem I know about is the utter lack of atomicity.

-Peff


Re: rev-parse --show-toplevel broken during exec'ed rebase?

2018-07-16 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Jul 16, 2018 at 11:14:51AM -0700, Junio C Hamano wrote:
>
>> Porcelain, but I suspect in practice always giving GIT_DIR and
>> GIT_WORK_TREE would work well for many existing hooks.
>
> Yeah, that may be an option. I don't remember if this was discussed in
> this thread or elsewhere, but setting GIT_DIR is a regression for hooks,
> etc, which do:
>
>   git -C /some/other/repo log
>
> or similar. I'm not sure if that falls under "actual regression" or
> "just how it happened to work in the past". I'm not even sure it worked
> that way _consistently_ in the past.
>
> The best practice if you're switching directories is to do:
>
>   unset $(git rev-parse --local-env-vars)
>
> though I highly doubt that most people bother.

Yeah, that is exactly where my "in practice" comes from.  I think I
caught and killed another potential regression last week while it is
still in the draft status during my review ;-)


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Elijah Newren
On Mon, Jul 16, 2018 at 1:38 PM, Jeff King  wrote:
> On Mon, Jul 16, 2018 at 01:16:45PM -0700, Elijah Newren wrote:
>
>> >> The basic problem here, at least for us, is that gc has enough
>> >> information to know it could expunge some objects, but because of how
>> >> it is structured in terms of several substeps (reflog expiration,
>> >> repack, prune), the information is lost between the steps and it
>> >> instead writes them out as unreachable objects.  If we could prune (or
>> >> avoid exploding) loose objects that are only reachable from reflog
>> >> entries that we are expiring, then the problem goes away for us.  (I
>> >> totally understand that other repos may have enough unreachable
>> >> objects for other reasons that Peff's suggestion to just pack up
>> >> unreachable objects is still a really good idea.  But on its own, it
>> >> seems like a waste since it's packing stuff that we know we could just
>> >> expunge.)
>> >
>> > No, we should have expunged everything that could be during the "repack"
>> > and "prune" steps. We feed the expiration time to repack, so that it
>> > knows to drop objects entirely instead of exploding them loose.
>>
>> Um, except it doesn't actually do that.  The testcase I provided shows
>> that it leaves around 10,000 objects that are totally deletable and
>> were only previously referenced by reflog entries -- entries that gc
>> removed without removing the corresponding objects.
>
> What's your definition of "totally deletable"?

The point of gc is to: expire reflogs, repack referenced objects, and
delete loose objects that (1) are no longer referenced and (2) are
"old enough".

The "old enough" bit is the special part here.  Before the gc, those
objects were referenced (only by old reflog entries) and were found in
a pack.  git currently writes those objects out to disk and gives them
the age of the packfile they are contained in, which I think is the
source of the bug.  We have a reference for those objects from the
reflogs, know they aren't referenced anywhere else, so those objects
to me are the age of those reflog entries: 90 days.  As such, they are
"old enough" and should be deleted.
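The disagreement is over which mtime an exploded object should inherit.
A minimal sketch of the two rules (the function names and the 14-day
cutoff are illustrative, matching gc.pruneExpire's "2.weeks.ago"
default):

```python
PRUNE_EXPIRE_DAYS = 14  # gc.pruneExpire default of "2.weeks.ago"

def age_today(pack_mtime_days, reflog_entry_age_days):
    """What git does now: an exploded loose object inherits the mtime
    of the pack it came from, so objects from a fresh pack look new."""
    return pack_mtime_days

def age_proposed(pack_mtime_days, reflog_entry_age_days):
    """The rule argued for above: if the only references were reflog
    entries, date the object by those entries instead (90 days at
    expiry time), making it prunable immediately."""
    return max(pack_mtime_days, reflog_entry_age_days)

# An object exploded from a 1-day-old pack, previously referenced only
# by just-expired 90-day-old reflog entries:
print(age_today(1, 90) > PRUNE_EXPIRE_DAYS)     # False: kept two more weeks
print(age_proposed(1, 90) > PRUNE_EXPIRE_DAYS)  # True: deleted right away
```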

With big repos, it's easy to get into situations where there are well
more than 10,000 objects satisfying these conditions.  In fact, it's
not all that rare that the repo has far more loose objects after a git
gc than before.

I never got around to fixing it properly, though, because 'git prune'
is such a handy workaround for now.  Having people nuke all their
loose objects is a bit risky, but the only other workaround people had
was to re-clone (waiting the requisite 20+ minutes for the repo to
download) and throw away their old clone.  (Though some people even
went that route, IIRC.)

> If the pack they were in is less than 2 weeks old, then they are
> unreachable but "fresh", and are intentionally retained. If it was older
> than 2 weeks, they should have been deleted.

That's what's currently implemented, yes, but as above I think that
logic is bad.

> If you don't like that policy, you can set gc.pruneExpire to something
> lower (including "now", but watch out, as that can race with other
> operations!). But I don't think that's an implementation short-coming.
> It's intentional to keep those objects. It's just that the format
> they're in is inefficient and confuses later gc runs.

I'm aware of pruneExpire, but I think it's the wrong way to handle this.

I totally agree with your general plan to put unreferenced loose
objects into a pack.  However, I don't think these objects should be
part of that pack; they should just be deleted instead.


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 01:37:53PM -0700, Jonathan Nieder wrote:

> >and I think the
> > right solution is to pack the unreachable objects instead of exploding
> > them.
> 
> That seems like a huge widening in scope relative to what this
> original patch tackled.

I'm not at all saying that Jonathan needs to work on that. It's a big
topic that we've been putting off for years. I _am_ saying that I don't
think his patch as-is should be merged, as it makes the daemonized
behavior here worse.

> I'm not aware of a way to do this without
> breaking compatibility with the current (broken) race prevention.  If
> you're saying that breaking compatibility in that way is okay with
> you, then great!

It depends what you're trying to fix. If you're trying to fix auto-gc
outputting an annoying message (or repeatedly doing useless invocations
of pack-objects), you don't have to break compatibility at all. You just
have to put the objects into a pack instead of exploding them loose.
Worst case if you use an old implementation to run "git gc", it may
accidentally freshen a cruft pack's timestamp.

If you want to improve the raciness, then yes, I think you need stronger
rules around marking packs as cruft/garbage, and making operations that
want to use those objects copy them out to a non-cruft location (like
you describe in the hash transition document). There you risk corruption
if an old implementation fails to make the non-cruft copy. But that's
really no worse than we are today.

So I don't even really see it as a backwards-compatibility break. But
even if it were, I'd probably still be fine with it. This is a
local-repo thing, and in the absolute worst case we could bump
core.repositoryformatversion (though again, I don't even think that
would be necessary since it would degrade to behaving just as the
current code does).

> > With the current code, that produces a bunch of annoying warnings for
> > every commit ("I couldn't gc because the last gc reported a warning").
> > But after Jonathan's patch, every single commit will do a full gc of the
> > repository. In this tiny repository that's relatively quick, but in a
> > real repo it's a ton of CPU and I/O, all for nothing.
> 
> I see.  Do I understand correctly that if we find a way to print the
> warning but not error out, that would be good enough for you?

Yes. I thought that's what I proposed originally. ;)

The key thing is that the presence of the warning/log still suppress
the actual gc for the gc.logExpiry period.

> >> Have you looked over the discussion in "Loose objects and unreachable
> >> objects" in Documentation/technical/hash-function-transition.txt?  Do
> >> you have thoughts on it (preferrably in a separate thread)?
> >
> > It seems to propose putting the unreachable objects into a pack. Which
> > yes, I absolutely agree with (as I thought I'd been saying in every
> > single email in this thread).
> 
> I figured you were proposing something like
> https://public-inbox.org/git/20180113100734.ga30...@sigill.intra.peff.net/,
> which is still racy (because it does not handle the freshening in a safe
> way).

That message is just me telling somebody that their idea _won't_ work. ;)

But I assume you mean the general idea of putting things in a cruft
pack[1]. Yes, I was only suggesting going that far as a solution to the
auto-gc woes. Solving races is another issue entirely (though obviously
it may make sense to build on the single-pack work, if it exists).

[1] The best discussion of that is, I think:

  
https://public-inbox.org/git/20170610080626.sjujpmgkli4mu...@sigill.intra.peff.net/

which I think I linked earlier, so possibly that is even what you
meant. :)

> What decides it for me is that the user did not invoke "git gc --auto"
> explicitly, so anything "git gc --auto" prints is tangential to what
> the user was trying to do.  If the gc failed, that is worth telling
> them, but if e.g. it encountered a disk I/O error reading and
> succeeded on retry (to make up a fake example), then that's likely
> worth logging to syslog but it's not something the user asked to be
> directly informed about.

I'm not sure I agree. If a repack discovered that you had a bit
corruption on disk, but you happened to have another copy of the object
available that made the repack succeed, I'd like to know that I'm
getting bit corruptions, and earlier is better. I think we need to
surface that information to the user eventually, and I don't think we
can count on syslog being universally available.

By definition we're daemonized here, so we can't count on even having
access to the user's terminal. But it would make more sense to me if the
logic were:

  - at the end of a "gc --auto" run, gc should write a machine-readable
indication of its exit code (either as a final line in the log file,
or to a gc.exitcode file). The warnings/errors remain intermingled
in the logfile.

  - on 

Re: [PATCH 2/2] fsck: downgrade gitmodulesParse default to "info"

2018-07-16 Thread Junio C Hamano
Jeff King  writes:

> I think you missed it. In the paragraph above the one you
> quoted, I said:
>
>The cgit repository, for example, has a file named
>.gitmodules from a pre-submodule attempt at sharing code,
>but does not actually have any gitlinks.

Indeed.

> So I'm curious if you found the argument in my commit
> message compelling. :)
> ...
>  - I think your main concern was that there be a way for the
>user to loosen/tighten, which there is.

Yeah, I think the solution looks good.


Re: [PATCH 00/17] object_id part 14

2018-07-16 Thread Junio C Hamano
Michael Haggerty  writes:

> The magic 100 blames back to our chief magician, Junio:
>
> 8ac65937d0 Make sure we do not write bogus reflog entries. (2007-01-26)

Yup, guilty as charged.

cf. 

"%s %s %s\n" with old and new commit object name and the message
will be "2 * len(hash_in_hex) + 4" bytes long (counting the three
whitespaces and the terminating NUL), and Shawn's original in
6de08ae6 ("Log ref updates to logs/refs/", 2006-05-17) actually
computed this one as "strlen(...) + 2*40+4".

100 was merely me being sloppier than Shawn at 8ac65937 ("Make sure
we do not write bogus reflog entries.", 2007-01-26), preferring
being sufficient over not wasting even a single byte.
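The arithmetic is easy to check (assuming 40-hex SHA-1 object names):

```python
# Fixed overhead of a reflog record "%s %s %s\n": two hex object names,
# two separating spaces, the newline, plus the terminating NUL.  The
# message itself is counted separately via strlen().
hash_hex = "0" * 40  # a SHA-1 in hex
line = "%s %s %s\n" % (hash_hex, hash_hex, "")  # empty message
overhead = len(line) + 1  # +1 for the terminating NUL
print(overhead)               # 84
print(2 * len(hash_hex) + 4)  # 84
```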


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Jul 16, 2018 at 01:21:40PM -0700, Elijah Newren wrote:
>> Jonathan Nieder wrote:

>>> My understanding is that exploding the objects is intentional behavior,
>>> to avoid a race where objects are newly referenced while they are being
>>> pruned.
[...]
>> Ah, that's good to know and at least makes sense.  It seems somewhat
>> odd, though; loose objects that are two weeks old are just as
>> susceptible to being referenced anew by new commits, so the default of
>> running 'git prune --expire=2.weeks.ago' as gc currently does would
>> also be unsafe, wouldn't it?  Why is that any more or less unsafe than
>> pruning objects only referenced by reflog entries that are more than
>> 90 days old?

Part of the answer is that this safety feature applies even when
reflogs are not in use.  Another part is that as you say, you can
start referencing an object at the same time as the reflog entry
pointing to it is expiring.

[...]
>That's why we retain non-fresh
> objects that are referenced from fresh ones (so as long as you made the
> new commit recently, it transitively infers freshness on the old blob),
> and why we freshen mtimes when we elide a write for an existing object.
>
> That's _still_ not race-proof, because none of these operations is
> atomic.

Indeed.  One of the advantages of using a packfile for unreachable
objects is that metadata associated with that packfile can be updated
atomically.

I don't believe we are very good at transitively propagating freshness
today, by the way.

Thanks,
Jonathan


Re: [PATCH v2 4/4] ref-filter: use oid_object_info() to get object

2018-07-16 Thread Junio C Hamano
Olga Telezhnaya  writes:

> -static int get_object(struct ref_array_item *ref, const struct object_id 
> *oid,
> -   int deref, struct object **obj, struct strbuf *err)
> +static int get_object(struct ref_array_item *ref, int deref, struct object 
> **obj,
> +   struct expand_data *oi, struct strbuf *err)
>  {
> - int eaten;
> - int ret = 0;
> - unsigned long size;
> - enum object_type type;
> - void *buf = read_object_file(oid, &type, &size);
> - if (!buf)
> - ret = strbuf_addf_ret(err, -1, _("missing object %s for %s"),
> -   oid_to_hex(oid), ref->refname);
> - else {
> - *obj = parse_object_buffer(oid, type, size, buf, &eaten);
> - if (!*obj)
> - ret = strbuf_addf_ret(err, -1, _("parse_object_buffer 
> failed on %s for %s"),
> -   oid_to_hex(oid), ref->refname);
> - else
> - grab_values(ref->value, deref, *obj, buf, size);
> + /* parse_object_buffer() will set eaten to 0 if free() will be needed */
> + int eaten = 1;

Hmph, doesn't this belong to the previous step?  In other words,
isn't the result of applying 1-3/4 has a bug that can leave eaten
uninitialized (and base decision to free(buf) later on it), and
isn't this change a fix for it?

> + if (oi->info.contentp) {
> + /* We need to know that to use parse_object_buffer properly */
> + oi->info.sizep = &oi->size;
> + oi->info.typep = &oi->type;
>   }
> + if (oid_object_info_extended(the_repository, &oi->oid, &oi->info,
> +  OBJECT_INFO_LOOKUP_REPLACE))
> + return strbuf_addf_ret(err, -1, _("missing object %s for %s"),
> +oid_to_hex(&oi->oid), ref->refname);
> +
> + if (oi->info.contentp) {
> + *obj = parse_object_buffer(&oi->oid, oi->type, oi->size, 
> oi->content, &eaten);
> + if (!obj) {
> + if (!eaten)
> + free(oi->content);
> + return strbuf_addf_ret(err, -1, _("parse_object_buffer 
> failed on %s for %s"),
> +oid_to_hex(&oi->oid), 
> ref->refname);
> + }
> + grab_values(ref->value, deref, *obj, oi->content, oi->size);
> + }
> +
> + grab_common_values(ref->value, deref, oi);
>   if (!eaten)
> - free(buf);
> - return ret;
> + free(oi->content);
> + return 0;
>  }



Re: [PATCH v3 5/7] gpg-interface: introduce new config to select per gpg format program

2018-07-16 Thread Junio C Hamano
Henning Schild  writes:

> +gpg.<format>.program::
> + Use this to customize the program used for the signing format you
> + chose. (see gpg.program) gpg.openpgp.program is a synonym for the
> + legacy gpg.program.

I _think_ you meant "see gpg.format", but I am not 100% sure.

When X is a synonym for Y, Y is also a synonym for X, so technically
speaking this does not matter, but when we talk about backward
compatibility fallback, I think we say "OLDway is retained as a
legacy synonym for NEWway", i.e. the other way around.

Also, `typeset in tt` what end-users would type literally, like
configuration variable names, i.e.

Use this to customize the program used for the signing
format you chose (see `gpg.format`).  `gpg.program` can
still be used as a legacy synonym for `gpg.openpgp.program`.

>  gui.commitMsgWidth::
>   Defines how wide the commit message window is in the
>   linkgit:git-gui[1]. "75" is the default.
> diff --git a/gpg-interface.c b/gpg-interface.c
> index 93bd0fb32..f3c22b551 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -182,7 +182,7 @@ int git_gpg_config(const char *var, const char *value, 
> void *cb)
>   return 0;
>   }
>  
> - if (!strcmp(var, "gpg.program"))
> + if (!strcmp(var, "gpg.program") || !strcmp(var, "gpg.openpgp.program"))
>   fmtname = "openpgp";
>  
>   if (fmtname) {


Re: [PATCH v3 04/11] rerere: mark strings for translation

2018-07-16 Thread Thomas Gummerer
On 07/15, Simon Ruderich wrote:
> On Sat, Jul 14, 2018 at 10:44:36PM +0100, Thomas Gummerer wrote:
> > 'git rerere' is considered a plumbing command and as such its output
> 
> s/plumbing/porcelain/?

Ah yes indeed.  Thanks for catching!

> Regards
> Simon


Re: [PATCH v3 3/7] gpg-interface: introduce an abstraction for multiple gpg formats

2018-07-16 Thread Junio C Hamano
Henning Schild  writes:

> Create a struct that holds the format details for the supported formats.
> At the moment that is still just "openpgp". This commit prepares for the
> introduction of more formats, that might use other programs and match
> other signatures.
>
> Signed-off-by: Henning Schild 
> ---
>  gpg-interface.c | 84 
> ++---
>  1 file changed, 63 insertions(+), 21 deletions(-)
>
> diff --git a/gpg-interface.c b/gpg-interface.c
> index 960c40086..699651fd9 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -7,11 +7,46 @@
>  #include "tempfile.h"
>  
>  static char *configured_signing_key;
> -static const char *gpg_format = "openpgp";
> -static const char *gpg_program = "gpg";
> +struct gpg_format {
> + const char *name;
> + const char *program;
> + const char **extra_args_verify;
> + const char **sigs;
> +};
> +
> +static const char *openpgp_verify_args[] = { "--keyid-format=long", NULL };

Let's write it like this, even if the current line is short enough:

static const char *openpgp_verify_args[] = {
"--keyid-format=long",
NULL
};

Ditto for the next one.  Even if you do not expect these two arrays
to get longer, people tend to copy & paste to imitate existing code
when making new things, and we definitely would not want them to be
doing so on a code like the above (or below).  When they need to add
new elements to their arrays, having the terminating NULL on its own
line means they will have to see less patch noise.

> +static const char *openpgp_sigs[] = {
> + "-----BEGIN PGP SIGNATURE-----",
> + "-----BEGIN PGP MESSAGE-----", NULL };
> +
> +static struct gpg_format gpg_formats[] = {
> + { .name = "openpgp", .program = "gpg",
> +   .extra_args_verify = openpgp_verify_args,

If the underlying aray is "verify_args" (not "extra"), perhaps the
field name should also be .verify_args to match.

Giving an array of things a name "things[]" is a disease, unless the
array is very often passed around as a whole to represent the
collection of everything.  When the array is mostly accessed one
element at a time, being able to say thing[3] to mean the third
thing is much more pleasant.

So, from that point of view, I pretty much agree with
openpgp_verify_args[] and openpgp_sigs[], but am against
gpg_formats[].  The last one should be gpg_format[], for which we
happen to have only one variant right now.

> +   .sigs = openpgp_sigs
> + },
> +};
> +static struct gpg_format *current_format = &gpg_formats[0];

Have a blank line before the above one.

What does "current_format" mean?  Is it different from "format to be
used"?  As we will overwrite the variable upon reading the config,
we cannot afford to call it default_format, but somehow "current"
feels misleading, at least to me.  I expected to find "old format"
elsewhere and there may be some format conversion or something from
the variable name.

static struct gpg_format *use_format = &gpg_format[0];

perhaps?

> +
> +static struct gpg_format *get_format_by_name(const char *str)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(gpg_formats); i++)
> + if (!strcasecmp(gpg_formats[i].name, str))

As [1/7], this would become strcmp(), I presume?

> + return gpg_formats + i;
> + return NULL;
> +}
>  
> -#define PGP_SIGNATURE "-----BEGIN PGP SIGNATURE-----"
> -#define PGP_MESSAGE "-----BEGIN PGP MESSAGE-----"
> +static struct gpg_format *get_format_by_sig(const char *sig)
> +{
> + int i, j;
> +
> + for (i = 0; i < ARRAY_SIZE(gpg_formats); i++)
> + for (j = 0; gpg_formats[i].sigs[j]; j++)
> + if (starts_with(sig, gpg_formats[i].sigs[j]))
> + return gpg_formats + i;
> + return NULL;
> +}

OK.

> @@ -140,18 +172,23 @@ int git_gpg_config(const char *var, const char *value, 
> void *cb)
>   }
>  
>   if (!strcmp(var, "gpg.format")) {
> - if (value && strcasecmp(value, "openpgp"))
> - return error("malformed value for %s: %s", var, value);
> - return git_config_string(_format, var, value);
> - }
> -
> - if (!strcmp(var, "gpg.program")) {
>   if (!value)
>   return config_error_nonbool(var);
> - gpg_program = xstrdup(value);
> + fmt = get_format_by_name(value);
> + if (!fmt)
> + return error("malformed value for %s: %s", var, value);

If I say "opongpg", that probably is not "malformed" (it is all
lowercase ascii alphabet that is very plausible-looking string), but
simply "bad value".

But other than the minor nit, yes, this structure is what I expected
to see from the very first step 1/7.

> + current_format = fmt;
>   return 0;
>   }

>  
> + if (!strcmp(var, "gpg.program"))
> + fmtname = "openpgp";

OK, this is a backward compatibility measure to 

Re: [PATCH v3 4/7] gpg-interface: do not hardcode the key string len anymore

2018-07-16 Thread Junio C Hamano
Henning Schild  writes:

> gnupg does print the keyid followed by a space and the signer comes
> next. The same pattern is also used in gpgsm, but there the key length
> would be 40 instead of 16. Instead of hardcoding the expected length,
> find the first space and calculate it.
> Input that does not match the expected format will be ignored now,
> before we jumped to found+17 which might have been behind the end of an
> unexpected string.
>
> Signed-off-by: Henning Schild 
> ---
>  gpg-interface.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Very nice.


> diff --git a/gpg-interface.c b/gpg-interface.c
> index 699651fd9..93bd0fb32 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -89,10 +89,11 @@ static void parse_gpg_output(struct signature_check *sigc)
>   sigc->result = sigcheck_gpg_status[i].result;
>   /* The trust messages are not followed by key/signer 
> information */
>   if (sigc->result != 'U') {
> - sigc->key = xmemdupz(found, 16);
> + next = strchrnul(found, ' ');
> + sigc->key = xmemdupz(found, next - found);
>   /* The ERRSIG message is not followed by signer 
> information */
> - if (sigc-> result != 'E') {
> - found += 17;
> + if (*next && sigc-> result != 'E') {
> + found = next + 1;
>   next = strchrnul(found, '\n');
>   sigc->signer = xmemdupz(found, next - found);
>   }


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 01:16:45PM -0700, Elijah Newren wrote:

> >> The basic problem here, at least for us, is that gc has enough
> >> information to know it could expunge some objects, but because of how
> >> it is structured in terms of several substeps (reflog expiration,
> >> repack, prune), the information is lost between the steps and it
> >> instead writes them out as unreachable objects.  If we could prune (or
> >> avoid exploding) loose objects that are only reachable from reflog
> >> entries that we are expiring, then the problem goes away for us.  (I
> >> totally understand that other repos may have enough unreachable
> >> objects for other reasons that Peff's suggestion to just pack up
> >> unreachable objects is still a really good idea.  But on its own, it
> >> seems like a waste since it's packing stuff that we know we could just
> >> expunge.)
> >
> > No, we should have expunged everything that could be during the "repack"
> > and "prune" steps. We feed the expiration time to repack, so that it
> > knows to drop objects entirely instead of exploding them loose.
> 
> Um, except it doesn't actually do that.  The testcase I provided shows
> that it leaves around 10,000 objects that are totally deletable and
> were only previously referenced by reflog entries -- entries that gc
> removed without removing the corresponding objects.

What's your definition of "totally deletable"?

If the pack they were in is less than 2 weeks old, then they are
unreachable but "fresh", and are intentionally retained. If it was older
than 2 weeks, they should have been deleted.

If you don't like that policy, you can set gc.pruneExpire to something
lower (including "now", but watch out, as that can race with other
operations!). But I don't think that's an implementation short-coming.
It's intentional to keep those objects. It's just that the format
they're in is inefficient and confuses later gc runs.

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Jul 16, 2018 at 12:54:31PM -0700, Jonathan Nieder wrote:

>> Even restricting attention to the warning, not the exit code, I think
>> you are saying the current behavior is acceptable.  I do not believe
>> it is.
>
> I didn't say that at all. The current situation sucks,

Thanks for clarifying.  That helps.

>and I think the
> right solution is to pack the unreachable objects instead of exploding
> them.

That seems like a huge widening in scope relative to what this
original patch tackled.  I'm not aware of a way to do this without
breaking compatibility with the current (broken) race prevention.  If
you're saying that breaking compatibility in that way is okay with
you, then great!

[...]
>> I think you are saying Jonathan's patch makes the behavior
>> worse, and I'm not seeing it.  Can you describe an example user
>> interaction where the current behavior would be better than the
>> behavior after Jonathan's patch?  That might help make this discussion
>> more concrete.
>
> It makes it worse because there is nothing to throttle a "gc --auto"
> that is making no progress.
[...]
> With the current code, that produces a bunch of annoying warnings for
> every commit ("I couldn't gc because the last gc reported a warning").
> But after Jonathan's patch, every single commit will do a full gc of the
> repository. In this tiny repository that's relatively quick, but in a
> real repo it's a ton of CPU and I/O, all for nothing.

I see.  Do I understand correctly that if we find a way to print the
warning but not error out, that would be good enough for you?

[...]
>> Have you looked over the discussion in "Loose objects and unreachable
>> objects" in Documentation/technical/hash-function-transition.txt?  Do
>> you have thoughts on it (preferably in a separate thread)?
>
> It seems to propose putting the unreachable objects into a pack. Which
> yes, I absolutely agree with (as I thought I'd been saying in every
> single email in this thread).

I figured you were proposing something like
https://public-inbox.org/git/20180113100734.ga30...@sigill.intra.peff.net/,
which is still racy (because it does not handle the freshening in a safe
way).

[...]
>   Even if we were going to remove this message to help the
> daemonized case, I think we'd want to retain it for the non-daemon case.

Interesting.  That should be doable, e.g. following the approach
described below.

[...]
>> A simple way to do that without changing formats is to truncate the
>> file when exiting with status 0.
>
> That's a different behavior than what we do now (and what was suggested
> earlier), in that it assumes that anything written to stderr is OK for
> gc to hide from the user if the process exits with code zero.
>
> That's probably OK in most cases, though I wonder if there are corner
> cases. For example, if you have a corrupt ref, we used to say "error:
> refs/heads/foo does not point to a valid object!" but otherwise ignore
> it. These days git-gc sets REF_PARANOIA=1, so we'll actually barf on a
> corrupt ref. But I wonder if there are other cases lurking.

What decides it for me is that the user did not invoke "git gc --auto"
explicitly, so anything "git gc --auto" prints is tangential to what
the user was trying to do.  If the gc failed, that is worth telling
them, but if e.g. it encountered a disk I/O error reading and
succeeded on retry (to make up a fake example), then that's likely
worth logging to syslog but it's not something the user asked to be
directly informed about.

Thanks,
Jonathan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 01:21:40PM -0700, Elijah Newren wrote:

> > My understanding is that exploding the objects is intentional behavior,
> > to avoid a race where objects are newly referenced while they are being
> > pruned.
> >
> > I am not a fan of that behavior.  It's still racy.  But when we've
> > brought it up in the past, the consensus seems to have been that it's
> > better than nothing.  Documentation/technical/hash-function-transition.txt
> > section "Loose objects and unreachable objects" describes a way to
> > eliminate the race.
> 
> Ah, that's good to know and at least makes sense.  It seems somewhat
> odd, though; loose objects that are two weeks old are just as
> susceptible to being referenced anew by new commits, so the default of
> running 'git prune --expire=2.weeks.ago' as gc currently does would
> also be unsafe, wouldn't it?  Why is that any more or less unsafe than
> pruning objects only referenced by reflog entries that are more than
> 90 days old?

The 2-week safety isn't primarily about things which just became
unreferenced.  It's about things which are in the act of being
referenced.

Imagine a "git commit" racing with a "git prune". The commit has to
create an object, and then it will update a ref to point to it. But
between those two actions, prune may racily delete the object!
The mtime grace period is what makes that work.

Using 2 weeks is sort of ridiculous for that. But it also helps with
manual recovery (e.g., imagine a blob added to the index but never
committed; 3 days later you may want to try to recover your old work).

And you're correct that a new git-commit may still reference an old
object (e.g., a blob that's 5 seconds shy of being 2 weeks old that
you're including in a new commit). That's why we retain non-fresh
objects that are referenced from fresh ones (so as long as you made the
new commit recently, it transitively infers freshness on the old blob),
and why we freshen mtimes when we elide a write for an existing object.

That's _still_ not race-proof, because none of these operations is
atomic. git-prune can decide the blob is unfresh at the exact moment
you're creating the commit object.
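The freshening mentioned above (updating the mtime when a write is elided) can be observed directly. A sketch under the same assumptions (GNU touch for -d, illustrative names):

```shell
#!/bin/sh
# Sketch only: re-writing an existing object elides the write but
# updates ("freshens") the loose object's mtime.
rm -rf freshen-demo
git init -q freshen-demo
cd freshen-demo

blob=$(echo payload | git hash-object -w --stdin)
path=.git/objects/$(echo "$blob" | sed 's|^..|&/|')
touch -d '3 weeks ago' "$path"    # pretend the object is old

echo payload | git hash-object -w --stdin >/dev/null   # elided write

touch -d '1 week ago' reference
[ "$path" -nt reference ] && echo "object was freshened"
cd ..
```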

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 12:54:31PM -0700, Jonathan Nieder wrote:

> Even restricting attention to the warning, not the exit code, I think
> you are saying the current behavior is acceptable.  I do not believe
> it is.

I didn't say that at all. The current situation sucks, and I think the
right solution is to pack the unreachable objects instead of exploding
them.

> I think you are saying Jonathan's patch makes the behavior
> worse, and I'm not seeing it.  Can you describe an example user
> interaction where the current behavior would be better than the
> behavior after Jonathan's patch?  That might help make this discussion
> more concrete.

It makes it worse because there is nothing to throttle a "gc --auto"
that is making no progress.

Try this:

-- >8 --
#!/bin/sh

rm -rf repo

# new mostly-empty repo
git init repo
cd repo
git commit --allow-empty -m base

# make a bunch of unreachable blobs
for i in $(seq 7000); do
  echo "blob"
  echo "data <
[...]
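The script is truncated in the archive at this point. A reconstruction of its general shape (not the exact original; the blob count is reduced for speed): fast-import packs a pile of unreachable blobs, and the following gc explodes them loose, where they keep tripping the loose-object threshold.

```shell
#!/bin/sh
# Reconstruction of the general shape of the truncated script above,
# not the original: create unreachable blobs, then let gc explode them
# into loose objects that later auto-gc runs will count again.
rm -rf repo
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m base

# make a bunch of unreachable blobs (fast-import writes them to a pack)
for i in $(seq 200); do
  printf 'blob\ndata <<EOF\nunreachable %d\nEOF\n' "$i"
done | git fast-import --quiet

# gc keeps the fresh unreachable blobs, but only by exploding them loose
git gc --quiet
git count-objects
cd ..
```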
> > See the thread I linked earlier on putting unreachable objects into a
> > pack, which I think is the real solution.
> 
> Have you looked over the discussion in "Loose objects and unreachable
> objects" in Documentation/technical/hash-function-transition.txt?  Do
> you have thoughts on it (preferably in a separate thread)?

It seems to propose putting the unreachable objects into a pack. Which
yes, I absolutely agree with (as I thought I'd been saying in every
single email in this thread).

> > I mean that if you do not write a persistent log, then "gc --auto" will
> > do an unproductive gc every time it is invoked. I.e., it will see "oh,
> > there are too many loose objects", and then waste a bunch of CPU every
> > time you commit.
> 
> If so, then this would already be the behavior in non daemon mode.
> Are you saying this accidental effect of daemon mode is in fact
> intentional?

I'm not sure if I'd call it intentional, since I don't recall ever
seeing this side effect discussed. But since daemonizing is the default,
I think that's the case people have focused on when they hit annoying
cases. E.g., a831c06a2b (gc: ignore old gc.log files, 2017-02-10).

I'd consider the fact that "gc --auto" with gc.autoDetach=false will
repeatedly do useless work to be a minor bug. But I think Jonathan's
patch makes it even worse because we do not even tell people that
useless work is being done (while making them wait for each gc to
finish!). Even if we were going to remove this message to help the
daemonized case, I think we'd want to retain it for the non-daemon case.

> > A daemonized git-gc runs a bunch of sub-commands (e.g., "git prune")
> > with their stderr redirected into the logfile. If you want to have
> > warnings go somewhere else, then either:
> >
> >   - you need some way to tell those sub-commands to send the warnings
> > elsewhere (i.e., _not_ stderr)
> >
> > or
> >
> >   - you have to post-process the output they send to separate warnings
> > from other errors. Right now that means scraping. If you are
> > proposing a system of machine-readable output, it would need to work
> > not just for git-gc, but for every sub-command it runs.
> 
>   or
> 
> - you can simply record and trust the exit code
> 
> A simple way to do that without changing formats is to truncate the
> file when exiting with status 0.

That's a different behavior than what we do now (and what was suggested
earlier), in that it assumes that anything written to stderr is OK for
gc to hide from the user if the process exits with code zero.

That's probably OK in most cases, though I wonder if there are corner
cases. For example, if you have a corrupt ref, we used to say "error:
refs/heads/foo does not point to a valid object!" but otherwise ignore
it. These days git-gc sets REF_PARANOIA=1, so we'll actually barf on a
corrupt ref. But I wonder if there are other cases lurking.

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Elijah Newren
Hi,

On Mon, Jul 16, 2018 at 12:19 PM, Jonathan Nieder  wrote:
> Elijah Newren wrote:
>
>> The basic problem here, at least for us, is that gc has enough
>> information to know it could expunge some objects, but because of how
>> it is structured in terms of several substeps (reflog expiration,
>> repack, prune), the information is lost between the steps and it
>> instead writes them out as unreachable objects.  If we could prune (or
>> avoid exploding) loose objects that are only reachable from reflog
>> entries that we are expiring, then the problem goes away for us.
>
> My understanding is that exploding the objects is intentional behavior,
> to avoid a race where objects are newly referenced while they are being
> pruned.
>
> I am not a fan of that behavior.  It's still racy.  But when we've
> brought it up in the past, the consensus seems to have been that it's
> better than nothing.  Documentation/technical/hash-function-transition.txt
> section "Loose objects and unreachable objects" describes a way to
> eliminate the race.

Ah, that's good to know and at least makes sense.  It seems somewhat
odd, though; loose objects that are two weeks old are just as
susceptible to being referenced anew by new commits, so the default of
running 'git prune --expire=2.weeks.ago' as gc currently does would
also be unsafe, wouldn't it?  Why is that any more or less unsafe than
pruning objects only referenced by reflog entries that are more than
90 days old?

Elijah


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Elijah Newren
On Mon, Jul 16, 2018 at 12:52 PM, Jeff King  wrote:
> On Mon, Jul 16, 2018 at 12:15:05PM -0700, Elijah Newren wrote:
>
>> The basic problem here, at least for us, is that gc has enough
>> information to know it could expunge some objects, but because of how
>> it is structured in terms of several substeps (reflog expiration,
>> repack, prune), the information is lost between the steps and it
>> instead writes them out as unreachable objects.  If we could prune (or
>> avoid exploding) loose objects that are only reachable from reflog
>> entries that we are expiring, then the problem goes away for us.  (I
>> totally understand that other repos may have enough unreachable
>> objects for other reasons that Peff's suggestion to just pack up
>> unreachable objects is still a really good idea.  But on its own, it
>> seems like a waste since it's packing stuff that we know we could just
>> expunge.)
>
> No, we should have expunged everything that could be during the "repack"
> and "prune" steps. We feed the expiration time to repack, so that it
> knows to drop objects entirely instead of exploding them loose.

Um, except it doesn't actually do that.  The testcase I provided shows
that it leaves around 1 objects that are totally deletable and
were only previously referenced by reflog entries -- entries that gc
removed without removing the corresponding objects.


I will note that my testcase was slightly out-of-date; with current
git it needs a call to 'wait_for_background_gc_to_finish' right before
the 'git gc --quiet' to avoid erroring out.

> You
> could literally just do:
>
>   find .git/objects/?? -type f |
>   perl -lne 'm{(..)/(.{38})$} and print "$1$2"' |
>   git pack-objects .git/objects/pack/cruft-pack
>
> But:
>
>   - that will explode them out only to repack them, which is inefficient
> (if they're already packed, you can probably reuse deltas, not to
> mention the I/O savings)
>
>   - there's the question of how to handle timestamps. Some of those
> objects may have been _about_ to expire, but now you've just put
> them in a brand-new pack that adds another 2 weeks to their life
>
>   - the find above is sloppy, and will race with somebody adding new
> objects to the repo
>
> So probably you want to have pack-objects write out the list of objects
> it _would_ explode, rather than exploding them. And then before
> git-repack deletes the old packs, put those into a new cruft pack. That
> _just_ leaves the timestamp issue (which is discussed at length in the
> thread I linked earlier).
>
>> git_actual_garbage_collect() {
>> GITDIR=$(git rev-parse --git-dir)
>>
>> # Record all revisions stored in reflog before and after gc
>> git rev-list --no-walk --reflog >$GITDIR/gc.original-refs
>> git gc --auto
>> wait_for_background_gc_to_finish
>> git rev-list --no-walk --reflog >$GITDIR/gc.final-refs
>>
>> # Find out which reflog entries were removed
>> DELETED_REFS=$(comm -23 <(sort $GITDIR/gc.original-refs) <(sort 
>> $GITDIR/gc.final-refs))
>
> This is too detailed, I think. There are other reasons to have
> unreachable objects than expired reflogs. I think you really just want
> to consider all unreachable objects (like the pack-objects thing I
> mentioned above).

Yes, like I said, coarse workaround and I never had time to create a
real fix.  But I thought the testcase might be useful as a
demonstration of how git gc leaves around loose objects that were
previously referenced by reflogs that gc itself pruned.


Re: [PATCH v3 2/7] t/t7510: check the validation of the new config gpg.format

2018-07-16 Thread Junio C Hamano
Henning Schild  writes:

> Test setting gpg.format to both invalid and valid values.
>
> Signed-off-by: Henning Schild 
> ---
>  t/t7510-signed-commit.sh | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
> index 6e2015ed9..1b6a1dd90 100755
> --- a/t/t7510-signed-commit.sh
> +++ b/t/t7510-signed-commit.sh
> @@ -227,4 +227,13 @@ test_expect_success GPG 'log.showsignature behaves like 
> --show-signature' '
>   grep "gpg: Good signature" actual
>  '
>  
> +test_expect_success GPG 'check config gpg.format values' '
> + test_config gpg.format openpgp &&
> + git commit -S --amend -m "success" &&
> + test_config gpg.format OpEnPgP &&
> + git commit -S --amend -m "success" &&
> + test_config gpg.format malformed &&
> + test_must_fail git commit -S --amend -m "fail"
> +'
> +
>  test_done

Since we'd be doing case-sensitive value for gpg.format, this would
have to change in a later iteration, I guess.


Re: [PATCH v3 1/7] gpg-interface: add new config to select how to sign a commit

2018-07-16 Thread Junio C Hamano
Henning Schild  writes:

> Add "gpg.format" where the user can specify which type of signature to
> use for commits. At the moment only "openpgp" is supported and the value is
> not even used. This commit prepares for new types of signatures.
>
> Signed-off-by: Henning Schild 
> ---
>  Documentation/config.txt | 4 
>  gpg-interface.c  | 7 +++
>  2 files changed, 11 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1cc18a828..ac373e3f4 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1828,6 +1828,10 @@ gpg.program::
>   signed, and the program is expected to send the result to its
>   standard output.
>  
> +gpg.format::
> + Specifies which key format to use when signing with `--gpg-sign`.
> + Default is "openpgp", that is also the only supported value.
> +
>  gui.commitMsgWidth::
>   Defines how wide the commit message window is in the
>   linkgit:git-gui[1]. "75" is the default.
> diff --git a/gpg-interface.c b/gpg-interface.c
> index 09ddfbc26..960c40086 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -7,6 +7,7 @@
>  #include "tempfile.h"
>  
>  static char *configured_signing_key;
> +static const char *gpg_format = "openpgp";
>  static const char *gpg_program = "gpg";
>  
>  #define PGP_SIGNATURE "-----BEGIN PGP SIGNATURE-----"
> @@ -138,6 +139,12 @@ int git_gpg_config(const char *var, const char *value, 
> void *cb)
>   return 0;
>   }
>  
> + if (!strcmp(var, "gpg.format")) {
> + if (value && strcasecmp(value, "openpgp"))
> + return error("malformed value for %s: %s", var, value);
> + return git_config_string(&gpg_format, var, value);

I may be mistaken (sorry, I read so many topics X-<) but I think the
consensus was to accept only "openpgp" so that we can ensure that

[gpg "openpgp"] program = foo

etc. can be parsed more naturally without having to manually special
case the subsection name to be case insensitive.  In other words,
strcasecmp() should just be strcmp().  Otherwise, people would get a
wrong expectation that

[gpg] format = OpenPGP
[gpg "openpgp"] program = foo

should refer to the same and single OpenPGP, but that would violate
the usual configuration syntax rules.
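The syntax rule in question—section and variable names fold case, but the subsection of a three-level name does not—can be checked directly (the key names here are only for illustration):

```shell
#!/bin/sh
# Sketch only: subsection names in three-level config keys are
# case-sensitive, unlike section and variable names.
rm -rf cfg-demo
git init -q cfg-demo
cd cfg-demo

git config gpg.OpenPGP.program /usr/local/bin/gpg

git config GPG.OpenPGP.PROGRAM   # same key: section/variable fold case
git config gpg.openpgp.program || echo "different key: not found"
cd ..
```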

The structure of checking the error looks quite suboptimal even when
we initially support the single "openpgp" at this step.  I would
have expected you to be writing the above like so:

	if (!value)
		return config_error_nonbool(var);
	if (strcmp(value, "openpgp"))
		return error("unsupported value for %s: %s", var, value);
	return git_config_string(...);

That would make it more clear that the variable is "nonbool", which
does not change across additional support for new formats in later
steps.

> + }
> +
>   if (!strcmp(var, "gpg.program")) {
>   if (!value)
>   return config_error_nonbool(var);


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Jeff King wrote:
> On Mon, Jul 16, 2018 at 12:09:49PM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> Er, the action is to run "git prune", like the warning says. :)
>>
>> I don't think we want to recommend that, especially when "git gc --auto"
>> does the right thing automatically.
>
> But that's the point. This warning is written literally after running
> "git gc --auto" _didn't_ do the right thing. Yes, it would be nicer if
> it could do the right thing. But it doesn't yet know how to.

I think we fundamentally disagree, and I would like your help getting
past this impasse.

Even restricting attention to the warning, not the exit code, I think
you are saying the current behavior is acceptable.  I do not believe
it is.  I think you are saying Jonathan's patch makes the behavior
worse, and I'm not seeing it.  Can you describe an example user
interaction where the current behavior would be better than the
behavior after Jonathan's patch?  That might help make this discussion
more concrete.

[...]
> See the thread I linked earlier on putting unreachable objects into a
> pack, which I think is the real solution.

Have you looked over the discussion in "Loose objects and unreachable
objects" in Documentation/technical/hash-function-transition.txt?  Do
you have thoughts on it (preferably in a separate thread)?

[...]
>> This sounds awful.  It sounds to me like you're saying "git gc --auto"
>> is saying "I just did the wrong thing, and here is how you can clean
>> up after me".  That's not how I want a program to behave.
>
> Sure, it would be nice if it did the right thing. Nobody has written
> that yet. Until they do, we have to deal with the fallout.

"git gc --auto" is already doing the right thing.

[...]
> I mean that if you do not write a persistent log, then "gc --auto" will
> do an unproductive gc every time it is invoked. I.e., it will see "oh,
> there are too many loose objects", and then waste a bunch of CPU every
> time you commit.

If so, then this would already be the behavior in non daemon mode.
Are you saying this accidental effect of daemon mode is in fact
intentional?

[...]
>>> But in practice, I think you have to
>>> resort to scraping anyway, if you are interested in treating warnings
>>> from sub-processes the same way.
>>
>> Can you say more about this for me?  I am not understanding what
>> you're saying necessitates scraping the output.  I would strongly
>> prefer to avoid scraping the output.
>
> A daemonized git-gc runs a bunch of sub-commands (e.g., "git prune")
> with their stderr redirected into the logfile. If you want to have
> warnings go somewhere else, then either:
>
>   - you need some way to tell those sub-commands to send the warnings
> elsewhere (i.e., _not_ stderr)
>
> or
>
>   - you have to post-process the output they send to separate warnings
> from other errors. Right now that means scraping. If you are
> proposing a system of machine-readable output, it would need to work
> not just for git-gc, but for every sub-command it runs.

  or

- you can simply record and trust the exit code

A simple way to do that without changing formats is to truncate the
file when exiting with status 0.

Thanks,
Jonathan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 12:15:05PM -0700, Elijah Newren wrote:

> The basic problem here, at least for us, is that gc has enough
> information to know it could expunge some objects, but because of how
> it is structured in terms of several substeps (reflog expiration,
> repack, prune), the information is lost between the steps and it
> instead writes them out as unreachable objects.  If we could prune (or
> avoid exploding) loose objects that are only reachable from reflog
> entries that we are expiring, then the problem goes away for us.  (I
> totally understand that other repos may have enough unreachable
> objects for other reasons that Peff's suggestion to just pack up
> unreachable objects is still a really good idea.  But on its own, it
> seems like a waste since it's packing stuff that we know we could just
> expunge.)

No, we should have expunged everything that could be during the "repack"
and "prune" steps. We feed the expiration time to repack, so that it
knows to drop objects entirely instead of exploding them loose.

So the leftovers really are objects that cannot be deleted yet. You
could literally just do:

  find .git/objects/?? -type f |
  perl -lne 'm{(..)/(.{38})$} and print "$1$2"' |
  git pack-objects .git/objects/pack/cruft-pack

But:

  - that will explode them out only to repack them, which is inefficient
(if they're already packed, you can probably reuse deltas, not to
mention the I/O savings)

  - there's the question of how to handle timestamps. Some of those
objects may have been _about_ to expire, but now you've just put
them in a brand-new pack that adds another 2 weeks to their life

  - the find above is sloppy, and will race with somebody adding new
objects to the repo

So probably you want to have pack-objects write out the list of objects
it _would_ explode, rather than exploding them. And then before
git-repack deletes the old packs, put those into a new cruft pack. That
_just_ leaves the timestamp issue (which is discussed at length in the
thread I linked earlier).
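The find|perl|pack-objects pipeline above is mangled by the archive; a hedged shell-only equivalent of the same idea (deliberately ignoring the timestamp and race caveats just listed) looks like:

```shell
#!/bin/sh
# Sketch only: sweep every loose object into a "cruft" pack, the naive
# version of the pipeline above; it inherits all the caveats listed.
rm -rf cruft-demo
git init -q cruft-demo
cd cruft-demo
echo cruft | git hash-object -w --stdin >/dev/null

# ".git/objects/ab/cdef..." -> "abcdef...", then feed ids to pack-objects
find .git/objects/?? -type f |
sed -e 's|.*objects/||' -e 's|/||' |
git pack-objects -q .git/objects/pack/cruft-pack >/dev/null
ls .git/objects/pack/
cd ..
```

pack-objects names the result cruft-pack-&lt;hash&gt;.pack, so nothing here deletes the loose copies; a real implementation would do that only after the pack is safely in place.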

> git_actual_garbage_collect() {
> GITDIR=$(git rev-parse --git-dir)
> 
> # Record all revisions stored in reflog before and after gc
> git rev-list --no-walk --reflog >$GITDIR/gc.original-refs
> git gc --auto
> wait_for_background_gc_to_finish
> git rev-list --no-walk --reflog >$GITDIR/gc.final-refs
> 
> # Find out which reflog entries were removed
> DELETED_REFS=$(comm -23 <(sort $GITDIR/gc.original-refs) <(sort 
> $GITDIR/gc.final-refs))

This is too detailed, I think. There are other reasons to have
unreachable objects than expired reflogs. I think you really just want
to consider all unreachable objects (like the pack-objects thing I
mentioned above).

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 12:09:49PM -0700, Jonathan Nieder wrote:

> >>> So while I completely agree that it's not a great thing to encourage
> >>> users to blindly run "git prune", I think it _is_ actionable.
> >>
> >> To flesh this out a little more: what user action do you suggest?  Could
> >> we carry out that action automatically?
> >
> > Er, the action is to run "git prune", like the warning says. :)
> 
> I don't think we want to recommend that, especially when "git gc --auto"
> does the right thing automatically.

But that's the point. This warning is written literally after running
"git gc --auto" _didn't_ do the right thing. Yes, it would be nicer if
it could do the right thing. But it doesn't yet know how to.

See the thread I linked earlier on putting unreachable objects into a
pack, which I think is the real solution.

> > The warning that is deleted by this patch is: you _just_ ran gc, and hey
> > we even did it automatically for you, but we're still in a funky state
> > afterwards. You might need to clean up this state.
> 
> This sounds awful.  It sounds to me like you're saying "git gc --auto"
> is saying "I just did the wrong thing, and here is how you can clean
> up after me".  That's not how I want a program to behave.

Sure, it would be nice if it did the right thing. Nobody has written
that yet. Until they do, we have to deal with the fallout.

> > If you do that without anything further, then it will break the
> > protection against repeatedly running auto-gc, as I described in the
> > previous email.
> 
> Do you mean ratelimiting for the message, or do you actually mean
> repeatedly running auto-gc itself?
> 
> If we suppress warnings, there would still be a gc.log while "git gc
> --auto" is running, just as though there had been no warnings at all.
> I believe this is close to the intended behavior; it's the same as
> what you'd get without daemon mode, except that you lose the warning.

I mean that if you do not write a persistent log, then "gc --auto" will
do an unproductive gc every time it is invoked. I.e., it will see "oh,
there are too many loose objects", and then waste a bunch of CPU every
time you commit.

> > Any of those would work similarly to the "just detect warnings" I
> > suggested earlier, with respect to keeping the "1 day" expiration
> > intact, so I'd be OK with them. In theory they'd be more robust than
> > scraping the "warning:" prefix. But in practice, I think you have to
> > resort to scraping anyway, if you are interested in treating warnings
> > from sub-processes the same way.
> 
> Can you say more about this for me?  I am not understanding what
> you're saying necessitates scraping the output.  I would strongly
> prefer to avoid scraping the output.

A daemonized git-gc runs a bunch of sub-commands (e.g., "git prune")
with their stderr redirected into the logfile. If you want to have
warnings go somewhere else, then either:

  - you need some way to tell those sub-commands to send the warnings
elsewhere (i.e., _not_ stderr)

or

  - you have to post-process the output they send to separate warnings
from other errors. Right now that means scraping. If you are
proposing a system of machine-readable output, it would need to work
not just for git-gc, but for every sub-command it runs.

-Peff


Re: [PATCH 06/16] upload-pack: generalize commit date cutoff

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The ok_to_give_up() method uses the commit date as a cutoff to avoid
> walking the entire reachable set of commits. Before moving the
> reachable() method to commit-reach.c, pull out the dependence on the
> global constant 'oldest_have' with a 'min_commit_date' parameter.


  'oldest_have' seems to be used in only one method after that
  (function got_oid); but as that function is called many times
  we either have to make it a function-global or pass around as a parameter,
  we'll defer that to later.

Code (of all previous patches and this one) look good!
Thanks,
Stefan


Re: [PATCH 4/6] builtin/submodule--helper: store update_clone information in a struct

2018-07-16 Thread Junio C Hamano
Stefan Beller  writes:

> The information that is printed for update_submodules in
> 'submodule--helper update-clone' and consumed by 'git submodule update'
> is stored as a string per submodule. This made sense at the time of
> 48308681b07 (git submodule update: have a dedicated helper for cloning,
> 2016-02-29), but as we want to migrate the rest of the submodule update
> into C, we're better off having access to the raw information in a helper
> struct.

The reasoning above makes sense, but I have a few niggles on the
naming.

 * Anything you'd place to keep track of the state is "information",
   so a whole "_information" suffix to the structure name does not
   add much value.  We've seen our fair share of (meaningless)
   "_data" suffix used in many places but I think the overly long
   "_information" that is equally meaningless trumps them.  I think
   we also have "_info", but if we are not going to have a better
   name, let's not be original and stick to "_data" like other
   existing codepath.  An alternative with more meaningful name is
   of course better, though, but it is not all that much worth to
   spend too many braincycles on it.

 * Is the fact that these parameters necessary to help populating
   the submodule repository are sent on a line to external process
   the most important aspect of the submodule_lines[] array?  As
   this step is a preparation to migrate out of that line/text
   oriented IPC, I think line-ness is the least important and
   interesting thing to name the variable with.


If I were writing this patch, perhaps I'd do

struct update_clone_data *update_clone;
int update_clone_nr, update_clone_alloc;

following my gut, but since you've been thinking about submodule
longer than I have, perhaps you can come up with a better name.

> @@ -1463,8 +1469,9 @@ struct submodule_update_clone {
>   const char *recursive_prefix;
>   const char *prefix;
>  
> - /* Machine-readable status lines to be consumed by git-submodule.sh */
> - struct string_list projectlines;
> + /* to be consumed by git-submodule.sh */
> + struct submodule_update_clone_information *submodule_lines;
> + int submodule_lines_nr; int submodule_lines_alloc;
>  
>   /* If we want to stop as fast as possible and return an error */
>   unsigned quickstop : 1;
> @@ -1478,7 +1485,7 @@ struct submodule_update_clone {
>  #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
>   SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, \
>   NULL, NULL, NULL, \
> - STRING_LIST_INIT_DUP, 0, NULL, 0, 0}
> + NULL, 0, 0, 0, NULL, 0, 0, 0}

The structure definition and this macro definition are nearby, so it
is not crucial, but its probably not a bad idea to switch to C99
initializers for a thing like this to make it more readable, once
the code around this area stabilizes back again sufficiently (IOW,
let's not distract ourselves in the middle of adding a new feature
like this one).


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Hi,

Elijah Newren wrote:

> The basic problem here, at least for us, is that gc has enough
> information to know it could expunge some objects, but because of how
> it is structured in terms of several substeps (reflog expiration,
> repack, prune), the information is lost between the steps and it
> instead writes them out as unreachable objects.  If we could prune (or
> avoid exploding) loose objects that are only reachable from reflog
> entries that we are expiring, then the problem goes away for us.

My understanding is that exploding the objects is intentional behavior,
to avoid a race where objects are newly referenced while they are being
pruned.

I am not a fan of that behavior.  It's still racy.  But when we've
brought it up in the past, the consensus seems to have been that it's
better than nothing.  Documentation/technical/hash-function-transition.txt
section "Loose objects and unreachable objects" describes a way to
eliminate the race.

Thanks and hope that helps,
Jonathan


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Elijah Newren


On Mon, Jul 16, 2018 at 10:27 AM, Jonathan Tan  wrote:
> In a087cc9819 ("git-gc --auto: protect ourselves from accumulated
> cruft", 2007-09-17), the user was warned if there were too many
> unreachable loose objects. This made sense at the time, because gc
> couldn't prune them safely. But subsequently, git prune learned the
> ability to not prune recently created loose objects, making pruning able
> to be done more safely, and gc was made to automatically prune old
> unreachable loose objects in 25ee9731c1 ("gc: call "prune --expire
> 2.weeks.ago" by default", 2008-03-12).
...
>
> ---
> This was noticed when a daemonized gc run wrote this warning to the log
> file, and returned 0; but a subsequent run merely read the log file, saw
> that it is non-empty and returned -1 (which is inconsistent in that such
> a run should return 0, as it did the first time).

Yeah, we've hit this several times too.  I even created a testcase and a
workaround, though I never got it into proper submission form.

The basic problem here, at least for us, is that gc has enough
information to know it could expunge some objects, but because of how
it is structured in terms of several substeps (reflog expiration,
repack, prune), the information is lost between the steps and it
instead writes them out as unreachable objects.  If we could prune (or
avoid exploding) loose objects that are only reachable from reflog
entries that we are expiring, then the problem goes away for us.  (I
totally understand that other repos may have enough unreachable
objects for other reasons that Peff's suggestion to just pack up
unreachable objects is still a really good idea.  But on its own, it
seems like a waste since it's packing stuff that we know we could just
expunge.)

Anyway, my very rough testcase is below.  The workaround is the
git_actual_garbage_collect() function (minus the call to
wait_for_background_gc_to_finish).

Elijah

---


wait_for_background_gc_to_finish() {
while ( ps -ef | grep -v grep | grep --quiet git.gc.--auto ); do
sleep 1;
done
}

git_standard_garbage_collect() {
# Current git gc sprays unreachable objects back in loose form; this is
# fine in many cases, but is annoying when done with objects which
# newly become unreachable because of something else git-gc does and
# git-gc doesn't clean them up.
git gc --auto
wait_for_background_gc_to_finish
}

git_actual_garbage_collect() {
GITDIR=$(git rev-parse --git-dir)

# Record all revisions stored in reflog before and after gc
git rev-list --no-walk --reflog >$GITDIR/gc.original-refs
git gc --auto
wait_for_background_gc_to_finish
git rev-list --no-walk --reflog >$GITDIR/gc.final-refs

# Find out which reflog entries were removed
DELETED_REFS=$(comm -23 <(sort $GITDIR/gc.original-refs) <(sort $GITDIR/gc.final-refs))

# Get the list of objects which used to be reachable, but were made
# unreachable due to gc's reflog expiration.  To get these, I need
# the intersection of things reachable from $DELETED_REFS and things
# which are unreachable now.
git rev-list --objects $DELETED_REFS --not --all --reflog | awk '{print $1}' >$GITDIR/gc.previously-reachable-objects
git prune --expire=now --dry-run | awk '{print $1}' >$GITDIR/gc.unreachable-objects

# Delete all the previously-reachable-objects made unreachable by the
# reflog expiration done by git gc
comm -12 <(sort $GITDIR/gc.unreachable-objects) <(sort $GITDIR/gc.previously-reachable-objects) | sed -e "s#^\(..\)#$GITDIR/objects/\1/#" | xargs rm
}


test -d testcase && { echo "testcase exists; exiting"; exit 1; }
git init testcase/
cd testcase

# Create a basic commit
echo hello >world
git add world
git commit -q -m "Initial"

# Create a commit with lots of files
for i in {..}; do echo $i >$i; done
git add [0-9]*
git commit --quiet -m "Lots of numbers"

# Pack it all up
git gc --quiet

# Stop referencing the garbage
git reset --quiet --hard HEAD~1

# Pretend we did all the above stuff 30 days ago
for rlog in $(find .git/logs -type f); do
  # Subtract 3E6 (just over 30 days) from every date (assuming dates have
  # exactly 10 digits, which just happens to be valid...right now at least)
  perl -i -ne '/(.*?)(\b[0-9]{10}\b)(.*)/ && print $1,$2-3000000,$3,"\n"' $rlog
done

# HOWEVER: note that the pack is new; if we make the pack old, the old objects
# will get pruned for us.  But it is quite common to have new packfiles with
# old-and-soon-to-be-unreferenced-objects because frequent gc's mean moving
# the objects to new packfiles often, and eventually the reflog is expired.
# If you want to test them being part of an old packfile, uncomment this:
#   find .git/objects/pack -type f | xargs touch -t 21010101

# Create 50 packfiles in the current repo so that 'git gc --auto' will
# trigger `git repack -A -d -l` instead of just `git repack -d -l`
for i in {01..50}; do
  git fast-export master | sed -e 

Re: [PATCH 03/16] commit-reach: move commit_contains from ref-filter

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:

> +
> +int commit_contains(struct ref_filter *filter, struct commit *commit,
> +   struct commit_list *list, struct contains_cache *cache)

[...]

> -
> -static int commit_contains(struct ref_filter *filter, struct commit *commit,
> -  struct commit_list *list, struct contains_cache *cache)

All of this is moved code, except for this function, which is now
exposed publicly.  Might be worth calling out in the commit message?
While exposing it, it is a good idea to question whether its name is
good enough for public use (I think it is -- despite my not being able
to tell what the function does from its arguments alone; so bonus
points for docs!)

Thanks,
Stefan


Re: [PATCH 02/16] commit-reach: move ref_newer from remote.c

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> Signed-off-by: Derrick Stolee 

Another verbatim move!
(I'll just re-iterate that the --color-moved option is very helpful in
these reviews)

Thanks,
Stefan

> +++ b/commit-reach.h
> @@ -38,4 +38,6 @@ struct commit_list *reduce_heads(struct commit_list *heads);
>   */
>  void reduce_heads_replace(struct commit_list **heads);
>
> +int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
> +

Bonus points for docs on ref_newer!

> +++ b/http-push.c
> @@ -14,6 +14,7 @@
>  #include "argv-array.h"
>  #include "packfile.h"
>  #include "object-store.h"
> +#include "commit-reach.h"
>
>

Double newline here?
I missed that in patch 1; it would be nice if you could fix that up.


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Mon, Jul 16, 2018 at 11:22:07AM -0700, Jonathan Nieder wrote:
>> Jeff King wrote:

>>> So while I completely agree that it's not a great thing to encourage
>>> users to blindly run "git prune", I think it _is_ actionable.
>>
>> To flesh this out a little more: what user action do you suggest?  Could
>> we carry out that action automatically?
>
> Er, the action is to run "git prune", like the warning says. :)

I don't think we want to recommend that, especially when "git gc --auto"
does the right thing automatically.

Can you convince me I'm wrong?

[...]
>> I mentally make a distinction between this warning from "git gc
>> --auto" and the warning from commands like "git gui".
[...]
> I don't think those warnings are the same. The warning from "git gui"
> is: you may benefit from running gc.
>
> The warning that is deleted by this patch is: you _just_ ran gc, and hey
> we even did it automatically for you, but we're still in a funky state
> afterwards. You might need to clean up this state.

This sounds awful.  It sounds to me like you're saying "git gc --auto"
is saying "I just did the wrong thing, and here is how you can clean
up after me".  That's not how I want a program to behave.

But there's a simpler explanation for what "git gc --auto" was trying
to say, which Jonathan described.

[...]
>> Yes, this is a real problem, and it would affect any other warning
>> (or even GIT_TRACE=1 output) produced by gc --auto as well.  I think we
>> should consider it a serious bug, separate from the symptom Jonathan is
>> fixing.
>>
>> Unfortunately I don't have a great idea about how to fix it.  Should
>> we avoid writing warnings to gc.log in daemon mode?  That would
>> prevent the user from ever seeing the warnings, but because of the
>> nature of a warning, that might be reasonable.
>
> If you do that without anything further, then it will break the
> protection against repeatedly running auto-gc, as I described in the
> previous email.

Do you mean ratelimiting for the message, or do you actually mean
repeatedly running auto-gc itself?

If we suppress warnings, there would still be a gc.log while "git gc
--auto" is running, just as though there had been no warnings at all.
I believe this is close to the intended behavior; it's the same as
what you'd get without daemon mode, except that you lose the warning.

[...]
>> Should we put warnings
>> in a separate log file with different semantics?  Should we change the
>> format of gc.log to allow more structured information (include 'gc'
>> exit code) and perform a two-stage migration to the new format (first
>> learn to read the new format, then switch to writing it)?
>
> Any of those would work similarly to the "just detect warnings" I
> suggested earlier, with respect to keeping the "1 day" expiration
> intact, so I'd be OK with them. In theory they'd be more robust than
> scraping the "warning:" prefix. But in practice, I think you have to
> resort to scraping anyway, if you are interested in treating warnings
> from sub-processes the same way.

Can you say more about this for me?  I am not understanding what
you're saying necessitates scraping the output.  I would strongly
prefer to avoid scraping the output.

Thanks,
Jonathan


Re: [PATCH 00/16] Consolidate reachability logic

2018-07-16 Thread Eric Sunshine
On Mon, Jul 16, 2018 at 2:56 PM Jeff King  wrote:
> On Mon, Jul 16, 2018 at 02:40:21PM -0400, Eric Sunshine wrote:
> > On Mon, Jul 16, 2018 at 12:18 PM Jeff King  wrote:
> > > git-send-email uses the current time minus an offset, and then
> > > monotonically increases for each patch:
> >
> > Junio pointed this out to gitgitgadget developers in [1], which led to
> > an issue being opened[2]. That issue was merged today.
> >
> > [1]: https://public-inbox.org/git/xmqq7em7gg3j@gitster-ct.c.googlers.com/
> > [2]: https://github.com/gitgitgadget/gitgitgadget/pull/15
>
> I was going to say "oh good, fixed", but it looks like it just merged
> adding that line to the TODO list. :)

Erm, right. I actually knew a couple days ago that that issue was just
a change to the TODO list but forgot that important tidbit when I
wrote the above "was merged today". Anyhow, at least it's on the
radar.


Re: [PATCH 01/16] commit-reach: move walk methods from commit.c

2018-07-16 Thread Stefan Beller
On Mon, Jul 16, 2018 at 6:00 AM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> Signed-off-by: Derrick Stolee 

This looks good, apart from nits below.

Thanks,
Stefan

> diff --git a/commit-reach.c b/commit-reach.c
> new file mode 100644
> index 0..f2e2f7461
> --- /dev/null
> +++ b/commit-reach.c
> @@ -0,0 +1,359 @@
> +#include "cache.h"
> +#include "prio-queue.h"
> +#include "commit-reach.h"

and commit.h (see discussion below) ?

> diff --git a/commit-reach.h b/commit-reach.h
> new file mode 100644
> index 0..244f48c5f
> --- /dev/null
> +++ b/commit-reach.h
> @@ -0,0 +1,41 @@
> +#ifndef __COMMIT_REACH_H__
> +#define __COMMIT_REACH_H__
> +
> +#include "commit.h"

Do we really need to include the header file in another header file?
I'd think forward declarations would work fine here.
(The benefit of forward declaring the structs commit and commit_list
is purely for us poor developers, as then touching commit.h would not
trigger recompilation of files that only include this header but not
commit.h.  There are not many such files in this particular case, but
I consider it good practice that we should follow.)

> +
> +struct commit_list *get_merge_bases_many(struct commit *one,
> +int n,
> +struct commit **twos);
> +struct commit_list *get_merge_bases_many_dirty(struct commit *one,
> +  int n,
> +  struct commit **twos);
> +struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
> +struct commit_list *get_octopus_merge_bases(struct commit_list *in);
> +
> +/* To be used only when object flags after this call no longer matter */
> +struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos);
> +
> +int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
> +int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference);
> +int in_merge_bases(struct commit *commit, struct commit *reference);
> +
> +
> +/*
> + * Takes a list of commits and returns a new list where those
> + * have been removed that can be reached from other commits in
> + * the list. It is useful for, e.g., reducing the commits
> + * randomly thrown at the git-merge command and removing
> + * redundant commits that the user shouldn't have given to it.
> + *
> + * This function destroys the STALE bit of the commit objects'
> + * flags.
> + */
> +struct commit_list *reduce_heads(struct commit_list *heads);
> +
> +/*
> + * Like `reduce_heads()`, except it replaces the list. Use this
> + * instead of `foo = reduce_heads(foo);` to avoid memory leaks.
> + */
> +void reduce_heads_replace(struct commit_list **heads);

Thanks for the docs! Bonus points for also documenting the
other functions (is_descendant_of etc. For example is
is_descendant_of destroying some bit state?)

> +#endif
> diff --git a/commit.c b/commit.c
> index 39b80bd21..32d1234bd 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -843,364 +843,6 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
> 	clear_author_date_slab(&author_date);
>  }
>
> -/* merge-base stuff */

This is the only line that did not make it to the other file. :-)
I don't think it is needed in commit-reach.c


Re: [PATCH 00/16] Consolidate reachability logic

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 02:40:21PM -0400, Eric Sunshine wrote:

> On Mon, Jul 16, 2018 at 12:18 PM Jeff King  wrote:
> > On Mon, Jul 16, 2018 at 02:54:38PM +0100, Ramsay Jones wrote:
> > > This is not your problem, but I find these GitGitGadget
> > > submissions somewhat annoying. This series has been spewed
> > > all over my in-box in, what I assume, is commit date order.
> > >
> > > So, patches #4,5 dated 19/06, then #1,2,3 dated 25/06,
> > > then #15 dated 28/06, then #6,7 dated 12/07, then #8-16
> > > dated 13/07, then 00/16 dated today.
> >
> > Yeah, they're out of order in mutt's threaded display. And the
> > back-dating means there's a much higher chance of them getting blocked
> > as spam (e.g., some of the dates are from weeks ago).
> >
> > git-send-email uses the current time minus an offset, and then
> > monotonically increases for each patch:
> 
> Junio pointed this out to gitgitgadget developers in [1], which led to
> an issue being opened[2]. That issue was merged today.
> 
> [1]: https://public-inbox.org/git/xmqq7em7gg3j@gitster-ct.c.googlers.com/
> [2]: https://github.com/gitgitgadget/gitgitgadget/pull/15

I was going to say "oh good, fixed", but it looks like it just merged
adding that line to the TODO list. :)

Still, it looks like wheels are in motion, which is nice.

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 11:22:07AM -0700, Jonathan Nieder wrote:

> > I'm not sure if this tells the whole story. You may still run into a
> > case where auto-gc runs over and over during the grace period, because
> > you have a bunch of objects which are not reachable and not yet ready to
> > be expired. So a gc cannot make forward progress until the 2-week
> > expiration, and the way to break the cycle is to run an immediate
> > "prune".
> >
> > So while I completely agree that it's not a great thing to encourage
> > users to blindly run "git prune", I think it _is_ actionable.
> 
> To flesh this out a little more: what user action do you suggest?  Could
> we carry out that action automatically?

Er, the action is to run "git prune", like the warning says. :)

And no, I do not think we should run it automatically. It deletes
objects without respect to the grace period, which may not be what the
user wants. (And yes, the existing warning is terrible because it does
not at all make clear that following its advice may be dangerous).

> I mentally make a distinction between this warning from "git gc
> --auto" and the warning from commands like "git gui".  The former was
> saying "Please run 'git prune', because I'm not going to do it", and
> as Jonathan noticed, that's not true any more.  The latter says
> 
>   This repository currently has approximately %i loose objects.
> 
>   To maintain optimal performance it is strongly recommended
>   that you compress the database.
> 
>   Compress the database now?
> 
> which is relevant to the current operation (slowly reading the
> repository) and easy to act on (choose "yes").

I don't think those warnings are the same. The warning from "git gui"
is: you may benefit from running gc.

The warning that is deleted by this patch is: you _just_ ran gc, and hey
we even did it automatically for you, but we're still in a funky state
afterwards. You might need to clean up this state.

(As an aside, I think the git-gui warning pre-dates "gc --auto", and
probably should just run that rather than implementing its own
heuristic).

> > I agree that the "-1" return is a little funny. Perhaps on the reading
> > side we should detect that the log contains only a "warning" line and
> > adjust our exit code accordingly.
> 
> Yes, this is a real problem, and it would affect any other warning
> (or even GIT_TRACE=1 output) produced by gc --auto as well.  I think we
> should consider it a serious bug, separate from the symptom Jonathan is
> fixing.
> 
> Unfortunately I don't have a great idea about how to fix it.  Should
> we avoid writing warnings to gc.log in daemon mode?  That would
> prevent the user from ever seeing the warnings, but because of the
> nature of a warning, that might be reasonable.

If you do that without anything further, then it will break the
protection against repeatedly running auto-gc, as I described in the
previous email.

> Should we put warnings
> in a separate log file with different semantics?  Should we change the
> format of gc.log to allow more structured information (include 'gc'
> exit code) and perform a two-stage migration to the new format (first
> learn to read the new format, then switch to writing it)?

Any of those would work similarly to the "just detect warnings" I
suggested earlier, with respect to keeping the "1 day" expiration
intact, so I'd be OK with them. In theory they'd be more robust than
scraping the "warning:" prefix. But in practice, I think you have to
resort to scraping anyway, if you are interested in treating warnings
from sub-processes the same way.

-Peff


Re: [PATCH 11/25] t7400: fix broken "submodule add/reconfigure --force" test

2018-07-16 Thread Eric Sunshine
On Mon, Jul 16, 2018 at 11:51 AM Johannes Schindelin
 wrote:
> On Mon, 16 Jul 2018, Johannes Schindelin wrote:
> > > -   git submodule add --force bogus-url submod &&
> > > +   git submodule add --force /bogus-url submod &&
> >
> > This breaks on Windows because of the absolute Unix-y path having to be
> > translated to a Windows path:
> >
>   https://git-for-windows.visualstudio.com/git/git%20Team/_build/results?buildId=12365=logs
> >
> > I could imagine that using "$(pwd)/bogus-url" (which will generate a
> > Windows-y absolute path on Windows) would work, though.
>
> Yes, this works indeed, see the patch below. Could you please squash it in
> if you send another iteration of your patch series? Junio, could you
> please add this as a SQUASH??? commit so that `pu` is fixed on Windows?

Thanks for reporting and diagnosing. I wondered briefly if we could
get by with simply "./bogus-url" instead of having to use $(pwd),
however, "./bogus-url" gets normalized internally into an absolute
path, so $(pwd) is needed anyhow to make the test '"$bogus_url" =
"$(git config ...)"' check work.

So, this SQUASH looks like the correct way forward. Hopefully, Junio
can just squash it so I don't have to flood the list again with this
long series.


Re: [PATCH v3] sequencer: use configured comment character

2018-07-16 Thread Daniel Harding

Hi Johannes,

On Mon, 16 Jul 2018 at 18:59:03 +0300, Johannes Schindelin wrote:

Hi Aaron,

On Mon, 16 Jul 2018, Aaron Schrab wrote:


Looking into that a bit further, it does seem like my explanation above
was incorrect.  Here's another attempt to explain why setting
core.commentChar=auto isn't a problem for this change.

8< -

Use the configured comment character when generating comments about
branches in a todo list.  Failure to honor this configuration causes a
failure to parse the resulting todo list.

Setting core.commentChar to "auto" will not be honored here, and the
previously configured or default value will be used instead. But, since
the todo list will consist of only generated content, there should not
be any non-comment lines beginning with that character.


How about this instead?

If core.commentChar is set to "auto", the intention is to
determine the comment line character from whatever content is there
already.

As the code path in question is the one *generating* the todo list
from scratch, it will automatically use whatever core.commentChar
has been configured before the "auto" (and fall back to "#" if none
has been configured explicitly), which is consistent with users'
expectations.


Honestly, the above still doesn't read clearly to me.  I've taken a
stab at it myself - let me know what you think:


If core.commentChar is set to "auto", the comment_line_char global
variable will be initialized to '#'.  The only time
comment_line_char gets changed to an automatic value is when the
prepare_to_commit() function (in commit.c) calls
adjust_comment_line_char().  This does not happen when generating
the todo list, so '#' will be used as the comment character in the
todo list if core.commentChar is set to "auto".

Cheers,

Daniel Harding


Re: [PATCH 00/16] Consolidate reachability logic

2018-07-16 Thread Derrick Stolee

On 7/16/2018 2:44 PM, Eric Sunshine wrote:

On Mon, Jul 16, 2018 at 1:27 PM Stefan Beller  wrote:

Another pain point of the Gadget is that CC's in the cover letter
do not work as I would imagine. The line

CC: sbel...@google.com

did not put that email into the cc field.

gitgitgadget recognizes case-sensitive "Cc:" only[1].

[1]: https://github.com/gitgitgadget/gitgitgadget/blob/c4805370f59532aa438283431b8ea7d4484c530f/lib/patch-series.ts#L188


Thanks for everyone's patience while we improve gitgitgadget (and - in 
this case - I learn how to use it).


Thanks,

-Stolee



Re: [PATCH 00/16] Consolidate reachability logic

2018-07-16 Thread Eric Sunshine
On Mon, Jul 16, 2018 at 1:27 PM Stefan Beller  wrote:
> Another pain point of the Gadget is that CC's in the cover letter
> do not work as I would imagine. The line
>
> CC: sbel...@google.com
>
> did not put that email into the cc field.

gitgitgadget recognizes case-sensitive "Cc:" only[1].

[1]: https://github.com/gitgitgadget/gitgitgadget/blob/c4805370f59532aa438283431b8ea7d4484c530f/lib/patch-series.ts#L188


[PATCH] negotiator/skipping: skip commits during fetch

2018-07-16 Thread Jonathan Tan
Introduce a new negotiation algorithm used during fetch that skips
commits in an effort to find common ancestors faster. The skips grow
similarly to the Fibonacci sequence as the commit walk proceeds further
away from the tips. The skips may cause unnecessary commits to be
included in the packfile, but the negotiation step typically ends more
quickly.

Usage of this algorithm is guarded behind the configuration flag
fetch.negotiationAlgorithm.

Signed-off-by: Jonathan Tan 
---
This is on jt/fetch-pack-negotiator, but also applies cleanly on
jt/fetch-nego-tip.

A previous version of this had rough backtracking after a "match" is
found, but that backtracking implementation was complicated and inexact,
so I've removed it.

At $DAY_JOB we have been using this patch with some success, so I'm
sending it to this mailing list.
---
 Documentation/config.txt |   9 +
 Makefile |   1 +
 fetch-negotiator.c   |   8 +-
 fetch-negotiator.h   |   3 +-
 fetch-pack.c |   7 +-
 negotiator/skipping.c| 250 +++
 negotiator/skipping.h|   8 +
 t/t5552-skipping-fetch-negotiator.sh | 179 +++
 8 files changed, 461 insertions(+), 4 deletions(-)
 create mode 100644 negotiator/skipping.c
 create mode 100644 negotiator/skipping.h
 create mode 100755 t/t5552-skipping-fetch-negotiator.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..53443fb9db 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1491,6 +1491,15 @@ fetch.output::
`full` and `compact`. Default value is `full`. See section
OUTPUT in linkgit:git-fetch[1] for detail.
 
+fetch.negotiationAlgorithm::
+   Control how information about the commits in the local repository is
+   sent when negotiating the contents of the packfile to be sent by the
+   server. Set to "skipping" to use an algorithm that skips commits in an
+   effort to converge faster, but may result in a larger-than-necessary
+   packfile; any other value instructs Git to use the default algorithm
+   that never skips commits (unless the server has acknowledged it or one
+   of its descendants).
+
 format.attach::
Enable multipart/mixed attachments as the default for
'format-patch'.  The value can also be a double quoted string
diff --git a/Makefile b/Makefile
index 96f84d1dcf..617475622b 100644
--- a/Makefile
+++ b/Makefile
@@ -893,6 +893,7 @@ LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += negotiator/default.o
+LIB_OBJS += negotiator/skipping.o
 LIB_OBJS += notes.o
 LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
diff --git a/fetch-negotiator.c b/fetch-negotiator.c
index 2675d120fe..5d283049f4 100644
--- a/fetch-negotiator.c
+++ b/fetch-negotiator.c
@@ -1,8 +1,14 @@
 #include "git-compat-util.h"
 #include "fetch-negotiator.h"
 #include "negotiator/default.h"
+#include "negotiator/skipping.h"
 
-void fetch_negotiator_init(struct fetch_negotiator *negotiator)
+void fetch_negotiator_init(struct fetch_negotiator *negotiator,
+  const char *algorithm)
 {
+   if (algorithm && !strcmp(algorithm, "skipping")) {
+   skipping_negotiator_init(negotiator);
+   return;
+   }
default_negotiator_init(negotiator);
 }
diff --git a/fetch-negotiator.h b/fetch-negotiator.h
index b1290aa9c6..ddb44a22dc 100644
--- a/fetch-negotiator.h
+++ b/fetch-negotiator.h
@@ -52,6 +52,7 @@ struct fetch_negotiator {
void *data;
 };
 
-void fetch_negotiator_init(struct fetch_negotiator *negotiator);
+void fetch_negotiator_init(struct fetch_negotiator *negotiator,
+  const char *algorithm);
 
 #endif
diff --git a/fetch-pack.c b/fetch-pack.c
index 1e50d90082..50773fdde3 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -33,6 +33,7 @@ static int agent_supported;
 static int server_supports_filtering;
 static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
+static char *negotiation_algorithm;
 
 /* Remember to update object flag allocation in object.h */
 #define COMPLETE   (1U << 0)
@@ -905,7 +906,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
const char *agent_feature;
int agent_len;
struct fetch_negotiator negotiator;
-   fetch_negotiator_init(&negotiator);
+   fetch_negotiator_init(&negotiator, negotiation_algorithm);
 
sort_ref_list(, ref_compare_name);
QSORT(sought, nr_sought, cmp_ref_by_name);
@@ -1287,7 +1288,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
int in_vain = 0;
int haves_to_send = INITIAL_FLUSH;
struct fetch_negotiator negotiator;
-   fetch_negotiator_init(&negotiator);
+   fetch_negotiator_init(&negotiator, negotiation_algorithm);
	packet_reader_init(&reader, fd[0], NULL, 0,
   

Re: [PATCH 00/16] Consolidate reachability logic

2018-07-16 Thread Eric Sunshine
On Mon, Jul 16, 2018 at 12:18 PM Jeff King  wrote:
> On Mon, Jul 16, 2018 at 02:54:38PM +0100, Ramsay Jones wrote:
> > This is not your problem, but I find these GitGitGadget
> > submissions somewhat annoying. This series has been spewed
> > all over my in-box in, what I assume, is commit date order.
> >
> > So, patches #4,5 dated 19/06, then #1,2,3 dated 25/06,
> > then #15 dated 28/06, then #6,7 dated 12/07, then #8-16
> > dated 13/07, then 00/16 dated today.
>
> Yeah, they're out of order in mutt's threaded display. And the
> back-dating means there's a much higher chance of them getting blocked
> as spam (e.g., some of the dates are from weeks ago).
>
> git-send-email uses the current time minus an offset, and then
> monotonically increases for each patch:

Junio pointed this out to gitgitgadget developers in [1], which led to
an issue being opened[2]. That issue was merged today.

[1]: https://public-inbox.org/git/xmqq7em7gg3j@gitster-ct.c.googlers.com/
[2]: https://github.com/gitgitgadget/gitgitgadget/pull/15


Re: rev-parse --show-toplevel broken during exec'ed rebase?

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 11:14:51AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > None of which is too surprising. The root of the bug is in the
> > conversion to rebase--helper, I think, when presumably we started
> > setting GIT_DIR at all (but I didn't dig further). Then 09d7b6c6fa fixed
> > _one_ fallout of that, which was relative paths, but didn't help the
> > subdirectory case.
> >
> > Just reading over this thread, I suspect the simplest fix is to pass
> > GIT_DIR and GIT_WORK_TREE together, which is almost always the right
> > thing to do.
> 
> Perhaps.  Not exporting GIT_DIR (unless the end-user already did to
> the environment before starting "git rebase"---it would be a bad
> change to unexport it unconditionally) may probably be a way to make
> rebase--helper conversion more faithful to the original scripted
> Porcelain, but I suspect in practice always giving GIT_DIR and
> GIT_WORK_TREE would work well for many existing hooks.

Yeah, that may be an option. I don't remember if this was discussed in
this thread or elsewhere, but setting GIT_DIR is a regression for hooks,
etc, which do:

  git -C /some/other/repo log

or similar. I'm not sure if that falls under "actual regression" or
"just how it happened to work in the past". I'm not even sure it worked
that way _consistently_ in the past.

The best practice if you're switching directories is to do:

  unset $(git rev-parse --local-env-vars)

though I highly doubt that most people bother.
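
For a hook or script that deliberately operates on a different repository,
the two commands above combine into something like the following sketch
(the target path is the illustrative one from the example above, not a
real repository):

```shell
#!/bin/sh
# Sketch of a hook that touches another repository. GIT_DIR (and friends)
# may have been exported into the hook's environment by the invoking git
# process; clear the repo-local variables so git discovers the other
# repository on its own instead of reusing the exported one.
unset $(git rev-parse --local-env-vars)
git -C /some/other/repo log -1 --oneline
```

Note that `git rev-parse --local-env-vars` merely lists the names of the
repository-local environment variables, so it works even outside a
repository.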

-Peff


Re: [PATCH 2/2] fsck: downgrade gitmodulesParse default to "info"

2018-07-16 Thread Jeff King
On Mon, Jul 16, 2018 at 11:04:04AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > site's support). And the broken .gitmodules may be too
> > far back in history for rewriting to be feasible (again,
> > this is an issue for cgit).
> 
> "again" but this is the first mention that hints cgit has some
> problem (but not exactly what problem).  Is that the "cgit has a
> file called .gitmodules that predates the submodule support on our
> side?" thing?

I think you missed it. In the paragraph above the one you
quoted, I said:

   The cgit repository, for example, has a file named
   .gitmodules from a pre-submodule attempt at sharing code,
   but does not actually have any gitlinks.

> > So we're being unnecessarily restrictive without actually
> > improving the security in a meaningful way. It would be more
> > convenient to downgrade this check to "info", which means
> > we'd still comment on it, but not reject a push. Site admins
> > can already do this via config, but we should ship sensible
> > defaults.
> > ...
> > Considering both sets of arguments, it makes sense to loosen
> > this check for now.
> >
> > Note that we have to tweak the test in t7415 since fsck will
> > no longer consider this a fatal error. But we still check
> > that it reports the warning, and that we don't get the
> > spurious error from the config code.
> >
> > Signed-off-by: Jeff King 
> > ---
> 
> Thanks.

So I'm curious if you found the argument in my commit
message compelling. :)

My recollection from the earlier discussion was that you
were more in favor of keeping things tight. E.g.,:

  https://public-inbox.org/git/xmqqh8lgrz5c@gitster-ct.c.googlers.com/

but reading it again:

 - there we were talking about non-blob objects as
   .gitmodules

 - I think your main concern was that there be a way for the
   user to loosen/tighten, which there is.
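
For reference, the loosening/tightening mentioned in the commit message
("Site admins can already do this via config") uses the per-message-id
fsck severity knobs. A minimal sketch in a throwaway repository (the
temporary path is created here purely for illustration; note that the
plain fsck.<msg-id> form is only honored by newer versions of Git):

```shell
# Demonstrate the severity knobs for the gitmodulesParse message id.
tmp=$(mktemp -d)
git init --quiet "$tmp"
# Applied by the server to incoming pushes:
git -C "$tmp" config receive.fsck.gitmodulesParse info
# Client-side fsck checks (honored by newer Git versions):
git -C "$tmp" config fsck.gitmodulesParse info
git -C "$tmp" config --get receive.fsck.gitmodulesParse   # prints "info"
```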

-Peff


Re: [PATCH] gc: do not warn about too many loose objects

2018-07-16 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Mon, Jul 16, 2018 at 10:27:17AM -0700, Jonathan Tan wrote:

>> In a087cc9819 ("git-gc --auto: protect ourselves from accumulated
>> cruft", 2007-09-17), the user was warned if there were too many
>> unreachable loose objects. This made sense at the time, because gc
>> couldn't prune them safely. But subsequently, git prune learned the
>> ability to not prune recently created loose objects, making pruning able
>> to be done more safely, and gc was made to automatically prune old
>> unreachable loose objects in 25ee9731c1 ("gc: call "prune --expire
>> 2.weeks.ago" by default", 2008-03-12).
>>
>> This makes the warning unactionable by the user, as any loose objects
>> left are not deleted yet because of safety, and "git prune" is not a
>> command that the user is recommended to run directly anyway.
>
> I'm not sure if this tells the whole story. You may still run into a
> case where auto-gc runs over and over during the grace period, because
> you have a bunch of objects which are not reachable and not yet ready to
> be expired. So a gc cannot make forward progress until the 2-week
> expiration, and the way to break the cycle is to run an immediate
> "prune".
>
> So while I completely agree that it's not a great thing to encourage
> users to blindly run "git prune", I think it _is_ actionable.

To flesh this out a little more: what user action do you suggest?  Could
we carry out that action automatically?
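
The immediate "prune" Peff mentions can be sketched in a throwaway
repository (in a real repository you would just run "git prune
--expire=now", with care: it deletes unreachable loose objects right
away, skipping the grace period):

```shell
# Create a throwaway repository with one unreachable loose object,
# then expire it immediately, bypassing the two-week grace period.
tmp=$(mktemp -d)
git init --quiet "$tmp"
echo junk | git -C "$tmp" hash-object -w --stdin   # an unreachable loose blob
git -C "$tmp" count-objects                        # reports one loose object
git -C "$tmp" prune --expire=now                   # delete it immediately
git -C "$tmp" count-objects                        # back to zero loose objects
```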

I mentally make a distinction between this warning from "git gc
--auto" and the warning from commands like "git gui".  The former was
saying "Please run 'git prune', because I'm not going to do it", and
as Jonathan noticed, that's not true any more.  The latter says

This repository currently has approximately %i loose objects.

To maintain optimal performance it is strongly recommended
that you compress the database.

Compress the database now?

which is relevant to the current operation (slowly reading the
repository) and easy to act on (choose "yes").

[...]
> I agree that the "-1" return is a little funny. Perhaps on the reading
> side we should detect that the log contains only a "warning" line and
> adjust our exit code accordingly.

Yes, this is a real problem, and it would affect any other warning
(or even GIT_TRACE=1 output) produced by gc --auto as well.  I think we
should consider it a serious bug, separate from the symptom Jonathan is
fixing.

Unfortunately I don't have a great idea about how to fix it.  Should
we avoid writing warnings to gc.log in daemon mode?  That would
prevent the user from ever seeing the warnings, but because of the
nature of a warning, that might be reasonable.  Should we put warnings
in a separate log file with different semantics?  Should we change the
format of gc.log to allow more structured information (including the
'gc' exit code) and perform a two-stage migration to the new format
(first learn to read the new format, then switch to writing it)?

Thanks,
Jonathan


Re: rev-parse --show-toplevel broken during exec'ed rebase?

2018-07-16 Thread Junio C Hamano
Jeff King  writes:

> None of which is too surprising. The root of the bug is in the
> conversion to rebase--helper, I think, when presumably we started
> setting GIT_DIR at all (but I didn't dig further). Then 09d7b6c6fa fixed
> _one_ fallout of that, which was relative paths, but didn't help the
> subdirectory case.
>
> Just reading over this thread, I suspect the simplest fix is to pass
> GIT_DIR and GIT_WORK_TREE together, which is almost always the right
> thing to do.

Perhaps.  Not exporting GIT_DIR (unless the end user already exported
it to the environment before starting "git rebase"---it would be a bad
change to unexport it unconditionally) may well be a way to make the
rebase--helper conversion more faithful to the original scripted
Porcelain, but I suspect that in practice always giving GIT_DIR and
GIT_WORK_TREE would work well for many existing hooks.

