[PATCH] doc: fix small typo in git show-branch

2018-10-16 Thread Saulius Gurklys
Fix small typo as in document  is used not .

Signed-off-by: Saulius Gurklys 
---
 Documentation/git-show-branch.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-show-branch.txt 
b/Documentation/git-show-branch.txt
index 262db049d..4a0137122 100644
--- a/Documentation/git-show-branch.txt
+++ b/Documentation/git-show-branch.txt
@@ -19,7 +19,7 @@ DESCRIPTION
 ---
 
 Shows the commit ancestry graph starting from the commits named
-with s or s (or all refs under refs/heads
+with s or s (or all refs under refs/heads
 and/or refs/tags) semi-visually.
 
 It cannot show more than 29 branches and commits at a time.
-- 
2.19.1



Recommendations for updating error() to error_errno()

2018-10-16 Thread 牛旭
Hi,
Our team works on enhance logging practices by learning from historical log 
revisions in evolution.
And we find three patches that update error(..., strerror(errmp)) to 
error_errno(...).

While applying this rule to git-2.14.2, we find 9 missed spot in file 
refs/files-backend.c, 1 in apply.c, 2 in trace.c, 4 in dir-iterator.c:.

Here are the missed spots:
1) Line 1734 in file git-2.14.2/refs/files-backend.c:
ret = raceproof_create_file(path.buf, rename_tmp_log_callback, );
if (ret) {
if (errno == EISDIR)
error("directory not empty: %s", path.buf);
else
error("unable to move logfile %s to %s: %s",
tmp.buf, path.buf,
strerror(cb.true_errno));
}

2) Line 1795 in file git-2.14.2/refs/files-backend.c: 
if (log && rename(sb_oldref.buf, tmp_renamed_log.buf)) {
ret = error("unable to move logfile logs/%s to logs/"TMP_RENAMED_LOG": 
%s",
oldrefname, strerror(errno));
goto out;
}

3) Line 1880, 1884 in file git-2.14.2/refs/files-backend.c:
rollbacklog:
if (logmoved && rename(sb_newref.buf, sb_oldref.buf))
error("unable to restore logfile %s from %s: %s",
oldrefname, newrefname, strerror(errno));
if (!logmoved && log &&
rename(tmp_renamed_log.buf, sb_oldref.buf))
error("unable to restore logfile %s from 
logs/"TMP_RENAMED_LOG": %s",
oldrefname, strerror(errno));
4) Line 2247 in file git-2.14.2/refs/files-backend.c:
if (commit_ref(lock) < 0)
return error("unable to write symref for %s: %s", refname,
 strerror(errno));

5) Line 2366 in file git-2.14.2/refs/files-backend.c:
if (fseek(logfp, 0, SEEK_END) < 0)
ret = error("cannot seek back reflog for %s: %s",
refname, strerror(errno));

6) Line 2378, 2384 in file git-2.14.2/refs/files-backend.c:
if (fseek(logfp, pos - cnt, SEEK_SET)) {
ret = error("cannot seek back reflog for %s: %s",
refname, strerror(errno));
break;
} 
nread = fread(buf, cnt, 1, logfp);
if (nread != 1) {
ret = error("cannot read %d bytes from reflog for %s: %s",
cnt, refname, strerror(errno));
break;
}

7) Line 3312 in file git-2.14.2/refs/files-backend.c:
cb.newlog = fdopen_lock_file(_lock, "w");
if (!cb.newlog) {
error("cannot fdopen %s (%s)",
  get_lock_file_path(_lock), strerror(errno));
goto failure;
}

8) Line 3337 in file git-2.14.2/refs/files-backend.c:
if (close_lock_file(_lock)) {
status |= error("couldn't write %s: %s", log_file,
   strerror(errno));

9) Line 3348 in file git-2.14.2/refs/files-backend.c:
static int files_reflog_expire(struct ref_store *ref_store,...
{
...
} else if (commit_lock_file(_lock)) {
status |= error("unable to write reflog '%s' (%s)",
log_file, strerror(errno));

10) Line 4859 in file git-2.14.2/apply.c:
fd = open(arg, O_RDONLY);
if (fd < 0) {
error(_("can't open patch '%s': %s"), arg, strerror(errno));
res = -128;
free(to_free);
goto end;
}

11) Line 64 in file git-2.14.2/trace.c:
int fd = open(trace, O_WRONLY | O_APPEND | O_CREAT, 0666);
if (fd == -1) {
warning("could not open '%s' for tracing: %s",
trace, strerror(errno));

12) Line 133 in file git-2.14.2/trace.c:
static void trace_write(struct trace_key *key, const void *buf, unsigned len)
{
if (write_in_full(get_trace_fd(key), buf, len) < 0) {
normalize_trace_key();
warning("unable to write trace for %s: %s",
key->key, strerror(errno));

13) Line 74 in file git-2.14.2/dir-iterator.c:
level->dir = opendir(iter->base.path.buf);
if (!level->dir && errno != ENOENT) {
warning("error opening directory %s: %s",
iter->base.path.buf, strerror(errno));
/* Popping the level is handled below */
}

14) Line 125, 128 in file git-2.14.2/dir-iterator.c:
de = readdir(level->dir);
if (!de) {
/* This level is exhausted; pop up a level. */
if (errno) {
warning("error reading directory %s: %s",
iter->base.path.buf, strerror(errno));
} else if (closedir(level->dir))
warning("error closing directory %s: %s",
iter->base.path.buf, strerror(errno));

15) Line 143 in file git-2.14.2/dir-iterator.c:
if (lstat(iter->base.path.buf, >base.st) < 0) {
if (errno != ENOENT)
warning("error reading path '%s': %s",
iter->base.path.buf,
strerror(errno));

16) Line 174 in file git-2.14.2/dir-iterator.c:
if (level->dir && closedir(level->dir)) {
strbuf_setlen(>base.path, level->prefix_len);
warning("error closing directory %s: 

Re: [PATCH] builtin/submodule--helper: remove debugging leftover tracing

2018-10-16 Thread Jonathan Nieder
Stefan Beller wrote:

> I noticed 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree
> by ensure-core-worktree, 2018-08-13) had two leftover debugging statements
> when reading The coverage report [1]. Remove them.
>
> https://public-inbox.org/git/e30a9c05-87d8-1f2b-182c-6d6a5fefe...@gmail.com/
>
> Signed-off-by: Stefan Beller 
> ---
>  builtin/submodule--helper.c | 2 --
>  1 file changed, 2 deletions(-)

Doh.  Glad you caught it!

Is there some reference for The Coverage Report other than the mailing
list?  E.g. I suspect a reference to

make coverage-test
make coverage-report

would be useful to readers finding this commit later.

> To be applied on (or squashed into the tip of)
>   sb/submodule-update-in-c

Looks like that's already in "master", so not a candidate for
squashing.

Reviewed-by: Jonathan Nieder 


Re: [PATCH 1/1] subtree: make install targets depend on build targets

2018-10-16 Thread Jonathan Nieder
Hi,

Christian Hesse wrote:

> --- a/contrib/subtree/Makefile
> +++ b/contrib/subtree/Makefile
> @@ -69,11 +69,11 @@ install: $(GIT_SUBTREE)
[...]
> -install-html: $(GIT_SUBTREE_HTML)
> +install-html: html

This broke the build for me:

 make[2]: Entering directory '/build/git-2.19.1+next.20181016/contrib/subtree'
 install -m 644 html 
/build/git-2.19.1+next.20181016/debian/tmp/usr/share/doc/git/html
 install: cannot stat 'html': No such file or directory
 make[2]: *** [Makefile:78: install-html] Error 1

The rule says

 install-html: html
$(INSTALL) -d -m 755 $(DESTDIR)$(htmldir)
$(INSTALL) -m 644 $^ $(DESTDIR)$(htmldir)

and $^ substitutes to "html" after this change.  How was this patch
tested?

Thanks,
Jonathan


Re: problem with not being able to enforce git content filters

2018-10-16 Thread Stas Bekman
On 2018-10-16 05:54 PM, brian m. carlson wrote:
[...]
>> It doesn't help with direct commits to master, since CI would be
>> detecting it after it was committed. And when that happens we all know
>> that already because 'git pull' fails.
> 
> Typically projects that have CI set up don't allow direct pushes to
> master, since that tends to allow to bypass CI, or they allow it only in
> exceptional circumstances (e.g., reverts).  Also, typically most
> projects want to have some sort of review before code (resp. documents,
> other contributions) are merged.  Most Git hosting platforms, including
> GitHub, offer protected branches, so it's something to consider.
> 
> There is a possible alternative, and that's that if your project has
> some sort of build file (e.g., a Makefile), you can set it up to
> automatically insert hooks or git configuration into developers'
> systems, optionally only if it's in a Git repository.  For example, you
> could add a pre-commit hook that fails if the filters are disabled.
> 
> These are the approaches that tend to work well for most projects.  I
> tend to prefer the CI approach with restricted branches because often I
> want to customize my hooks, but your project will decide what works best
> for it.

Yes, Brian, what you're sharing makes total sense when things are setup
this way, but this is not the case with the project I'm contributing to.
This one is setup, commit first, review later, due to having too few hands.

And I have already setup a CI to detect misconfigured systems. It'll
catch RPs in time, and everything else post-commit. Let's hope the
developers will watch the status of their commits and will react quickly
to fix their setup and mend the broken commit, when they see their
commit broke things. That's as good as it can get in this particular
situation I understand.

Thank you, Brian and Ævar for your support and very helpful suggestions.

-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


Re: problem with not being able to enforce git content filters

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 04:48:13PM -0700, Stas Bekman wrote:
> 
> >> And the devs honestly try to do their best to remember to configure the
> >> filters, but for some reason they disappear for them, don't ask me why,
> >> I don't know. This is an open source project team, not a work place.
> > 
> > This sounds like it could be easily solved by continuous integration.
> > You could set up a job on any of a variety of services that checks that
> > a pull request or other commit is clean when when the filter runs.  If
> > it doesn't pass, the code doesn't merge.
> 
> This is an excellent idea wrt to PRs. Thank you, Brian! I will implement
> that.
> 
> It doesn't help with direct commits to master, since CI would be
> detecting it after it was committed. And when that happens we all know
> that already because 'git pull' fails.

Typically projects that have CI set up don't allow direct pushes to
master, since that tends to allow to bypass CI, or they allow it only in
exceptional circumstances (e.g., reverts).  Also, typically most
projects want to have some sort of review before code (resp. documents,
other contributions) are merged.  Most Git hosting platforms, including
GitHub, offer protected branches, so it's something to consider.

There is a possible alternative, and that's that if your project has
some sort of build file (e.g., a Makefile), you can set it up to
automatically insert hooks or git configuration into developers'
systems, optionally only if it's in a Git repository.  For example, you
could add a pre-commit hook that fails if the filters are disabled.

These are the approaches that tend to work well for most projects.  I
tend to prefer the CI approach with restricted branches because often I
want to customize my hooks, but your project will decide what works best
for it.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: problem with not being able to enforce git content filters

2018-10-16 Thread Stas Bekman


>> And the devs honestly try to do their best to remember to configure the
>> filters, but for some reason they disappear for them, don't ask me why,
>> I don't know. This is an open source project team, not a work place.
> 
> This sounds like it could be easily solved by continuous integration.
> You could set up a job on any of a variety of services that checks that
> a pull request or other commit is clean when when the filter runs.  If
> it doesn't pass, the code doesn't merge.

This is an excellent idea wrt to PRs. Thank you, Brian! I will implement
that.

It doesn't help with direct commits to master, since CI would be
detecting it after it was committed. And when that happens we all know
that already because 'git pull' fails.


-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


[PATCH] builtin/submodule--helper: remove debugging leftover tracing

2018-10-16 Thread Stefan Beller
I noticed 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree
by ensure-core-worktree, 2018-08-13) had two leftover debugging statements
when reading The coverage report [1]. Remove them.

https://public-inbox.org/git/e30a9c05-87d8-1f2b-182c-6d6a5fefe...@gmail.com/

Signed-off-by: Stefan Beller 
---

To be applied on (or squashed into the tip of)
  sb/submodule-update-in-c

 builtin/submodule--helper.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 5c9d1fb496..c7d3841ffc 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1459,7 +1459,6 @@ static void determine_submodule_update_strategy(struct 
repository *r,
key = xstrfmt("submodule.%s.update", sub->name);
 
if (update) {
-   trace_printf("parsing update");
if (parse_submodule_update_strategy(update, out) < 0)
die(_("Invalid update mode '%s' for submodule path 
'%s'"),
update, path);
@@ -1468,7 +1467,6 @@ static void determine_submodule_update_strategy(struct 
repository *r,
die(_("Invalid update mode '%s' configured for 
submodule path '%s'"),
val, path);
} else if (sub->update_strategy.type != SM_UPDATE_UNSPECIFIED) {
-   trace_printf("loaded thing");
out->type = sub->update_strategy.type;
out->command = sub->update_strategy.command;
} else
-- 
2.19.0



Re: [PATCH 2/3] pack-objects (mingw): demonstrate a segmentation fault with large deltas

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 02:02:50PM -0700, Johannes Schindelin via GitGitGadget 
wrote:
> From: Johannes Schindelin 
> 
> There is a problem in the way 9ac3f0e5b3e4 (pack-objects: fix
> performance issues on packing large deltas, 2018-07-22) initializes that
> mutex in the `packing_data` struct. The problem manifests in a
> segmentation fault on Windows, when a mutex (AKA critical section) is
> accessed without being initialized. (With pthreads, you apparently do
> not really have to initialize them?)

This is a good catch.  You do have to initialize a pthread mutex, but on
amd64 Linux the default initializer is all zeros, so since we use a
static variable, it happens to coincidentally have the right value, so
we don't notice.

Thanks for fixing this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Jonathan Tan
> | Implementation | Queries | Maybe | FP # | FP %  |
> ||-|---|--|---|
> | Szeder | 66095   | 1142  | 256  | 0.38% |
> | Jonathan   | 66459   | 107   | 89   | 0.16% |
> | Stolee | 53025   | 492   | 479  | 0.90% |
> 
> (Note that we must have used different starting points, which is why my 
> "Queries" is so much smaller.)

I suspect it's because your bloom filter implementation covers only the
first parent (if I'm understanding get_bloom_filter() correctly). When I
only covered the first parent in my initial test (see patch 2 of [1]), I
got (following the columns in the table above):

  53096 107 89 0.001676

Also, I think that the rejecting power (Queries - Maybe)/(Total tree
comparisons if no bloom filters were used) needs to be in the evaluation
criteria somewhere, as that indicates how many tree comparisons we
managed to avoid.

Also, we probably should also test on a file that changes more
frequently :-)

[1] https://public-inbox.org/git/cover.1539219248.git.jonathanta...@google.com/

> The increase in false-positive percentage is expected in my 
> implementation. I'm using the correct filter sizes to hit a <1% FP 
> ratio. This could be lowered by changing the settings, and the size 
> would dynamically grow. For my Git repo (which contains 
> git-for-windows/git and microsoft/git) this implementation grows the 
> commmit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to 
> Szeder's 8MB filter). For 105,260 commits, that rounds out to less than 
> 20 bytes per commit (compared to Jonathan's 256 bytes per commit).

Mine has 256 bits per commit, which is 32 bytes per commit (still more
than yours).

Having said all that, thanks for writing up your version - in
particular, variable sized filters (like in yours) seem to be the way to
go.

> We'll see how much time I have to do this polish, but I think the 
> benefit is proven.

Agreed.


[PATCH 16/19] commit: prepare logmsg_reencode to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 commit.h|  8 
 contrib/coccinelle/the_repository.cocci |  9 +
 pretty.c| 13 +++--
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/commit.h b/commit.h
index f3ef497723..b69a4b140d 100644
--- a/commit.h
+++ b/commit.h
@@ -180,6 +180,14 @@ extern int has_non_ascii(const char *text);
 extern const char *logmsg_reencode(const struct commit *commit,
   char **commit_encoding,
   const char *output_encoding);
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define logmsg_reencode(c, enc, out) repo_logmsg_reencode(the_repository, c, 
enc, out)
+#endif
+
 extern const char *skip_blank_lines(const char *msg);
 
 /** Removes the first commit from a list sorted by date, and adds all
diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index 516f19ffee..f5b42cfc62 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -123,3 +123,12 @@ expression F;
 - unuse_commit_buffer(
 + repo_unuse_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- logmsg_reencode(
++ repo_logmsg_reencode(the_repository,
+  E, F, G);
diff --git a/pretty.c b/pretty.c
index 98cf5228f9..26e44472bb 100644
--- a/pretty.c
+++ b/pretty.c
@@ -595,14 +595,15 @@ static char *replace_encoding_header(char *buf, const 
char *encoding)
return strbuf_detach(, NULL);
 }
 
-const char *logmsg_reencode(const struct commit *commit,
-   char **commit_encoding,
-   const char *output_encoding)
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding)
 {
static const char *utf8 = "UTF-8";
const char *use_encoding;
char *encoding;
-   const char *msg = get_commit_buffer(commit, NULL);
+   const char *msg = repo_get_commit_buffer(r, commit, NULL);
char *out;
 
if (!output_encoding || !*output_encoding) {
@@ -630,7 +631,7 @@ const char *logmsg_reencode(const struct commit *commit,
 * the cached copy from get_commit_buffer, we need to duplicate 
it
 * to avoid munging the cached copy.
 */
-   if (msg == get_cached_commit_buffer(the_repository, commit, 
NULL))
+   if (msg == get_cached_commit_buffer(r, commit, NULL))
out = xstrdup(msg);
else
out = (char *)msg;
@@ -644,7 +645,7 @@ const char *logmsg_reencode(const struct commit *commit,
 */
out = reencode_string(msg, output_encoding, use_encoding);
if (out)
-   unuse_commit_buffer(commit, msg);
+   repo_unuse_commit_buffer(r, commit, msg);
}
 
/*
-- 
2.19.0



[PATCH 14/19] commit: prepare get_commit_buffer to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit.c| 8 +---
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.cocci | 8 
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index fe942754e2..cc5321af8c 100644
--- a/commit.c
+++ b/commit.c
@@ -297,13 +297,15 @@ const void *get_cached_commit_buffer(struct repository 
*r, const struct commit *
return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit, unsigned long 
*sizep)
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *commit,
+  unsigned long *sizep)
 {
-   const void *ret = get_cached_commit_buffer(the_repository, commit, 
sizep);
+   const void *ret = get_cached_commit_buffer(r, commit, sizep);
if (!ret) {
enum object_type type;
unsigned long size;
-   ret = read_object_file(>object.oid, , );
+   ret = repo_read_object_file(r, >object.oid, , 
);
if (!ret)
die("cannot read commit object %s",
oid_to_hex(>object.oid));
diff --git a/commit.h b/commit.h
index 58a0c5409b..b8437f66e2 100644
--- a/commit.h
+++ b/commit.h
@@ -117,7 +117,12 @@ const void *get_cached_commit_buffer(struct repository *, 
const struct commit *,
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *, unsigned long *size);
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *,
+  unsigned long *size);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_commit_buffer(c, s) repo_get_commit_buffer(the_repository, c, s)
+#endif
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index 8c6a71bf64..4018e6eaf7 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -107,3 +107,11 @@ expression G;
 - in_merge_bases_many(
 + repo_in_merge_bases_many(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+@@
+- get_commit_buffer(
++ repo_get_commit_buffer(the_repository,
+  E, F);
-- 
2.19.0



[PATCH 13/19] commit-reach: prepare in_merge_bases[_many] to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c  | 15 +--
 commit-reach.h  | 12 ++--
 contrib/coccinelle/the_repository.cocci | 16 
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index aa3cde5c8a..1877f356e3 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -309,16 +309,17 @@ int is_descendant_of(struct commit *commit, struct 
commit_list *with_commit)
 /*
  * Is "commit" an ancestor of one of the "references"?
  */
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference)
+int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
+int nr_reference, struct commit **reference)
 {
struct commit_list *bases;
int ret = 0, i;
uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
-   if (parse_commit(commit))
+   if (repo_parse_commit(r, commit))
return ret;
for (i = 0; i < nr_reference; i++) {
-   if (parse_commit(reference[i]))
+   if (repo_parse_commit(r, reference[i]))
return ret;
if (reference[i]->generation < min_generation)
min_generation = reference[i]->generation;
@@ -327,7 +328,7 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(the_repository, commit,
+   bases = paint_down_to_common(r, commit,
 nr_reference, reference,
 commit->generation);
if (commit->object.flags & PARENT2)
@@ -341,9 +342,11 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
 /*
  * Is "commit" an ancestor of (i.e. reachable from) the "reference"?
  */
-int in_merge_bases(struct commit *commit, struct commit *reference)
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference)
 {
-   return in_merge_bases_many(commit, 1, );
+   return repo_in_merge_bases_many(r, commit, 1, );
 }
 
 struct commit_list *reduce_heads(struct commit_list *heads)
diff --git a/commit-reach.h b/commit-reach.h
index 52667d64ac..a0d4a29d25 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -27,8 +27,16 @@ struct commit_list *repo_get_merge_bases_many_dirty(struct 
repository *r,
 struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
-int in_merge_bases(struct commit *commit, struct commit *reference);
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference);
+int repo_in_merge_bases_many(struct repository *r,
+struct commit *commit,
+int nr_reference, struct commit **reference);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define in_merge_bases(c1, c2) repo_in_merge_bases(the_repository, c1, c2)
+#define in_merge_bases_many(c1, n, cs) 
repo_in_merge_bases_many(the_repository, c1, n, cs)
+#endif
 
 /*
  * Takes a list of commits and returns a new list where those
diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index 5e037fe428..8c6a71bf64 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -91,3 +91,19 @@ expression G;
 + repo_get_merge_bases_many_dirty(the_repository,
   E, F, G);
 
+@@
+expression E;
+expression F;
+@@
+- in_merge_bases(
++ repo_in_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- in_merge_bases_many(
++ repo_in_merge_bases_many(the_repository,
+  E, F, G);
-- 
2.19.0



[PATCH 15/19] commit: prepare repo_unuse_commit_buffer to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit.c| 6 --
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.cocci | 8 
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index cc5321af8c..8e7b1138f7 100644
--- a/commit.c
+++ b/commit.c
@@ -318,10 +318,12 @@ const void *repo_get_commit_buffer(struct repository *r,
return ret;
 }
 
-void unuse_commit_buffer(const struct commit *commit, const void *buffer)
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *commit,
+ const void *buffer)
 {
struct commit_buffer *v = buffer_slab_peek(
-   the_repository->parsed_objects->buffer_slab, commit);
+   r->parsed_objects->buffer_slab, commit);
if (!(v && v->buffer == buffer))
free((void *)buffer);
 }
diff --git a/commit.h b/commit.h
index b8437f66e2..f3ef497723 100644
--- a/commit.h
+++ b/commit.h
@@ -130,7 +130,12 @@ const void *repo_get_commit_buffer(struct repository *r,
  * from an earlier call to get_commit_buffer.  The buffer may or may not be
  * freed by this call; callers should not access the memory afterwards.
  */
-void unuse_commit_buffer(const struct commit *, const void *buffer);
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *,
+ const void *buffer);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define unuse_commit_buffer(c, b) repo_unuse_commit_buffer(the_repository, c, 
b)
+#endif
 
 /*
  * Free any cached object buffer associated with the commit.
diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index 4018e6eaf7..516f19ffee 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -115,3 +115,11 @@ expression F;
 - get_commit_buffer(
 + repo_get_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+@@
+- unuse_commit_buffer(
++ repo_unuse_commit_buffer(the_repository,
+  E, F);
-- 
2.19.0



[PATCH 17/19] pretty: prepare format_commit_message to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 contrib/coccinelle/the_repository.cocci | 10 ++
 pretty.c| 15 ---
 pretty.h|  7 ++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index f5b42cfc62..2ee702ecf7 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -132,3 +132,13 @@ expression G;
 - logmsg_reencode(
 + repo_logmsg_reencode(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+expression G;
+expression H;
+@@
+- format_commit_message(
++ repo_format_commit_message(the_repository,
+  E, F, G, H);
diff --git a/pretty.c b/pretty.c
index 26e44472bb..948f5346cf 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1505,9 +1505,10 @@ void userformat_find_requirements(const char *fmt, 
struct userformat_want *w)
strbuf_release();
 }
 
-void format_commit_message(const struct commit *commit,
-  const char *format, struct strbuf *sb,
-  const struct pretty_print_context *pretty_ctx)
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
+   const char *format, struct strbuf *sb,
+   const struct pretty_print_context *pretty_ctx)
 {
struct format_commit_context context;
const char *output_enc = pretty_ctx->output_encoding;
@@ -1521,9 +1522,9 @@ void format_commit_message(const struct commit *commit,
 * convert a commit message to UTF-8 first
 * as far as 'format_commit_item' assumes it in UTF-8
 */
-   context.message = logmsg_reencode(commit,
- _encoding,
- utf8);
+   context.message = repo_logmsg_reencode(r, commit,
+  _encoding,
+  utf8);
 
strbuf_expand(sb, format, format_commit_item, );
rewrap_message_tail(sb, , 0, 0, 0);
@@ -1547,7 +1548,7 @@ void format_commit_message(const struct commit *commit,
}
 
free(context.commit_encoding);
-   unuse_commit_buffer(commit, context.message);
+   repo_unuse_commit_buffer(r, commit, context.message);
 }
 
 static void pp_header(struct pretty_print_context *pp,
diff --git a/pretty.h b/pretty.h
index 7359d318a9..e6625269cf 100644
--- a/pretty.h
+++ b/pretty.h
@@ -103,9 +103,14 @@ void pp_remainder(struct pretty_print_context *pp, const 
char **msg_p,
  * Put the result to "sb".
  * Please use this function for custom formats.
  */
-void format_commit_message(const struct commit *commit,
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
const char *format, struct strbuf *sb,
const struct pretty_print_context *context);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define format_commit_message(c, f, s, con) \
+   repo_format_commit_message(the_repository, c, f, s, con)
+#endif
 
 /*
  * Parse given arguments from "arg", check it for correctness and
-- 
2.19.0



[PATCH 18/19] submodule: use submodule repos for object lookup

2018-10-16 Thread Stefan Beller
This converts the 'show_submodule_header' function to use
the repository API properly, such that the submodule objects
are not added to the main object store.

Signed-off-by: Stefan Beller 
---
 submodule.c | 75 ++---
 1 file changed, 60 insertions(+), 15 deletions(-)

diff --git a/submodule.c b/submodule.c
index 4ee69cc014..7305ae2e10 100644
--- a/submodule.c
+++ b/submodule.c
@@ -444,7 +444,7 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, struct diff_options 
*o)
+static void print_submodule_summary(struct repository *r, struct rev_info 
*rev, struct diff_options *o)
 {
static const char format[] = "  %m %s";
struct strbuf sb = STRBUF_INIT;
@@ -455,7 +455,8 @@ static void print_submodule_summary(struct rev_info *rev, 
struct diff_options *o
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(, 0);
-   format_commit_message(commit, format, , );
+   repo_format_commit_message(r, commit, format, ,
+ );
strbuf_addch(, '\n');
if (commit->object.flags & SYMMETRIC_LEFT)
diff_emit_submodule_del(o, sb.buf);
@@ -482,14 +483,46 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
-/* Helper function to display the submodule header line prior to the full
- * summary output. If it can locate the submodule objects directory it will
- * attempt to lookup both the left and right commits and put them into the
- * left and right pointers.
+/*
+ * Initialize 'out' based on the provided submodule path.
+ *
+ * Unlike repo_submodule_init, this tolerates submodules not present
+ * in .gitmodules. This function exists only to preserve historical behavior,
+ *
+ * Returns 0 on success, -1 when the submodule is not present.
+ */
+static int open_submodule(struct repository *out, const char *path)
+{
+   struct strbuf sb = STRBUF_INIT;
+
+   if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) {
+   strbuf_release();
+   return -1;
+   }
+
+   out->submodule_prefix = xstrdup(path);
+   out->submodule_prefix = xstrfmt("%s%s/",
+   the_repository->submodule_prefix ?
+   the_repository->submodule_prefix :
+   "", path);
+
+   strbuf_release();
+   return 0;
+}
+
+/*
+ * Helper function to display the submodule header line prior to the full
+ * summary output.
+ *
+ * If it can locate the submodule git directory it will create a repository
+ * handle for the submodule and lookup both the left and right commits and
+ * put them into the left and right pointers.
  */
-static void show_submodule_header(struct diff_options *o, const char *path,
+static void show_submodule_header(struct diff_options *o,
+   const char *path,
struct object_id *one, struct object_id *two,
unsigned dirty_submodule,
+   struct repository *sub,
struct commit **left, struct commit **right,
struct commit_list **merge_bases)
 {
@@ -508,7 +541,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
else if (is_null_oid(two))
message = "(submodule deleted)";
 
-   if (add_submodule_odb(path)) {
+   if (!sub) {
if (!message)
message = "(commits not present)";
goto output_header;
@@ -518,8 +551,8 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 * Attempt to lookup the commit references, and determine if this is
 * a fast forward or fast backwards update.
 */
-   *left = lookup_commit_reference(the_repository, one);
-   *right = lookup_commit_reference(the_repository, two);
+   *left = lookup_commit_reference(sub, one);
+   *right = lookup_commit_reference(sub, two);
 
/*
 * Warn about missing commits in the submodule project, but only if
@@ -529,7 +562,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 (!is_null_oid(two) && !*right))
message = "(commits not present)";
 
-   *merge_bases = get_merge_bases(*left, *right);
+   *merge_bases = repo_get_merge_bases(sub, *left, *right);
if (*merge_bases) {
if ((*merge_bases)->item == *left)
fast_forward = 1;
@@ -563,16 +596,20 @@ void show_submodule_summary(struct diff_options *o, const 
char *path,
struct rev_info rev;
struct commit *left = NULL, *right = NULL;

[PATCH 19/19] submodule: don't add submodule as odb for push

2018-10-16 Thread Stefan Beller
In push_submodule(), because we do not actually need access to objects
in the submodule, do not invoke add_submodule_odb().
(for_each_remote_ref_submodule() does not require access to those
objects, and the actual push is done by spawning another process,
which handles object access by itself.)

Signed-off-by: Stefan Beller 
---
 submodule.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 7305ae2e10..e623e6bf7f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1024,9 +1024,6 @@ static int push_submodule(const char *path,
  const struct string_list *push_options,
  int dry_run)
 {
-   if (add_submodule_odb(path))
-   return 1;
-
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
argv_array_push(, "push");
-- 
2.19.0



[PATCH 08/19] commit-reach.c: allow paint_down_to_common to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
As the function is file local and not widely used, migrate it all at once.

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 5a845440a9..2f7ae20bd4 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -30,7 +30,8 @@ static int queue_has_nonstale(struct prio_queue *queue)
 }
 
 /* all input commits in one and twos[] must have been parsed! */
-static struct commit_list *paint_down_to_common(struct commit *one, int n,
+static struct commit_list *paint_down_to_common(struct repository *r,
+   struct commit *one, int n,
struct commit **twos,
int min_generation)
 {
@@ -80,7 +81,7 @@ static struct commit_list *paint_down_to_common(struct commit 
*one, int n,
parents = parents->next;
if ((p->object.flags & flags) == flags)
continue;
-   if (parse_commit(p))
+   if (repo_parse_commit(r, p))
return NULL;
p->object.flags |= flags;
prio_queue_put(, p);
@@ -113,7 +114,7 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return NULL;
}
 
-   list = paint_down_to_common(one, n, twos, 0);
+   list = paint_down_to_common(the_repository, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -184,8 +185,8 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(array[i], filled, work,
- min_generation);
+   common = paint_down_to_common(the_repository, array[i], filled,
+ work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
for (j = 0; j < filled; j++)
@@ -319,7 +320,9 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(commit, nr_reference, reference, 
commit->generation);
+   bases = paint_down_to_common(the_repository, commit,
+nr_reference, reference,
+commit->generation);
if (commit->object.flags & PARENT2)
ret = 1;
clear_commit_marks(commit, all_flags);
-- 
2.19.0



[PATCH 05/19] object-store: prepare has_{sha1, object}_file[_with_flags] to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 contrib/coccinelle/the_repository.cocci | 29 +
 object-store.h  | 22 ++-
 sha1-file.c | 15 -
 3 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index 3c7fa70502..46f3a1b23a 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -11,3 +11,32 @@ expression G;
 + repo_read_object_file(the_repository,
   E, F, G)
 
+@@
+expression E;
+@@
+- has_sha1_file(
++ repo_has_sha1_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_sha1_file_with_flags(
++ repo_has_sha1_file_with_flags(the_repository,
+  E)
+
+@@
+expression E;
+@@
+- has_object_file(
++ repo_has_object_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_object_file_with_flags(
++ repo_has_object_file_with_flags(the_repository,
+  E)
diff --git a/object-store.h b/object-store.h
index 41ceebca48..e1c68d0774 100644
--- a/object-store.h
+++ b/object-store.h
@@ -197,15 +197,27 @@ int read_loose_object(const char *path,
  * object_info. OBJECT_INFO_SKIP_CACHED is automatically set; pass
  * nonzero flags to also set other flags.
  */
-extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags);
-static inline int has_sha1_file(const unsigned char *sha1)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags);
+static inline int repo_has_sha1_file(struct repository *r,
+const unsigned char *sha1)
 {
-   return has_sha1_file_with_flags(sha1, 0);
+   return repo_has_sha1_file_with_flags(r, sha1, 0);
 }
 
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_sha1_file_with_flags(sha1, flags) 
repo_has_sha1_file_with_flags(the_repository, sha1, flags)
+#define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1)
+#endif
+
 /* Same as the above, except for struct object_id. */
-extern int has_object_file(const struct object_id *oid);
-extern int has_object_file_with_flags(const struct object_id *oid, int flags);
+int repo_has_object_file(struct repository *r, const struct object_id *oid);
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_object_file(oid) repo_has_object_file(the_repository, oid)
+#define has_object_file_with_flags(oid, flags) 
repo_has_object_file_with_flags(the_repository, oid, flags)
+#endif
 
 /*
  * Return true iff an alternate object database has a loose object
diff --git a/sha1-file.c b/sha1-file.c
index ce47524679..6153cedcf3 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1768,24 +1768,27 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
return ret;
 }
 
-int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags)
 {
struct object_id oid;
if (!startup_info->have_repository)
return 0;
hashcpy(oid.hash, sha1);
-   return oid_object_info_extended(the_repository, , NULL,
+   return oid_object_info_extended(r, , NULL,
flags | OBJECT_INFO_SKIP_CACHED) >= 0;
 }
 
-int has_object_file(const struct object_id *oid)
+int repo_has_object_file(struct repository *r,
+const struct object_id *oid)
 {
-   return has_sha1_file(oid->hash);
+   return repo_has_sha1_file(r, oid->hash);
 }
 
-int has_object_file_with_flags(const struct object_id *oid, int flags)
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags)
 {
-   return has_sha1_file_with_flags(oid->hash, flags);
+   return repo_has_sha1_file_with_flags(r, oid->hash, flags);
 }
 
 static void check_tree(const void *buf, size_t size)
-- 
2.19.0



[PATCH 07/19] commit: allow parse_commit* to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Just like the previous commit, parse_commit and friends are used a lot
and are found in new patches, so we cannot change their signature easily.

Re-introduce these function prefixed with 'repo_' that take a repository
argument and keep the original as a shallow macro.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 commit.c| 18 +++---
 commit.h| 17 +
 contrib/coccinelle/the_repository.cocci | 24 
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/commit.c b/commit.c
index 5781f9bc67..fe942754e2 100644
--- a/commit.c
+++ b/commit.c
@@ -443,7 +443,10 @@ int parse_commit_buffer(struct repository *r, struct 
commit *item, const void *b
return 0;
 }
 
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
+int repo_parse_commit_internal(struct repository *r,
+  struct commit *item,
+  int quiet_on_missing,
+  int use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -454,9 +457,9 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
return -1;
if (item->object.parsed)
return 0;
-   if (use_commit_graph && parse_commit_in_graph(the_repository, item))
+   if (use_commit_graph && parse_commit_in_graph(r, item))
return 0;
-   buffer = read_object_file(>object.oid, , );
+   buffer = repo_read_object_file(r, >object.oid, , );
if (!buffer)
return quiet_on_missing ? -1 :
error("Could not read %s",
@@ -467,18 +470,19 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
 oid_to_hex(>object.oid));
}
 
-   ret = parse_commit_buffer(the_repository, item, buffer, size, 0);
+   ret = parse_commit_buffer(r, item, buffer, size, 0);
if (save_commit_buffer && !ret) {
-   set_commit_buffer(the_repository, item, buffer, size);
+   set_commit_buffer(r, item, buffer, size);
return 0;
}
free(buffer);
return ret;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item, int quiet_on_missing)
 {
-   return parse_commit_internal(item, quiet_on_missing, 1);
+   return repo_parse_commit_internal(r, item, quiet_on_missing, 1);
 }
 
 void parse_commit_or_die(struct commit *item)
diff --git a/commit.h b/commit.h
index e2c99d9b04..58a0c5409b 100644
--- a/commit.h
+++ b/commit.h
@@ -79,12 +79,21 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct repository *r, struct commit *item, const void 
*buffer, unsigned long size, int check_graph);
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
-int parse_commit_gently(struct commit *item, int quiet_on_missing);
-static inline int parse_commit(struct commit *item)
+int repo_parse_commit_internal(struct repository *r, struct commit *item,
+  int quiet_on_missing, int use_commit_graph);
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item,
+int quiet_on_missing);
+static inline int repo_parse_commit(struct repository *r, struct commit *item)
 {
-   return parse_commit_gently(item, 0);
+   return repo_parse_commit_gently(r, item, 0);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define parse_commit_internal(item, quiet, use) 
repo_parse_commit_internal(the_repository, item, quiet, use)
+#define parse_commit_gently(item, quiet) 
repo_parse_commit_gently(the_repository, item, quiet)
+#define parse_commit(item) repo_parse_commit(the_repository, item)
+#endif
+
 void parse_commit_or_die(struct commit *item);
 
 struct buffer_slab;
diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index 46f3a1b23a..b185fe0a1d 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -40,3 +40,27 @@ expression F;
 - has_object_file_with_flags(
 + repo_has_object_file_with_flags(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- parse_commit_internal(
++ repo_parse_commit_internal(the_repository,
+  E, F, G)
+
+@@
+expression E;
+expression F;
+@@
+- parse_commit_gently(
++ repo_parse_commit_gently(the_repository,
+  E, F)
+
+@@
+expression E;
+@@
+- parse_commit(
++ repo_parse_commit(the_repository,
+  E)
-- 
2.19.0



[PATCH 12/19] commit-reach: prepare get_merge_bases to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Similarly to previous patches, the get_merge_base functions are used
often in the code base, which makes migrating them hard.

Implement the new functions, prefixed with 'repo_' and hide the old
functions behind a wrapper macro.

Signed-off-by: Stefan Beller 
---
 commit-reach.c  | 24 +-
 commit-reach.h  | 26 +++-
 contrib/coccinelle/the_repository.cocci | 27 +
 3 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index df93274966..aa3cde5c8a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -255,23 +255,27 @@ static struct commit_list *get_merge_bases_many_0(struct 
repository *r,
return result;
 }
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos)
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one,
+ int n,
+ struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
+   return get_merge_bases_many_0(r, one, n, twos, 1);
 }
 
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos)
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one,
+   int n,
+   struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
+   return get_merge_bases_many_0(r, one, n, twos, 0);
 }
 
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *one,
+struct commit *two)
 {
-   return get_merge_bases_many_0(the_repository, one, 1, , 1);
+   return get_merge_bases_many_0(r, one, 1, , 1);
 }
 
 /*
diff --git a/commit-reach.h b/commit-reach.h
index 7d313e2975..52667d64ac 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -8,17 +8,23 @@ struct commit_list;
 struct contains_cache;
 struct ref_filter;
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos);
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos);
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
-struct commit_list *get_octopus_merge_bases(struct commit_list *in);
-
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *rev1,
+struct commit *rev2);
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one, int n,
+ struct commit **twos);
 /* To be used only when object flags after this call no longer matter */
-struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, 
struct commit **twos);
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_merge_bases(r1, r2)   repo_get_merge_bases(the_repository, 
r1, r2)
+#define get_merge_bases_many(one, n, two) 
repo_get_merge_bases_many(the_repository, one, n, two)
+#define get_merge_bases_many_dirty(one, n, twos) 
repo_get_merge_bases_many_dirty(the_repository, one, n, twos)
+#endif
+
+struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
 int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
index b185fe0a1d..5e037fe428 100644
--- a/contrib/coccinelle/the_repository.cocci
+++ b/contrib/coccinelle/the_repository.cocci
@@ -64,3 +64,30 @@ expression E;
 - parse_commit(
 + repo_parse_commit(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+@@
+- get_merge_bases(
++ repo_get_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- get_merge_bases_many(
++ 

[PATCH 09/19] commit-reach.c: allow merge_bases_many to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 2f7ae20bd4..aacd8cdc1e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -92,7 +92,9 @@ static struct commit_list *paint_down_to_common(struct 
repository *r,
return result;
 }
 
-static struct commit_list *merge_bases_many(struct commit *one, int n, struct 
commit **twos)
+static struct commit_list *merge_bases_many(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos)
 {
struct commit_list *list = NULL;
struct commit_list *result = NULL;
@@ -107,14 +109,14 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return commit_list_insert(one, );
}
 
-   if (parse_commit(one))
+   if (repo_parse_commit(r, one))
return NULL;
for (i = 0; i < n; i++) {
-   if (parse_commit(twos[i]))
+   if (repo_parse_commit(r, twos[i]))
return NULL;
}
 
-   list = paint_down_to_common(the_repository, one, n, twos, 0);
+   list = paint_down_to_common(r, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -221,7 +223,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(one, n, twos);
+   result = merge_bases_many(the_repository, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
-- 
2.19.0



[PATCH 11/19] commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index bce4169f1f..df93274966 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -213,7 +213,8 @@ static int remove_redundant(struct repository *r, struct 
commit **array, int cnt
return filled;
 }
 
-static struct commit_list *get_merge_bases_many_0(struct commit *one,
+static struct commit_list *get_merge_bases_many_0(struct repository *r,
+ struct commit *one,
  int n,
  struct commit **twos,
  int cleanup)
@@ -223,7 +224,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(the_repository, one, n, twos);
+   result = merge_bases_many(r, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
@@ -246,7 +247,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(the_repository, rslt, cnt);
+   cnt = remove_redundant(r, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -258,19 +259,19 @@ struct commit_list *get_merge_bases_many(struct commit 
*one,
 int n,
 struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 1);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
 }
 
 struct commit_list *get_merge_bases_many_dirty(struct commit *one,
   int n,
   struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 0);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
 }
 
 struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
 {
-   return get_merge_bases_many_0(one, 1, , 1);
+   return get_merge_bases_many_0(the_repository, one, 1, , 1);
 }
 
 /*
-- 
2.19.0



[PATCH 10/19] commit-reach.c: allow remove_redundant to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 commit-reach.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index aacd8cdc1e..bce4169f1f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -153,7 +153,7 @@ struct commit_list *get_octopus_merge_bases(struct 
commit_list *in)
return ret;
 }
 
-static int remove_redundant(struct commit **array, int cnt)
+static int remove_redundant(struct repository *r, struct commit **array, int 
cnt)
 {
/*
 * Some commit in the array may be an ancestor of
@@ -171,7 +171,7 @@ static int remove_redundant(struct commit **array, int cnt)
ALLOC_ARRAY(filled_index, cnt - 1);
 
for (i = 0; i < cnt; i++)
-   parse_commit(array[i]);
+   repo_parse_commit(r, array[i]);
for (i = 0; i < cnt; i++) {
struct commit_list *common;
uint32_t min_generation = array[i]->generation;
@@ -187,7 +187,7 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(the_repository, array[i], filled,
+   common = paint_down_to_common(r, array[i], filled,
  work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
@@ -246,7 +246,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(rslt, cnt);
+   cnt = remove_redundant(the_repository, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -367,7 +367,7 @@ struct commit_list *reduce_heads(struct commit_list *heads)
p->item->object.flags &= ~STALE;
}
}
-   num_head = remove_redundant(array, num_head);
+   num_head = remove_redundant(the_repository, array, num_head);
for (i = 0; i < num_head; i++)
tail = _list_insert(array[i], tail)->next;
free(array);
-- 
2.19.0



[PATCH 02/19] packfile: allow has_packed_and_bad to handle arbitrary repositories

2018-10-16 Thread Stefan Beller
has_packed_and_bad is not widely used, so just migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 packfile.c  | 5 +++--
 packfile.h  | 2 +-
 sha1-file.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/packfile.c b/packfile.c
index ebcb5742ec..40085fe160 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1024,12 +1024,13 @@ void mark_bad_packed_object(struct packed_git *p, const 
unsigned char *sha1)
p->num_bad_objects++;
 }
 
-const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
+const struct packed_git *has_packed_and_bad(struct repository *r,
+   const unsigned char *sha1)
 {
struct packed_git *p;
unsigned i;
 
-   for (p = the_repository->objects->packed_git; p; p = p->next)
+   for (p = r->objects->packed_git; p; p = p->next)
for (i = 0; i < p->num_bad_objects; i++)
if (!hashcmp(sha1,
 p->bad_object_sha1 + the_hash_algo->rawsz 
* i))
diff --git a/packfile.h b/packfile.h
index 630f35cf31..1d2d170bf8 100644
--- a/packfile.h
+++ b/packfile.h
@@ -136,7 +136,7 @@ extern int packed_object_info(struct repository *r,
  off_t offset, struct object_info *);
 
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
-extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
+extern const struct packed_git *has_packed_and_bad(struct repository *r, const 
unsigned char *sha1);
 
 /*
  * Iff a pack file in the given repository contains the object named by sha1,
diff --git a/sha1-file.c b/sha1-file.c
index 647068a836..13b3c5cb79 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1432,7 +1432,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
-- 
2.19.0



[PATCH 01/19] sha1_file: allow read_object to read objects in arbitrary repositories

2018-10-16 Thread Stefan Beller
Allow read_object (a file local functon in sha1_file) to
handle arbitrary repositories by passing the repository down
to oid_object_info_extended.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 sha1-file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index 308d5e20e2..647068a836 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1361,7 +1361,9 @@ int oid_object_info(struct repository *r,
return type;
 }
 
-static void *read_object(const unsigned char *sha1, enum object_type *type,
+static void *read_object(struct repository *r,
+const unsigned char *sha1,
+enum object_type *type,
 unsigned long *size)
 {
struct object_id oid;
@@ -1373,7 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum 
object_type *type,
 
hashcpy(oid.hash, sha1);
 
-   if (oid_object_info_extended(the_repository, , , 0) < 0)
+   if (oid_object_info_extended(r, , , 0) < 0)
return NULL;
return content;
 }
@@ -1414,7 +1416,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
lookup_replace_object(the_repository, oid) : oid;
 
errno = 0;
-   data = read_object(repl->hash, type, size);
+   data = read_object(the_repository, repl->hash, type, size);
if (data)
return data;
 
@@ -1755,7 +1757,7 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
 
if (has_loose_object(oid))
return 0;
-   buf = read_object(oid->hash, , );
+   buf = read_object(the_repository, oid->hash, , );
if (!buf)
return error(_("cannot read sha1_file for %s"), 
oid_to_hex(oid));
hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), len) + 
1;
-- 
2.19.0



[PATCH 03/19] object-store: allow read_object_file_extended to read from arbitrary repositories

2018-10-16 Thread Stefan Beller
read_object_file_extended is not widely used, so migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object-store.h |  5 +++--
 sha1-file.c| 11 ++-
 streaming.c|  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/object-store.h b/object-store.h
index 67e66227d9..6bb0ccbf05 100644
--- a/object-store.h
+++ b/object-store.h
@@ -146,12 +146,13 @@ void sha1_file_name(struct repository *r, struct strbuf 
*buf, const unsigned cha
 
 void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned 
long *size);
 
-extern void *read_object_file_extended(const struct object_id *oid,
+extern void *read_object_file_extended(struct repository *r,
+  const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
 static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
 {
-   return read_object_file_extended(oid, type, size, 1);
+   return read_object_file_extended(the_repository, oid, type, size, 1);
 }
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
diff --git a/sha1-file.c b/sha1-file.c
index 13b3c5cb79..ce47524679 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1403,7 +1403,8 @@ int pretend_object_file(void *buf, unsigned long len, 
enum object_type type,
  * deal with them should arrange to call read_object() and give error
  * messages themselves.
  */
-void *read_object_file_extended(const struct object_id *oid,
+void *read_object_file_extended(struct repository *r,
+   const struct object_id *oid,
enum object_type *type,
unsigned long *size,
int lookup_replace)
@@ -1413,10 +1414,10 @@ void *read_object_file_extended(const struct object_id 
*oid,
const char *path;
struct stat st;
const struct object_id *repl = lookup_replace ?
-   lookup_replace_object(the_repository, oid) : oid;
+   lookup_replace_object(r, oid) : oid;
 
errno = 0;
-   data = read_object(the_repository, repl->hash, type, size);
+   data = read_object(r, repl->hash, type, size);
if (data)
return data;
 
@@ -1428,11 +1429,11 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("replacement %s not found for %s"),
oid_to_hex(repl), oid_to_hex(oid));
 
-   if (!stat_sha1_file(the_repository, repl->hash, , ))
+   if (!stat_sha1_file(r, repl->hash, , ))
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(r, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..c843a1230f 100644
--- a/streaming.c
+++ b/streaming.c
@@ -490,7 +490,7 @@ static struct stream_vtbl incore_vtbl = {
 
 static open_method_decl(incore)
 {
-   st->u.incore.buf = read_object_file_extended(oid, type, >size, 0);
+   st->u.incore.buf = read_object_file_extended(the_repository, oid, type, 
>size, 0);
st->u.incore.read_ptr = 0;
st->vtbl = _vtbl;
 
-- 
2.19.0



[PATCH 06/19] object: parse_object to honor its repository argument

2018-10-16 Thread Stefan Beller
In 8e4b0b6047 (object.c: allow parse_object to handle
arbitrary repositories, 2018-06-28), we forgot to pass the
repository down to the read_object_file.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index 51c4594515..a32afa14df 100644
--- a/object.c
+++ b/object.c
@@ -259,8 +259,8 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
if (obj && obj->parsed)
return obj;
 
-   if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
-   (!obj && has_object_file(oid) &&
+   if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
+   (!obj && repo_has_object_file(r, oid) &&
 oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
if (check_object_signature(repl, NULL, 0, NULL) < 0) {
error(_("sha1 mismatch %s"), oid_to_hex(oid));
@@ -270,7 +270,7 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
return lookup_object(r, oid->hash);
}
 
-   buffer = read_object_file(oid, , );
+   buffer = repo_read_object_file(r, oid, , );
if (buffer) {
if (check_object_signature(repl, buffer, size, type_name(type)) 
< 0) {
free(buffer);
-- 
2.19.0



[PATCH 04/19] object-store: prepare read_object_file to deal with arbitrary repositories

2018-10-16 Thread Stefan Beller
As read_object_file is a widely used function (which is also regularly used
in new code in flight between master..pu), changing its signature is painful
is hard, as other series in flight rely on the original signature. It would
burden the maintainer if we'd just change the signature.

Introduce repo_read_object_file which takes the repository argument, and
hide the original read_object_file as a macro behind
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to
e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21)

Add a coccinelle patch to convert existing callers, but do not apply
the resulting patch from 'make coccicheck' to keep the diff of this
patch small.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 contrib/coccinelle/the_repository.cocci | 13 +
 object-store.h  | 10 --
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.cocci

diff --git a/contrib/coccinelle/the_repository.cocci 
b/contrib/coccinelle/the_repository.cocci
new file mode 100644
index 00..3c7fa70502
--- /dev/null
+++ b/contrib/coccinelle/the_repository.cocci
@@ -0,0 +1,13 @@
+// This file is used for the ongoing refactoring of
+// bringing the index or repository struct in all of
+// our code base.
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- read_object_file(
++ repo_read_object_file(the_repository,
+  E, F, G)
+
diff --git a/object-store.h b/object-store.h
index 6bb0ccbf05..41ceebca48 100644
--- a/object-store.h
+++ b/object-store.h
@@ -150,10 +150,16 @@ extern void *read_object_file_extended(struct repository 
*r,
   const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
-static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
+static inline void *repo_read_object_file(struct repository *r,
+ const struct object_id *oid,
+ enum object_type *type,
+ unsigned long *size)
 {
-   return read_object_file_extended(the_repository, oid, type, size, 1);
+   return read_object_file_extended(r, oid, type, size, 1);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define read_object_file(oid, type, size) 
repo_read_object_file(the_repository, oid, type, size)
+#endif
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
 int oid_object_info(struct repository *r, const struct object_id *, unsigned 
long *);
-- 
2.19.0



[PATCH 00/19] Bring more repository handles into our code base

2018-10-16 Thread Stefan Beller
This rerolls sb/more-repo-in-api.
It applies on nd/the-index merged with ds/reachable and is available via
git fetch https://github.com/stefanbeller/git object-store-final-3

I addressed all of Jonathans comments, see range-diff below;
the last patch (applying the semantic patches) has been omitted as that
would produce a lot of merge conflicts. Without that patch, this merges
cleanly to next.

As for when to apply the semantic patches, I wondered if we would prefer a 
dirty merge
(created via "make coccicheck && git apply 
contrib/coccinelle/the_repository.cocci.patch")
of the semantic patches, or if we'd actually trickle in the changes over some 
time?

Thanks,
Stefan

An earlier RFC was sent out at
https://public-inbox.org/git/20181011211754.31369-1-sbel...@google.com/ and 
said:
This applies on nd/the-index (b3c7eef9b05) and is the logical continuation 
of
the object store series, which I sent over the last year.

The previous series did take a very slow and pedantic approach,
using a #define trick, see cfc62fc98c for details, but it turns out,
that it doesn't work:
   When changing the signature of widely used functions, it burdens the
   maintainer in resolving the semantic conflicts.
   
   In the orginal approach this was called a feature, as then we can ensure
   that not bugs creep into the code base during the merge window (while 
such
   a refactoring series wanders from pu to master). It turns out this
   was not well received and was just burdensome.
   
   The #define trick doesn't buy us much to begin with when dealing with
   non-merge-conflicts.  For example, see deref_tag at tag.c:68, which got
   the repository argument in 286d258d4f (tag.c: allow deref_tag to handle
   arbitrary repositories, 2018-06-28) but lost its property of working on 
any
   repository while 8c4cc32689 (tag: don't warn if target is missing but
   promised, 2018-07-12) was in flight simultaneously.
   
   Another example of failure of this approach is seen in patch 5, which
   shows that the pedantry was missed.

This series takes another approach as it doesn't change the signature of
functions, but introduces new functions that can deal with arbitrary 
repositories, keeping the old function signature around using a shallow 
wrapper.

Additionally each patch adds a semantic patch, that would port from the old 
to
the new function. These semantic patches are all applied in the very last 
patch,
but we could omit applying the last patch if it causes too many merge 
conflicts
and trickl in the semantic patches over time when there are no merge 
conflicts.


The original goal of all these refactoring series was to remove 
add_submodule_odb 
in submodule.c, which was partially reached with this series. I'll 
investigate the
remaining calls in another series, but it shows we're close to be done with 
these
large refactorings as far as I am concerned.

Thanks,
Stefan


Stefan Beller (19):
  sha1_file: allow read_object to read objects in arbitrary repositories
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  object-store: allow read_object_file_extended to read from arbitrary
repositories
  object-store: prepare read_object_file to deal with arbitrary
repositories
  object-store: prepare has_{sha1, object}_file[_with_flags] to handle
arbitrary repositories
  object: parse_object to honor its repository argument
  commit: allow parse_commit* to handle arbitrary repositories
  commit-reach.c: allow paint_down_to_common to handle arbitrary
repositories
  commit-reach.c: allow merge_bases_many to handle arbitrary
repositories
  commit-reach.c: allow remove_redundant to handle arbitrary
repositories
  commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary
repositories
  commit-reach: prepare get_merge_bases to handle arbitrary repositories
  commit-reach: prepare in_merge_bases[_many] to handle arbitrary
repositories
  commit: prepare get_commit_buffer to handle arbitrary repositories
  commit: prepare repo_unuse_commit_buffer to handle arbitrary
repositories
  commit: prepare logmsg_reencode to handle arbitrary repositories
  pretty: prepare format_commit_message to handle arbitrary repositories
  submodule: use submodule repos for object lookup
  submodule: don't add submodule as odb for push

 commit-reach.c  |  73 +++-
 commit-reach.h  |  38 +--
 commit.c|  32 --
 commit.h|  39 ++-
 contrib/coccinelle/the_repository.cocci | 144 
 object-store.h  |  35 --
 object.c|   6 +-
 packfile.c  |   5 +-
 packfile.h  |   

What's cooking in git.git (Oct 2018, #03; Wed, 17)

2018-10-16 Thread Junio C Hamano
I expect to be offline for most of today during the day, so here
is the usual report but sent at an unusual time.

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

Note that "cf. " are not meant as "clear all of these and
you are home free".  They are primarily to prevent me from merging
topics that are not ready to 'next' by mistake and remind me where
to go back to read the last state of the discussion in the archive.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/commit-graph-progress (2018-09-20) 3 commits
  (merged to 'next' on 2018-09-24 at 76f2f5c1e3)
 + gc: fix regression in 7b0f229222 impacting --quiet
 + commit-graph verify: add progress output
 + commit-graph write: add progress output
 (this branch is used by ds/commit-graph-leakfix.)

 (Originally merged to 'next' on 2018-09-20 at 24ca94b1d4)

 Generation of (experimental) commit-graph files have so far been
 fairly silent, even though it takes noticeable amount of time in a
 meaningfully large repository.  The users will now see progress
 output.


* ds/commit-graph-with-grafts (2018-08-21) 8 commits
  (merged to 'next' on 2018-10-09 at 851a457102)
 + commit-graph: close_commit_graph before shallow walk
 + commit-graph: not compatible with uninitialized repo
 + commit-graph: not compatible with grafts
 + commit-graph: not compatible with replace objects
 + test-repository: properly init repo
 + commit-graph: update design document
 + refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
 + refs.c: migrate internal ref iteration to pass thru repository argument

 The recently introduced commit-graph auxiliary data is incompatible
 with mechanisms such as replace & grafts that "breaks" immutable
 nature of the object reference relationship.  Disable optimizations
 based on its use (and updating existing commit-graph) when these
 incompatible features are in use in the repository.


* ds/reachable-final-cleanup (2018-09-25) 1 commit
  (merged to 'next' on 2018-10-12 at 32c6d5f55a)
 + commit-reach: cleanups in can_all_from_reach...

 Code already in 'master' is further cleaned-up by this patch.


* dz/credential-doc-url-matching-rules (2018-09-27) 1 commit
  (merged to 'next' on 2018-10-12 at 4547952530)
 + doc: clarify gitcredentials path component matching

 Doc update.


* en/merge-cleanup (2018-09-20) 4 commits
  (merged to 'next' on 2018-10-09 at f3a00b506f)
 + merge-recursive: rename merge_file_1() and merge_content()
 + merge-recursive: remove final remaining caller of merge_file_one()
 + merge-recursive: avoid wrapper function when unnecessary and wasteful
 + merge-recursive: set paths correctly when three-way merging content
 (this branch is used by en/merge-cleanup-more.)

 Code clean-up.


* en/status-multiple-renames-to-the-same-target-fix (2018-09-27) 1 commit
  (merged to 'next' on 2018-10-12 at 4976fc61a0)
 + commit: fix erroneous BUG, 'multiple renames on the same target? how?'

 The code in "git status" sometimes hit an assertion failure.  This
 was caused by a structure that was reused without cleaning the data
 used for the first run, which has been corrected.


* fe/doc-updates (2018-09-21) 3 commits
  (merged to 'next' on 2018-10-10 at 2eea3a88bc)
 + git-describe.1: clarify that "human readable" is also git-readable
 + git-column.1: clarify initial description, provide examples
 + git-archimport.1: specify what kind of Arch we're talking about

 Doc updates.


* jk/check-everything-connected-is-long-gone (2018-09-25) 1 commit
  (merged to 'next' on 2018-10-12 at 4ce30c9a30)
 + receive-pack: update comment with check_everything_connected

 Comment fix.


* jk/delta-islands-with-bitmap-reuse-delta-fix (2018-09-19) 1 commit
  (merged to 'next' on 2018-10-09 at 10e58be2af)
 + pack-objects: handle island check for "external" delta base

 Fix interactions between two recent topics.


* jk/oideq-hasheq-cleanup (2018-10-04) 1 commit
  (merged to 'next' on 2018-10-12 at 7c9b5681da)
 + more oideq/hasheq conversions

 Code clean-up.


* jn/gc-auto (2018-07-17) 1 commit
  (merged to 'next' on 2018-10-10 at 9f0f1f770e)
 + gc: do not return error for prior errors in daemonized mode
 (this branch uses jn/gc-auto-prep.)

 "gc --auto" ended up calling exit(-1) upon error, which has been
 corrected to use exit(1).  Also the error reporting behaviour when
 daemonized has been updated to exit with zero status when stopping
 due to a previously discovered error (which implies there is no
 point running gc to improve the situation); we used to exit with
 failure in such a case.


* jn/gc-auto-prep (2018-07-17) 2 commits
  

Re: [PATCH v4] branch: introduce --show-current display option

2018-10-16 Thread Eric Sunshine
On Tue, Oct 16, 2018 at 7:09 PM Junio C Hamano  wrote:
> Eric Sunshine  writes:
> > This cleanup "checkout" needs to be encapsulated within a
> > test_when_finished(), doesn't it? Preferably just after the "git
> > checkout -b" invocation.
>
> In the meantime, here is what I'll have in 'pu' on top.
>
> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
> @@ -119,12 +119,14 @@ test_expect_success 'git branch `--show-current` works 
> properly when tag exists'
> cat >expect <<-\EOF &&
> branch-and-tag-name
> EOF
> -   test_when_finished "git branch -D branch-and-tag-name" &&
> +   test_when_finished "
> +   git checkout branch-one
> +   git branch -D branch-and-tag-name
> +   " &&
> git checkout -b branch-and-tag-name &&
> test_when_finished "git tag -d branch-and-tag-name" &&
> git tag branch-and-tag-name &&
> git branch --show-current >actual &&
> -   git checkout branch-one &&
> test_cmp expect actual
>  '

This make sense to me.

> @@ -137,8 +139,7 @@ test_expect_success 'git branch `--show-current` works 
> properly with worktrees'
> git worktree add worktree branch-two &&
> (
> git branch --show-current &&
> -   cd worktree &&
> -   git branch --show-current
> +   git -C worktree branch --show-current
> ) >actual &&
> test_cmp expect actual
>  '

The subshell '(...)' could become '{...}' now that the 'cd' is gone,
but that's a minor point.


Re: problem with not being able to enforce git content filters

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 01:36:37PM -0700, Stas Bekman wrote:
> The problem is that it can't be enforced.
> 
> When it's not enforced, we end up with some devs using it and others
> don't, or more often is the same dev sometimes doesn't have it configured.
> 
> When a person has a stripped out notebook checked out, when another
> person commits un-stripped out notebook, it leads to: invalid `git
> status` reports, `git pull` breaks, `git stash` doesn't work, since it
> tries to stash using the filters, and `git pull' can never succeed
> because it thinks that it'll overwrite the local changes, but `git diff`
> returns no changes.
> 
> So the only solution when this happens is to disable the filters, clean
> up the mess, re-enable the filters. Many people just make a new clone -
> ouch!
> 
> And the biggest problem is that it affects all users who may have the
> filters enabled, e.g. because they worked on a PR, and not just devs -
> i.e. the repercussions are much bigger than just a few devs affected.
> 
> We can't use server-side hooks to enforce this because the project is on
> github.
> 
> And the devs honestly try to do their best to remember to configure the
> filters, but for some reason they disappear for them, don't ask me why,
> I don't know. This is an open source project team, not a work place.

This sounds like it could be easily solved by continuous integration.
You could set up a job on any of a variety of services that checks that
a pull request or other commit is clean when when the filter runs.  If
it doesn't pass, the code doesn't merge.

This is what other projects do for style-related and tidiness issues.
Similar approaches can be used in other situations to enforce that all
line endings are LF, or whatever your project desires.

I don't think it's a good idea to provide Git configuration to end
users, even with prompting, since there are many novice users who don't
know what the security implications of various config options are.  I
also personally never would want to be prompted for such a thing, so
even if that were a feature, people would turn if off, and you'd be no
better off than you were before.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 17/19] submodule: use submodule repos for object lookup

2018-10-16 Thread Stefan Beller
On Tue, Oct 16, 2018 at 4:13 PM Jonathan Tan  wrote:
>
> > Thanks for the review of the whole series!
> >
> > I have redone this series, addressing all your comments. I addressed
> > this comment differently than suggested, and put the submodule
> > repository argument at the end of the parameter list, such that it
> > goes with all the other arguments to be filled in.
>
> Sounds good.

Actually I changed my mind on that after figuring out how to free
the submodule repository appropriately and went with your original suggestion.

>
> > I was about to resend the series, but test-merged with the other
> > submodule series in flight 
> > (origin/sb/submodule-recursive-fetch-gets-the-tip)
> > which had some conflicts that I can easily resolve by rebasing on top.
>
> I presume you are talking about [1]? Maybe consider rebasing that one on
> top of this instead, since this is just a refactoring whereas
> submodule-recursive-fetch-gets-the-tip changes functionality, from what
> I understand of patches 8 and 9.

>From my understanding, that series is further along than this one,
so I would not want to mix up their order.

Currently I am rebasing this on top of select topics from next,
(ds/reachable) as that are the other conflicts that I'd have to deal with.

> [1] https://public-inbox.org/git/20181016181327.107186-1-sbel...@google.com/
>
> > It conflicts a lot when merging to next, due to the latest patch
> > ("Apply semantic patches from previous patches"), so I am not sure
> > how to proceed properly. Maybe we'd omit that patch and
> > run 'make coccicheck' on next to apply the semantic patches
> > there instead.
>
> Omitting the patch sounds good to me. For me, just stating that you have
> excluded any coccinelle-generated patches in order to ease merging into
> the various branches is sufficient, and people can test the coccinelle
> patches included by running "make coccicheck" then "patch -p1
> 

Re: [PATCH 17/19] submodule: use submodule repos for object lookup

2018-10-16 Thread Jonathan Tan
> Thanks for the review of the whole series!
> 
> I have redone this series, addressing all your comments. I addressed
> this comment differently than suggested, and put the submodule
> repository argument at the end of the parameter list, such that it
> goes with all the other arguments to be filled in.

Sounds good.

> I was about to resend the series, but test-merged with the other
> submodule series in flight (origin/sb/submodule-recursive-fetch-gets-the-tip)
> which had some conflicts that I can easily resolve by rebasing on top.

I presume you are talking about [1]? Maybe consider rebasing that one on
top of this instead, since this is just a refactoring whereas
submodule-recursive-fetch-gets-the-tip changes functionality, from what
I understand of patches 8 and 9.

[1] https://public-inbox.org/git/20181016181327.107186-1-sbel...@google.com/

> It conflicts a lot when merging to next, due to the latest patch
> ("Apply semantic patches from previous patches"), so I am not sure
> how to proceed properly. Maybe we'd omit that patch and
> run 'make coccicheck' on next to apply the semantic patches
> there instead.

Omitting the patch sounds good to me. For me, just stating that you have
excluded any coccinelle-generated patches in order to ease merging into
the various branches is sufficient, and people can test the coccinelle
patches included by running "make coccicheck" then "patch -p1


Re: What's cooking in git.git (Oct 2018, #02; Sat, 13)

2018-10-16 Thread Junio C Hamano
Josh Steadmon  writes:

> The first two patches (test cleanup and packet_reader refactor) are OK,
> but the latter two will break the archive command when an old client
> attempts to talk to a new server (due to the version advertisement
> problem noted in [1]). Sorry that I didn't catch that these had made it
> into next.

Thanks.  Will hld.


Re: [PATCH v4] branch: introduce --show-current display option

2018-10-16 Thread Junio C Hamano
Eric Sunshine  writes:

>> +test_expect_success 'git branch `--show-current` works properly when tag 
>> exists' '
>> +   cat >expect <<-\EOF &&
>> +   branch-and-tag-name
>> +   EOF
>> +   test_when_finished "git branch -D branch-and-tag-name" &&
>> +   git checkout -b branch-and-tag-name &&
>> +   test_when_finished "git tag -d branch-and-tag-name" &&
>> +   git tag branch-and-tag-name &&
>> +   git branch --show-current >actual &&
>> +   git checkout branch-one &&
>
> This cleanup "checkout" needs to be encapsulated within a
> test_when_finished(), doesn't it? Preferably just after the "git
> checkout -b" invocation.

In the meantime, here is what I'll have in 'pu' on top.

 t/t3203-branch-output.sh | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 1bf708dffc..d1f4fec9de 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -119,12 +119,14 @@ test_expect_success 'git branch `--show-current` works 
properly when tag exists'
cat >expect <<-\EOF &&
branch-and-tag-name
EOF
-   test_when_finished "git branch -D branch-and-tag-name" &&
+   test_when_finished "
+   git checkout branch-one
+   git branch -D branch-and-tag-name
+   " &&
git checkout -b branch-and-tag-name &&
test_when_finished "git tag -d branch-and-tag-name" &&
git tag branch-and-tag-name &&
git branch --show-current >actual &&
-   git checkout branch-one &&
test_cmp expect actual
 '
 
@@ -137,8 +139,7 @@ test_expect_success 'git branch `--show-current` works 
properly with worktrees'
git worktree add worktree branch-two &&
(
git branch --show-current &&
-   cd worktree &&
-   git branch --show-current
+   git -C worktree branch --show-current
) >actual &&
test_cmp expect actual
 '
-- 
2.19.1-328-g5a0cc8aca7



Re: [PATCH 15/16] commit-reach: make can_all_from_reach... linear

2018-10-16 Thread Junio C Hamano
René Scharfe  writes:

> Apart from that the macro is simple and doesn't use any tricks or
> added checks.  It just sets up boilerplate functions to offer type-safe
> sorting.
> ...
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5f2e90932f..491230fc57 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1066,6 +1066,21 @@ static inline void sane_qsort(void *base, size_t 
> nmemb, size_t size,
>   qsort(base, nmemb, size, compar);
>  }
>  
> +#define DECLARE_SORT(scope, name, elemtype) \
> +scope void name(elemtype, size_t)
> +
> +#define DEFINE_SORT(scope, name, elemtype, one, two) \
> +static int name##_compare(const elemtype, const elemtype);   \
> +static int name##_compare_void(const void *a, const void *b) \
> +{\
> + return name##_compare(a, b);\
> +}\
> +scope void name(elemtype base, size_t nmemb) \
> +{\
> + QSORT(base, nmemb, name##_compare_void);\
> +}\
> +static int name##_compare(const elemtype one, const elemtype two)

... and here comes the body of the comparison function that takes
two "things" we are about, i.e. elements of the array being sorted.

Quite cleanly done and the result looks pleasant, at least to me.


Re: [PATCH v2 00/13] Base SHA-256 implementation

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 01:01:07PM +0900, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> 
> >  t/t0014-hash.sh   |  54 
> >  create mode 100755 t/t0014-hash.sh
> 
> If I am not mistaken, 0014 is already used by another topic in
> flight, and will cause test-lint failure on 'pu'.

This series was written a while back.  I'll rename it.  Feel free to
wait for v3 before picking it up.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 06:09:41PM +0200, Duy Nguyen wrote:
> On Tue, Oct 16, 2018 at 6:01 PM Derrick Stolee  wrote:
> >
> > On 10/16/2018 11:35 AM, Duy Nguyen wrote:
> > > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> > >  wrote:
> > >> Since the commit-graph code wants to serialize the hash algorithm into
> > >> the data store, specify a version number for each supported algorithm.
> > >> Note that we don't use the values of the constants themselves, as they
> > >> are internal and could change in the future.
> > >>
> > >> Signed-off-by: brian m. carlson 
> > >> ---
> > >>   commit-graph.c | 9 -
> > >>   1 file changed, 8 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/commit-graph.c b/commit-graph.c
> > >> index 7a28fbb03f..e587c21bb6 100644
> > >> --- a/commit-graph.c
> > >> +++ b/commit-graph.c
> > >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> > >>
> > >>   static uint8_t oid_version(void)
> > >>   {
> > >> -   return 1;
> > >> +   switch (hash_algo_by_ptr(the_hash_algo)) {
> > >> +   case GIT_HASH_SHA1:
> > >> +   return 1;
> > >> +   case GIT_HASH_SHA256:
> > >> +   return 2;
> > > Should we just increase this field to uint32_t and store format_id
> > > instead? That will keep oid version unique in all data formats.
> > Both the commit-graph and multi-pack-index store a single byte for the
> > hash version, so that ship has sailed (without incrementing the full
> > file version number in each format).
> 
> And it's probably premature to add the oid version field when multiple
> hash support has not been fully realized. Now we have different ways
> of storing hash id and need separate mappings.

Honestly, anything in the .git directory that is not the v3 pack indexes
or the loose object file should be in exactly one hash algorithm.  We
could simply just leave this value at 1 all the time and ignore the
field, since we already know what algorithm it will use.

> I would go for incrementing file version. Otherwise maybe we just
> update format_id to be one byte instead, and the way of storing hash
> version in commit-graph will be used everywhere.

It needs to be four bytes for pack files so that it's four-byte aligned.
Otherwise accessing pack index fields will cause alignment issues if we
don't massively rewrite the pack handling code.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: On overriding make variables from the environment...

2018-10-16 Thread Jonathan Nieder
SZEDER Gábor wrote:
> On Tue, Oct 16, 2018 at 02:54:56PM -0700, Jonathan Nieder wrote:
>> SZEDER Gábor wrote:

>>> Our Makefile has lines like these:
>>>
>>>   CFLAGS = -g -O2 -Wall
>>>   CC = cc
>>>   AR = ar
>>>   SPATCH = spatch
[...]
>>> I'm not sure what to do.  I'm fine with updating our 'ci/' scripts to
>>> explicitly respect CC in the environment (either by running 'make
>>> CC=$CC' or by writing $CC into 'config.mak').  Or I could update our
>>> Makefile to use '?=' for specific variables, but:
>>
>> That's a good question.  I don't have a strong opinion myself, so I
>> tend to trust larger projects like Linux to have thought this through
>> more, and they use 'CC = cc' as well.
>
> I don't think Linux is a good example to follow in this case, see e.g.
> 6d62c983f7 (Makefile: Change the default compiler from "gcc" to "cc",
> 2011-12-20) (in short: Git is supposed to be buildable with compilers
> other than GCC as well, while Linux not really, so they can hardcode
> 'CC = gcc').

Nowadays Linux builds with clang as well.  People also have other
reasons to override the CC setting (cross-compiling, etc) and to
override other settings like CFLAGS.

> Also, the projects I have on hand usually have 'CC = gcc' hardcoded in
> their Makefiles, too, but those Makefiles were generated by their
> ./configure script (which in turn by ./autogen.sh...), and those tend
> to write CC from the environment into the generated Makefiles.

Yes, I think that's what makes travis's setup normally work.  It makes
sense to me since ./configure is expected to have more implicit or
automatic behavior.  For "make", I kind of like that it doesn't depend
on environment variables that I am not thinking about when debugging a
reported build problem.

When building Git without autoconf, the corresponding place for
settings is config.mak.  Would it make sense for the ci scripts to
write a config.mak file?

Thanks,
Jonathan


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-16 Thread brian m. carlson
On Tue, Oct 16, 2018 at 11:00:19AM +0900, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> 
> > Since the commit-graph code wants to serialize the hash algorithm into
> > the data store, specify a version number for each supported algorithm.
> > Note that we don't use the values of the constants themselves, as they
> > are internal and could change in the future.
> >
> > Signed-off-by: brian m. carlson 
> > ---
> >  commit-graph.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 7a28fbb03f..e587c21bb6 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> >  
> >  static uint8_t oid_version(void)
> >  {
> > -   return 1;
> > +   switch (hash_algo_by_ptr(the_hash_algo)) {
> > +   case GIT_HASH_SHA1:
> > +   return 1;
> > +   case GIT_HASH_SHA256:
> > +   return 2;
> > +   default:
> > +   BUG("unknown hash algorithm");
> > +   }
> 
> Style: align switch/case.

Will fix.

> Shouldn't this be just using GIT_HASH_* constants instead?  Are we
> expecting that it would be benefitial to allow more than one "oid
> version" even within a same GIT_HASH_*?

I really would like to have us rely not rely explicitly on those values.
We don't serialize them anywhere else, and they're explicitly documented
as not being suitable for serialization.  If we were writing this fresh
today, we'd probably use the format_version field, which is designed for
this purpose, or simply omit the field altogether.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v4 4/7] revision.c: begin refactoring --topo-order logic

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When running 'git rev-list --topo-order' and its kin, the topo_order
setting in struct rev_info implies the limited setting. This means
that the following things happen during prepare_revision_walk():

* revs->limited implies we run limit_list() to walk the entire
  reachable set. There are some short-cuts here, such as if we
  perform a range query like 'git rev-list COMPARE..HEAD' and we
  can stop limit_list() when all queued commits are uninteresting.

* revs->topo_order implies we run sort_in_topological_order(). See
  the implementation of that method in commit.c. It implies that
  the full set of commits to order is in the given commit_list.

These two methods imply that a 'git rev-list --topo-order HEAD'
command must walk the entire reachable set of commits _twice_ before
returning a single result.

If we have a commit-graph file with generation numbers computed, then
there is a better way. This patch introduces some necessary logic
redirection when we are in this situation.

In v2.18.0, the commit-graph file contains zero-valued bytes in the
positions where the generation number is stored in v2.19.0 and later.
Thus, we use generation_numbers_enabled() to check if the commit-graph
is available and has non-zero generation numbers.

When setting revs->limited only because revs->topo_order is true,
only do so if generation numbers are not available. There is no
reason to use the new logic as it will behave similarly when all
generation numbers are INFINITY or ZERO.

In prepare_revision_walk(), if we have revs->topo_order but not
revs->limited, then we trigger the new logic. It breaks the logic
into three pieces, to fit with the existing framework:

1. init_topo_walk() fills a new struct topo_walk_info in the rev_info
   struct. We use the presence of this struct as a signal to use the
   new methods during our walk. In this patch, this method simply
   calls limit_list() and sort_in_topological_order(). In the future,
   this method will set up a new data structure to perform that logic
   in-line.

2. next_topo_commit() provides get_revision_1() with the next topo-
   ordered commit in the list. Currently, this simply pops the commit
   from revs->commits.

3. expand_topo_walk() provides get_revision_1() with a way to signal
   walking beyond the latest commit. Currently, this calls
   add_parents_to_list() exactly like the old logic.

While this commit presents method redirection for performing the
exact same logic as before, it allows the next commit to focus only
on the new logic.

Signed-off-by: Derrick Stolee 
---
 revision.c | 42 ++
 revision.h |  4 
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index e18bd530e4..2dcde8a8ac 100644
--- a/revision.c
+++ b/revision.c
@@ -25,6 +25,7 @@
 #include "worktree.h"
 #include "argv-array.h"
 #include "commit-reach.h"
+#include "commit-graph.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -2454,7 +2455,7 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
if (revs->diffopt.objfind)
revs->simplify_history = 0;
 
-   if (revs->topo_order)
+   if (revs->topo_order && !generation_numbers_enabled(the_repository))
revs->limited = 1;
 
if (revs->prune_data.nr) {
@@ -2892,6 +2893,33 @@ static int mark_uninteresting(const struct object_id 
*oid,
return 0;
 }
 
+struct topo_walk_info {};
+
+static void init_topo_walk(struct rev_info *revs)
+{
+   struct topo_walk_info *info;
+   revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
+   info = revs->topo_walk_info;
+   memset(info, 0, sizeof(struct topo_walk_info));
+
+   limit_list(revs);
+   sort_in_topological_order(>commits, revs->sort_order);
+}
+
+static struct commit *next_topo_commit(struct rev_info *revs)
+{
+   return pop_commit(>commits);
+}
+
+static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
+{
+   if (add_parents_to_list(revs, commit, >commits, NULL) < 0) {
+   if (!revs->ignore_missing_links)
+   die("Failed to traverse parents of commit %s",
+   oid_to_hex(>object.oid));
+   }
+}
+
 int prepare_revision_walk(struct rev_info *revs)
 {
int i;
@@ -2928,11 +2956,13 @@ int prepare_revision_walk(struct rev_info *revs)
commit_list_sort_by_date(>commits);
if (revs->no_walk)
return 0;
-   if (revs->limited)
+   if (revs->limited) {
if (limit_list(revs) < 0)
return -1;
-   if (revs->topo_order)
-   sort_in_topological_order(>commits, revs->sort_order);
+   if (revs->topo_order)
+   sort_in_topological_order(>commits, 
revs->sort_order);
+   } else if (revs->topo_order)
+   init_topo_walk(revs);
if 

[PATCH v4 6/7] revision.c: generation-based topo-order algorithm

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The current --topo-order algorithm requires walking all
reachable commits up front, topo-sorting them, all before
outputting the first value. This patch introduces a new
algorithm which uses stored generation numbers to
incrementally walk in topo-order, outputting commits as
we go. This can dramatically reduce the computation time
to write a fixed number of commits, such as when limiting
with "-n " or filling the first page of a pager.

When running a command like 'git rev-list --topo-order HEAD',
Git performed the following steps:

1. Run limit_list(), which parses all reachable commits,
   adds them to a linked list, and distributes UNINTERESTING
   flags. If all unprocessed commits are UNINTERESTING, then
   it may terminate without walking all reachable commits.
   This does not occur if we do not specify UNINTERESTING
   commits.

2. Run sort_in_topological_order(), which is an implementation
   of Kahn's algorithm. It first iterates through the entire
   set of important commits and computes the in-degree of each
   (plus one, as we use 'zero' as a special value here). Then,
   we walk the commits in priority order, adding them to the
   priority queue if and only if their in-degree is one. As
   we remove commits from this priority queue, we decrement the
   in-degree of their parents.

3. While we are peeling commits for output, get_revision_1()
   uses pop_commit on the full list of commits computed by
   sort_in_topological_order().

In the new algorithm, these three steps correspond to three
different commit walks. We run these walks simultaneously,
and advance each only as far as necessary to satisfy the
requirements of the 'higher order' walk. We know when we can
pause each walk by using generation numbers from the commit-
graph feature.

Recall that the generation number of a commit satisfies:

* If the commit has at least one parent, then the generation
  number is one more than the maximum generation number among
  its parents.

* If the commit has no parent, then the generation number is one.

There are two special generation numbers:

* GENERATION_NUMBER_INFINITY: this value is 0x and
  indicates that the commit is not stored in the commit-graph and
  the generation number was not previously calculated.

* GENERATION_NUMBER_ZERO: this value (0) is a special indicator
  to say that the commit-graph was generated by a version of Git
  that does not compute generation numbers (such as v2.18.0).

Since we use generation_numbers_enabled() before using the new
algorithm, we do not need to worry about GENERATION_NUMBER_ZERO.
However, the existence of GENERATION_NUMBER_INFINITY implies the
following weaker statement than the usual we expect from
generation numbers:

If A and B are commits with generation numbers gen(A) and
gen(B) and gen(A) < gen(B), then A cannot reach B.

Thus, we will walk in each of our stages until the "maximum
unexpanded generation number" is strictly lower than the
generation number of a commit we are about to use.

The walks are as follows:

1. EXPLORE: using the explore_queue priority queue (ordered by
   maximizing the generation number), parse each reachable
   commit until all commits in the queue have generation
   number strictly lower than needed. During this walk, update
   the UNINTERESTING flags as necessary.

2. INDEGREE: using the indegree_queue priority queue (ordered
   by maximizing the generation number), add one to the in-
   degree of each parent for each commit that is walked. Since
   we walk in order of decreasing generation number, we know
   that discovering an in-degree value of 0 means the value for
   that commit was not initialized, so should be initialized to
   two. (Recall that in-degree value "1" is what we use to say a
   commit is ready for output.) As we iterate the parents of a
   commit during this walk, ensure the EXPLORE walk has walked
   beyond their generation numbers.

3. TOPO: using the topo_queue priority queue (ordered based on
   the sort_order given, which could be commit-date, author-
   date, or typical topo-order which treats the queue as a LIFO
   stack), remove a commit from the queue and decrement the
   in-degree of each parent. If a parent has an in-degree of
   one, then we add it to the topo_queue. Before we decrement
   the in-degree, however, ensure the INDEGREE walk has walked
   beyond that generation number.

The implementations of these walks are in the following methods:

* explore_walk_step and explore_to_depth
* indegree_walk_step and compute_indegrees_to_depth
* next_topo_commit and expand_topo_walk

These methods have some patterns that may seem strange at first,
but they are probably carry-overs from their equivalents in
limit_list and sort_in_topological_order.

One thing that is missing from this implementation is a proper
way to stop walking when the entire queue is UNINTERESTING, so
this implementation is not enabled by comparisions, such as in
'git rev-list 

[PATCH v4 3/7] test-reach: add rev-list tests

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The rev-list command is critical to Git's functionality. Ensure it
works in the three commit-graph environments constructed in
t6600-test-reach.sh. Here are a few important types of rev-list
operations:

* Basic: git rev-list --topo-order HEAD
* Range: git rev-list --topo-order compare..HEAD
* Ancestry: git rev-list --topo-order --ancestry-path compare..HEAD
* Symmetric Difference: git rev-list --topo-order compare...HEAD

Signed-off-by: Derrick Stolee 
---
 t/t6600-test-reach.sh | 84 +++
 1 file changed, 84 insertions(+)

diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 9d65b8b946..288f703b7b 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -243,4 +243,88 @@ test_expect_success 'commit_contains:miss' '
test_three_modes commit_contains --tag
 '
 
+test_expect_success 'rev-list: basic topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 commit-3-6 commit-2-6 
commit-1-6 \
+   commit-6-5 commit-5-5 commit-4-5 commit-3-5 commit-2-5 
commit-1-5 \
+   commit-6-4 commit-5-4 commit-4-4 commit-3-4 commit-2-4 
commit-1-4 \
+   commit-6-3 commit-5-3 commit-4-3 commit-3-3 commit-2-3 
commit-1-3 \
+   commit-6-2 commit-5-2 commit-4-2 commit-3-2 commit-2-2 
commit-1-2 \
+   commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 
commit-1-1 \
+   >expect &&
+   run_three_modes git rev-list --topo-order commit-6-6
+'
+
+test_expect_success 'rev-list: first-parent topo-order' '
+   git rev-parse \
+   commit-6-6 \
+   commit-6-5 \
+   commit-6-4 \
+   commit-6-3 \
+   commit-6-2 \
+   commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 
commit-1-1 \
+   >expect &&
+   run_three_modes git rev-list --first-parent --topo-order commit-6-6
+'
+
+test_expect_success 'rev-list: range topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 commit-3-6 commit-2-6 
commit-1-6 \
+   commit-6-5 commit-5-5 commit-4-5 commit-3-5 commit-2-5 
commit-1-5 \
+   commit-6-4 commit-5-4 commit-4-4 commit-3-4 commit-2-4 
commit-1-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   commit-6-2 commit-5-2 commit-4-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   >expect &&
+   run_three_modes git rev-list --topo-order commit-3-3..commit-6-6
+'
+
+test_expect_success 'rev-list: range topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 \
+   commit-6-5 commit-5-5 commit-4-5 \
+   commit-6-4 commit-5-4 commit-4-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   commit-6-2 commit-5-2 commit-4-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   >expect &&
+   run_three_modes git rev-list --topo-order commit-3-8..commit-6-6
+'
+
+test_expect_success 'rev-list: first-parent range topo-order' '
+   git rev-parse \
+   commit-6-6 \
+   commit-6-5 \
+   commit-6-4 \
+   commit-6-3 \
+   commit-6-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   >expect &&
+   run_three_modes git rev-list --first-parent --topo-order 
commit-3-8..commit-6-6
+'
+
+test_expect_success 'rev-list: ancestry-path topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 commit-3-6 \
+   commit-6-5 commit-5-5 commit-4-5 commit-3-5 \
+   commit-6-4 commit-5-4 commit-4-4 commit-3-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   >expect &&
+   run_three_modes git rev-list --topo-order --ancestry-path 
commit-3-3..commit-6-6
+'
+
+test_expect_success 'rev-list: symmetric difference topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 \
+   commit-6-5 commit-5-5 commit-4-5 \
+   commit-6-4 commit-5-4 commit-4-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   commit-6-2 commit-5-2 commit-4-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   commit-3-8 commit-2-8 commit-1-8 \
+   commit-3-7 commit-2-7 commit-1-7 \
+   >expect &&
+   run_three_modes git rev-list --topo-order commit-3-8...commit-6-6
+'
+
 test_done
-- 
gitgitgadget



[PATCH v4 1/7] prio-queue: add 'peek' operation

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When consuming a priority queue, it can be convenient to inspect
the next object that will be dequeued without actually dequeueing
it. Our existing library did not have such a 'peek' operation, so
add it as prio_queue_peek().

Add a reference-level comparison in t/helper/test-prio-queue.c
so this method is exercised by t0009-prio-queue.sh. Further, add
a test that checks the behavior when the compare function is NULL
(i.e. the queue becomes a stack).

Signed-off-by: Derrick Stolee 
---
 prio-queue.c   |  9 +
 prio-queue.h   |  6 ++
 t/helper/test-prio-queue.c | 26 ++
 t/t0009-prio-queue.sh  | 14 ++
 4 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/prio-queue.c b/prio-queue.c
index a078451872..d3f488cb05 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -85,3 +85,12 @@ void *prio_queue_get(struct prio_queue *queue)
}
return result;
 }
+
+void *prio_queue_peek(struct prio_queue *queue)
+{
+   if (!queue->nr)
+   return NULL;
+   if (!queue->compare)
+   return queue->array[queue->nr - 1].data;
+   return queue->array[0].data;
+}
diff --git a/prio-queue.h b/prio-queue.h
index d030ec9dd6..682e51867a 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -46,6 +46,12 @@ extern void prio_queue_put(struct prio_queue *, void *thing);
  */
 extern void *prio_queue_get(struct prio_queue *);
 
+/*
+ * Gain access to the "thing" that would be returned by
+ * prio_queue_get, but do not remove it from the queue.
+ */
+extern void *prio_queue_peek(struct prio_queue *);
+
 extern void clear_prio_queue(struct prio_queue *);
 
 /* Reverse the LIFO elements */
diff --git a/t/helper/test-prio-queue.c b/t/helper/test-prio-queue.c
index 9807b649b1..5bc9c46ea5 100644
--- a/t/helper/test-prio-queue.c
+++ b/t/helper/test-prio-queue.c
@@ -22,14 +22,24 @@ int cmd__prio_queue(int argc, const char **argv)
struct prio_queue pq = { intcmp };
 
while (*++argv) {
-   if (!strcmp(*argv, "get"))
-   show(prio_queue_get());
-   else if (!strcmp(*argv, "dump")) {
-   int *v;
-   while ((v = prio_queue_get()))
-  show(v);
-   }
-   else {
+   if (!strcmp(*argv, "get")) {
+   void *peek = prio_queue_peek();
+   void *get = prio_queue_get();
+   if (peek != get)
+   BUG("peek and get results do not match");
+   show(get);
+   } else if (!strcmp(*argv, "dump")) {
+   void *peek;
+   void *get;
+   while ((peek = prio_queue_peek())) {
+   get = prio_queue_get();
+   if (peek != get)
+   BUG("peek and get results do not 
match");
+   show(get);
+   }
+   } else if (!strcmp(*argv, "stack")) {
+   pq.compare = NULL;
+   } else {
int *v = malloc(sizeof(*v));
*v = atoi(*argv);
prio_queue_put(, v);
diff --git a/t/t0009-prio-queue.sh b/t/t0009-prio-queue.sh
index e56dfce668..3941ad2528 100755
--- a/t/t0009-prio-queue.sh
+++ b/t/t0009-prio-queue.sh
@@ -47,4 +47,18 @@ test_expect_success 'notice empty queue' '
test_cmp expect actual
 '
 
+cat >expect <<'EOF'
+3
+2
+6
+4
+5
+1
+8
+EOF
+test_expect_success 'stack order' '
+   test-tool prio-queue stack 8 1 5 4 6 2 3 dump >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget



[PATCH v4 7/7] t6012: make rev-list tests more interesting

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

As we are working to rewrite some of the revision-walk machinery,
there could easily be some interesting interactions between the
options that force topological constraints (--topo-order,
--date-order, and --author-date-order) along with specifying a
path.

Add extra tests to t6012-rev-list-simplify.sh to add coverage of
these interactions. To ensure interesting things occur, alter the
repo data shape to have different orders depending on topo-, date-,
or author-date-order.

When testing using GIT_TEST_COMMIT_GRAPH, this assists in covering
the new logic for topo-order walks using generation numbers. The
extra tests can be added indepently.

Signed-off-by: Derrick Stolee 
---
 t/t6012-rev-list-simplify.sh | 45 
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh
index b5a1190ffe..a10f0df02b 100755
--- a/t/t6012-rev-list-simplify.sh
+++ b/t/t6012-rev-list-simplify.sh
@@ -12,6 +12,22 @@ unnote () {
git name-rev --tags --stdin | sed -e "s|$OID_REGEX (tags/\([^)]*\)) |\1 
|g"
 }
 
+#
+# Create a test repo with interesting commit graph:
+#
+# A--B--G--H--I--K--L
+#  \  \   / /
+#   \  \ / /
+#C--E---F J
+#\_/
+#
+# The commits are laid out from left-to-right starting with
+# the root commit A and terminating at the tip commit L.
+#
+# There are a few places where we adjust the commit date or
+# author date to make the --topo-order, --date-order, and
+# --author-date-order flags produce different output.
+
 test_expect_success setup '
echo "Hi there" >file &&
echo "initial" >lost &&
@@ -21,10 +37,18 @@ test_expect_success setup '
 
git branch other-branch &&
 
+   git symbolic-ref HEAD refs/heads/unrelated &&
+   git rm -f "*" &&
+   echo "Unrelated branch" >side &&
+   git add side &&
+   test_tick && git commit -m "Side root" &&
+   note J &&
+   git checkout master &&
+
echo "Hello" >file &&
echo "second" >lost &&
git add file lost &&
-   test_tick && git commit -m "Modified file and lost" &&
+   test_tick && GIT_AUTHOR_DATE=$(($test_tick + 120)) git commit -m 
"Modified file and lost" &&
note B &&
 
git checkout other-branch &&
@@ -63,13 +87,6 @@ test_expect_success setup '
test_tick && git commit -a -m "Final change" &&
note I &&
 
-   git symbolic-ref HEAD refs/heads/unrelated &&
-   git rm -f "*" &&
-   echo "Unrelated branch" >side &&
-   git add side &&
-   test_tick && git commit -m "Side root" &&
-   note J &&
-
git checkout master &&
test_tick && git merge --allow-unrelated-histories -m "Coolest" 
unrelated &&
note K &&
@@ -103,14 +120,24 @@ check_result () {
check_outcome success "$@"
 }
 
-check_result 'L K J I H G F E D C B A' --full-history
+check_result 'L K J I H F E D C G B A' --full-history --topo-order
+check_result 'L K I H G F E D C B J A' --full-history
+check_result 'L K I H G F E D C B J A' --full-history --date-order
+check_result 'L K I H G F E D B C J A' --full-history --author-date-order
 check_result 'K I H E C B A' --full-history -- file
 check_result 'K I H E C B A' --full-history --topo-order -- file
 check_result 'K I H E C B A' --full-history --date-order -- file
+check_result 'K I H E B C A' --full-history --author-date-order -- file
 check_result 'I E C B A' --simplify-merges -- file
+check_result 'I E C B A' --simplify-merges --topo-order -- file
+check_result 'I E C B A' --simplify-merges --date-order -- file
+check_result 'I E B C A' --simplify-merges --author-date-order -- file
 check_result 'I B A' -- file
 check_result 'I B A' --topo-order -- file
+check_result 'I B A' --date-order -- file
+check_result 'I B A' --author-date-order -- file
 check_result 'H' --first-parent -- another-file
+check_result 'H' --first-parent --topo-order -- another-file
 
 check_result 'E C B A' --full-history E -- lost
 test_expect_success 'full history simplification without parent' '
-- 
gitgitgadget


[PATCH v4 5/7] commit/revisions: bookkeeping before refactoring

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

There are a few things that need to move around a little before
making a big refactoring in the topo-order logic:

1. We need access to record_author_date() and
   compare_commits_by_author_date() in revision.c. These are used
   currently by sort_in_topological_order() in commit.c.

2. Moving these methods to commit.h requires adding the author_slab
   definition to commit.h.

3. The add_parents_to_list() method in revision.c performs logic
   around the UNINTERESTING flag and other special cases depending
   on the struct rev_info. Allow this method to ignore a NULL 'list'
   parameter, as we will not be populating the list for our walk.
   Also rename the method to the slightly more generic name
   process_parents() to make clear that this method does more than
   add to a list (and no list is required anymore).

Helped-by: Jeff King 
Signed-off-by: Derrick Stolee 
---
 commit.c   | 11 +--
 commit.h   |  8 
 revision.c | 18 ++
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/commit.c b/commit.c
index d0f199e122..861a485e93 100644
--- a/commit.c
+++ b/commit.c
@@ -655,11 +655,10 @@ struct commit *pop_commit(struct commit_list **stack)
 /* count number of children that have not been emitted */
 define_commit_slab(indegree_slab, int);
 
-/* record author-date for each commit object */
-define_commit_slab(author_date_slab, timestamp_t);
+implement_shared_commit_slab(author_date_slab, timestamp_t);
 
-static void record_author_date(struct author_date_slab *author_date,
-  struct commit *commit)
+void record_author_date(struct author_date_slab *author_date,
+   struct commit *commit)
 {
const char *buffer = get_commit_buffer(commit, NULL);
struct ident_split ident;
@@ -684,8 +683,8 @@ fail_exit:
unuse_commit_buffer(commit, buffer);
 }
 
-static int compare_commits_by_author_date(const void *a_, const void *b_,
- void *cb_data)
+int compare_commits_by_author_date(const void *a_, const void *b_,
+  void *cb_data)
 {
const struct commit *a = a_, *b = b_;
struct author_date_slab *author_date = cb_data;
diff --git a/commit.h b/commit.h
index 2b1a734388..977d397356 100644
--- a/commit.h
+++ b/commit.h
@@ -8,6 +8,7 @@
 #include "gpg-interface.h"
 #include "string-list.h"
 #include "pretty.h"
+#include "commit-slab.h"
 
 #define COMMIT_NOT_FROM_GRAPH 0x
 #define GENERATION_NUMBER_INFINITY 0x
@@ -328,6 +329,13 @@ extern int remove_signature(struct strbuf *buf);
  */
 extern int check_commit_signature(const struct commit *commit, struct 
signature_check *sigc);
 
+/* record author-date for each commit object */
+define_shared_commit_slab(author_date_slab, timestamp_t);
+
+void record_author_date(struct author_date_slab *author_date,
+   struct commit *commit);
+
+int compare_commits_by_author_date(const void *a_, const void *b_, void 
*unused);
 int compare_commits_by_commit_date(const void *a_, const void *b_, void 
*unused);
 int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, 
void *unused);
 
diff --git a/revision.c b/revision.c
index 2dcde8a8ac..36458265a0 100644
--- a/revision.c
+++ b/revision.c
@@ -768,8 +768,8 @@ static void commit_list_insert_by_date_cached(struct commit 
*p, struct commit_li
*cache = new_entry;
 }
 
-static int add_parents_to_list(struct rev_info *revs, struct commit *commit,
-   struct commit_list **list, struct commit_list **cache_ptr)
+static int process_parents(struct rev_info *revs, struct commit *commit,
+  struct commit_list **list, struct commit_list 
**cache_ptr)
 {
struct commit_list *parent = commit->parents;
unsigned left_flag;
@@ -808,7 +808,8 @@ static int add_parents_to_list(struct rev_info *revs, 
struct commit *commit,
if (p->object.flags & SEEN)
continue;
p->object.flags |= SEEN;
-   commit_list_insert_by_date_cached(p, list, cached_base, 
cache_ptr);
+   if (list)
+   commit_list_insert_by_date_cached(p, list, 
cached_base, cache_ptr);
}
return 0;
}
@@ -847,7 +848,8 @@ static int add_parents_to_list(struct rev_info *revs, 
struct commit *commit,
p->object.flags |= left_flag;
if (!(p->object.flags & SEEN)) {
p->object.flags |= SEEN;
-   commit_list_insert_by_date_cached(p, list, cached_base, 
cache_ptr);
+   if (list)
+   commit_list_insert_by_date_cached(p, list, 
cached_base, cache_ptr);
}
if (revs->first_parent_only)
break;
@@ -1091,7 +1093,7 @@ static 

[PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-16 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk
machinery. Well, "refactoring" is probably the wrong word, as I don't
actually remove the old code. Instead, when we see certain options in the
'rev_info' struct, we redirect the commit-walk logic to a new set of methods
that distribute the workload differently. By using generation numbers in the
commit-graph, we can significantly improve 'git log --graph' commands (and
the underlying 'git rev-list --topo-order').

On the Linux repository, I got the following performance results when
comparing to the previous version with or without a commit-graph:

Test: git rev-list --topo-order -100 HEAD
HEAD~1, no commit-graph: 6.80 s
HEAD~1, w/ commit-graph: 0.77 s
  HEAD, w/ commit-graph: 0.02 s

Test: git rev-list --topo-order -100 HEAD -- tools
HEAD~1, no commit-graph: 9.63 s
HEAD~1, w/ commit-graph: 6.06 s
  HEAD, w/ commit-graph: 0.06 s

If you want to read this series but are unfamiliar with the commit-graph and
generation numbers, then I recommend reading 
Documentation/technical/commit-graph.txt or a blob post [1] I wrote on the
subject. In particular, the three-part walk described in "revision.c:
refactor basic topo-order logic" is present (but underexplained) as an
animated PNG [2].

Since revision.c is an incredibly important (and old) portion of the
codebase -- and because there are so many orthogonal options in 'struct
rev_info' -- I consider this submission to be "RFC quality". That is, I am
not confident that I am not missing anything, or that my solution is the
best it can be. I did merge this branch with ds/commit-graph-with-grafts and
the "DO-NOT-MERGE: write and read commit-graph always" commit that computes
a commit-graph with every 'git commit' command. The test suite passed with
that change, available on GitHub [3]. To ensure that I cover at least the
case I think are interesting, I added tests to t6600-test-reach.sh to verify
the walks report the correct results for the three cases there (no
commit-graph, full commit-graph, and a partial commit-graph so the walk
starts at GENERATION_NUMBER_INFINITY).

One notable case that is not included in this series is the case of a
history comparison such as 'git rev-list --topo-order A..B'. The existing
code in limit_list() has ways to cut the walk short when all pending commits
are UNINTERESTING. Since this code depends on commit_list instead of the
prio_queue we are using here, I chose to leave it untouched for now. We can
revisit it in a separate series later. Since handle_commit() turns on
revs->limited when a commit is UNINTERESTING, we do not hit the new code in
this case. Removing this 'revs->limited = 1;' line yields correct results,
but the performance is worse.

This series was based on ds/reachable, but is now based on 'master' to not
conflict with 182070 "commit: use timestamp_t for author_date_slab". There
is a small conflict with md/filter-trees, because it renamed a flag in
revisions.h in the line before I add new flags. Hopefully this conflict is
not too difficult to resolve.

Changes in V3: I added a new patch that updates the tab-alignment for flags
in revision.h before adding new ones (Thanks, Ævar!). Also, I squashed the
recommended changes to run_three_modes and test_three_modes from Szeder and
Junio. Thanks!

Changes in V4: I'm sending a V4 to respond to the feedback so far. Still
looking forward to more on the really big commit!

 * Removed the whitespace changes to the flags in revision.c that caused
   merge pain. 
   
   
 * The prio-queue peek function is now covered by tests when in "stack"
   mode.
   
   
 * The "add_parents_to_list()" function is now renamed to
   "process_parents()"
   
   
 * Added a new commit that expands test coverage with alternate orders and
   file history (use GIT_TEST_COMMIT_GRAPH to have
   t6012-rev-list-simplify.sh cover the new logic). These tests found a
   problem with author dates (I forgot to record them during the explore
   walk).
   
   
 * Commit message edits.
   
   

Thanks, -Stolee

[1] 
https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/
Supercharging the Git Commit Graph III: Generations and Graph Algorithms

[2] 
https://msdnshared.blob.core.windows.net/media/2018/06/commit-graph-topo-order-b-a.png
Animation showing three-part walk

[3] https://github.com/derrickstolee/git/tree/topo-order/testA branch
containing this series along with commits to compute commit-graph in entire
test suite.

Cc: avarab@gmail.comCc: szeder@gmail.com

Derrick Stolee (7):
  prio-queue: add 'peek' operation
  test-reach: add run_three_modes method
  test-reach: add rev-list tests
  revision.c: begin refactoring --topo-order logic
  commit/revisions: bookkeeping before refactoring
  revision.c: generation-based topo-order algorithm
  t6012: make rev-list tests more interesting

 commit.c |  11 +-
 commit.h |   8 ++
 object.h

[PATCH v4 2/7] test-reach: add run_three_modes method

2018-10-16 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The 'test_three_modes' method assumes we are using the 'test-tool
reach' command for our test. However, we may want to use the data
shape of our commit graph and the three modes (no commit-graph,
full commit-graph, partial commit-graph) for other git commands.

Split test_three_modes to be a simple translation on a more general
run_three_modes method that executes the given command and tests
the actual output to the expected output.

Signed-off-by: Derrick Stolee 
---
 t/t6600-test-reach.sh | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index d139a00d1d..9d65b8b946 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -53,18 +53,22 @@ test_expect_success 'setup' '
git config core.commitGraph true
 '
 
-test_three_modes () {
+run_three_modes () {
test_when_finished rm -rf .git/objects/info/commit-graph &&
-   test-tool reach $1 actual &&
+   "$@" actual &&
test_cmp expect actual &&
cp commit-graph-full .git/objects/info/commit-graph &&
-   test-tool reach $1 actual &&
+   "$@" actual &&
test_cmp expect actual &&
cp commit-graph-half .git/objects/info/commit-graph &&
-   test-tool reach $1 actual &&
+   "$@" actual &&
test_cmp expect actual
 }
 
+test_three_modes () {
+   run_three_modes test-tool reach "$@"
+}
+
 test_expect_success 'ref_newer:miss' '
cat >input <<-\EOF &&
A:commit-5-7
-- 
gitgitgadget



Re: On overriding make variables from the environment...

2018-10-16 Thread SZEDER Gábor
On Tue, Oct 16, 2018 at 02:54:56PM -0700, Jonathan Nieder wrote:
> SZEDER Gábor wrote:
> > Our Makefile has lines like these:
> >
> >   CFLAGS = -g -O2 -Wall
> >   CC = cc
> >   AR = ar
> >   SPATCH = spatch
> >
> > Note the use of '=', not '?='.
> [...]
> > I'm not sure what to do.  I'm fine with updating our 'ci/' scripts to
> > explicitly respect CC in the environment (either by running 'make
> > CC=$CC' or by writing $CC into 'config.mak').  Or I could update our
> > Makefile to use '?=' for specific variables, but:
> 
> That's a good question.  I don't have a strong opinion myself, so I
> tend to trust larger projects like Linux to have thought this through
> more, and they use 'CC = cc' as well.

I don't think Linux is a good example to follow in this case, see e.g.
6d62c983f7 (Makefile: Change the default compiler from "gcc" to "cc",
2011-12-20) (in short: Git is supposed to be buildable with compilers
other than GCC as well, while Linux not really, so they can hardcode
'CC = gcc').

Also, the projects I have on hand usually have 'CC = gcc' hardcoded in
their Makefiles, too, but those Makefiles were generated by their
./configure script (which in turn by ./autogen.sh...), and those tend
to write CC from the environment into the generated Makefiles.



Re:Business proposition for you

2018-10-16 Thread Melvin Greg
Hello, 

Business proposition for you.

I have a client from Syrian who will like to invest with your 
company. My client is willing to invest $4 Million. Can I have 
your company website to show to my client your company so that 
they will check and decide if they will invest there funds with 
you as joint partner. 

This information is needed urgently.

Please reply. 

Best Regards,
Agent Melvin Greg
Tel:+1 4045966532


Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-10-16 Thread Jonathan Tan
>  1. Teaching partial clone to attempt to fetch missing objects from
> multiple remotes instead of only one.  This is useful because you
> can have a server that is nearby and cheaper to serve from (some
> kind of local cache server) that you make requests to first before
> falling back to the canonical source of objects.

Quoting the above definition of (1) for reference. I think Jonathan
Nieder has covered the relevant points well - I'll just expand on (1).

> So much for the current setup.  For (1), I believe you are proposing to
> still have only one effective , so it doesn't necessarily
> require modifying the extensions.* configuration.  Instead, the idea is
> that when trying to access an object, we would follow one of a list of
> steps:
> 
>  1. First, check the local object store. If it's there, we're done.
>  2. Second, try alternates --- maybe the object is in one of those!
>  3. Now, try promisor remotes, one at a time, in user-configured order.
> 
> In other words, I think that for (1) all we would need is a new
> configuration
> 
>   [object]
>   missingObjectRemote = local-cache-remote
>   missingObjectRemote = origin
> 
> The semantics would be that when trying to access a promised object,
> we attempt to fetch from these remotes one at a time, in the order
> specified.  We could require that the remote named in
> extensions.partialClone be one of the listed remotes, without having
> to care where it shows up in the list.

Or allow extensions.partialClone= wherein  is not in the
missingObjectRemote, in which case  is tried first, so that we don't
have to reject some configurations.

> That way, we get the benefit (1) without having to change the
> semantics of extensions.partialClone and without having to care about
> the order of sections in the config.  What do you think?

Let's define the promisor remotes of a repository as those in
missingObjectRemote or extensions.partialClone (currently, we talk about
"the promisor remote" (singular), defined in extensions.partialClone).

Overall, this seems like a reasonable idea to me, if we keep the
restriction that we can only fetch with filter from a promisor remote.
This allows us to extend the definition of a promisor object in a
manner consistent to the current definition - to say "a promisor object
is one promised by at least one promisor remote" (currently, "a promisor
object is promised by the promisor remote").

In the presence of missingObjectRemote, old versions of Git, when lazily
fetching, would only know to try the extensions.partialClone remote. But
this is safe because existing data wouldn't be clobbered (since we're
not using ideas like adding meaning to the contents of the .promisor
file). Also, other things like fsck and gc still work.


Re: problem with not being able to enforce git content filters

2018-10-16 Thread Stas Bekman
On 2018-10-16 02:17 PM, Ævar Arnfjörð Bjarmason wrote:
[...]
>> We can't use server-side hooks to enforce this because the project is on
>> github.
> 
> Ultimately git's a distributed system and we won't ever be able to
> enforce that users in their local copies use filters, and they might
> edit stuff e.g. in the GitHub UI directly, or with some other git
> implementation.

In this particular case github won't be a problem, since for the problem
to appear it has to be executed on the user side. Editing directly in
github UI is not a problem.

Just to give a little big more context to the issue in first place. A
jupyter notebook is a json file that contains the source code, the
outputs from executing that code and the unique user environment bits.
We want to "git" the source code but not the rest. Otherwise merging is
a hell. Hence the stripping.

There are other approaches to solve this problem, besides stripping, but
 they all require some kind of pre-processing before committing the file.

> So if you have such special needs maybe consider hosting your own setup
> where you can have pre-receive hooks, or within GitHub e.g. enforce that
> all things must go through merge requests, and have some robot that
> checks that the content to be merged has gone through filters before
> being approved.

Yes, that would be ideal. But I doubt this is going to happen, I'm just
a contributing developer, not the creator/owner of the project. And as I
said this affects anybody who collaborates on jupyter notebooks, not
just in our project. I think there are several millions of them on github.

> *Picks up flag*. For the purposes of us trying to understand this report
> it would be really useful to boil what's happening down to some
> step-by-step reproducible recipe.
> 
> I.e. with some dummy filter configured & not configured how does "git
> pull" end up breaking, and in this case you're alluding to git simply
> forgetting config, how would that happen?

The problem of 'git pull' and 'git status' and 'git stash' is presented
in details here:
https://stackoverflow.com/questions/51883227/git-pull-stash-conflicts-with-a-git-filter
and more here:
https://github.com/kynan/nbstripout/issues/65
https://github.com/jupyter/nbdime/issues/410#issuecomment-412999758

The problem of configuration disappearing I sadly have no input. It
doesn't happen to me, and all I get from others is that it first works,
and then it doesn't. Perhaps it has something to do with some of them
using windows. I don't know, sorry.

>> The bottom line it sucks and I hope that you can help with offering a
>> programmatic solution, rather than recommending creating a police
>> department.
> 
> I think it would be great to have .gitconfig in-repo support, but a lot
> of security & UI problems need to be surmounted before that would
> happen, here's some previous ramblings of mine on that issue:
> https://public-inbox.org/git/?q=87zi6eakkt.fsf%40evledraar.gmail.com
> 
> It now occurs to me that a rather minimal proof-of-concept version of
> that would be:
> 
>  1. On repository clone, detect if HEAD:.gitconfig exists
> 
>  2. If it does, and we trust $(git config -f  -l)
> enough, emit some advice() output saying the project suggests
> setting config so-and-so, and that you could run the following one
> liner(s) to set it if you agree.
> 
>  3. Once we have that we could eventually nudge our way towards
> something like what I suggested in the linked threads above,
> i.e. consuming some subset of that config directly from the repo's
> HEAD:.gitconfig

I like it, Ævar!

I feel doing the check and prompting the user on the first push/commit
after clone would be a better. It'd be too easy for the user to skip
that step on git clone. In our particular case we want it where the
problem is introduced, which is on commit, and not on clone. I hope it
makes sense.


-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


[PATCH] upload-pack: clear flags before each v2 request

2018-10-16 Thread Jonathan Tan
Suppose a server has the following commit graph:

 A   B
  \ /
   O

We create a client by cloning A from the server with depth 1, and add
many commits to it (so that future fetches span multiple requests due to
lengthy negotiation). If it then fetches B using protocol v2, the fetch
spanning multiple requests, the resulting packfile does not contain O
even though the client did report that A is shallow.

This is because upload_pack_v2() can be called multiple times while
processing the same session. During the 2nd and all subsequent
invocations, some object flags remain from the previous invocations. In
particular, CLIENT_SHALLOW remains, preventing process_shallow() from
adding client-reported shallows to the "shallows" array, and hence
pack-objects not knowing about these client-reported shallows.

Therefore, teach upload_pack_v2() to clear object flags at the start of
each invocation.

(One alternative is to reduce or eliminate usage of object flags in
protocol v2, but that doesn't seem feasible because almost all 8 flags
are used pervasively in v2 code.)

Signed-off-by: Jonathan Tan 
---
This was noticed by Arturas Moskvinas  in [1]. The
reproduction steps given were to repeat a shallow fetch twice in
succession, but I found it easier to write a more understandable test if
I made the 2nd fetch an ordinary fetch. In any case, I also reran the
original reproduction steps, and the fetch completes without error.

This patch doesn't cover the negotiation issue that I mentioned in my
previous reply [2].

[1] 
https://public-inbox.org/git/CAGY-PBgsG-T3JY=awszwgmpfx+jdx-a1fcv0s6vr067bsqg...@mail.gmail.com/
[2] 
https://public-inbox.org/git/20181013004356.257709-1-jonathanta...@google.com/
---
 t/t5702-protocol-v2.sh | 25 +
 upload-pack.c  |  5 +
 2 files changed, 30 insertions(+)

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 88a886975d..70b88385ba 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -429,6 +429,31 @@ test_expect_success 'fetch supports include-tag and tag 
following' '
git -C client cat-file -e $(git -C client rev-parse annotated_tag)
 '
 
+test_expect_success 'upload-pack respects client shallows' '
+   rm -rf server client trace &&
+
+   git init server &&
+   test_commit -C server base &&
+   test_commit -C server client_has &&
+
+   git clone --depth=1 "file://$(pwd)/server" client &&
+
+   # Add extra commits to the client so that the whole fetch takes more
+   # than 1 request (due to negotiation)
+   for i in $(seq 1 32)
+   do
+   test_commit -C client c$i
+   done &&
+
+   git -C server checkout -b newbranch base &&
+   test_commit -C server client_wants &&
+
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C client -c protocol.version=2 \
+   fetch origin newbranch &&
+   # Ensure that protocol v2 is used
+   grep "git< version 2" trace
+'
+
 # Test protocol v2 with 'http://' transport
 #
 . "$TEST_DIRECTORY"/lib-httpd.sh
diff --git a/upload-pack.c b/upload-pack.c
index 62a1000f44..de7de1de38 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -37,6 +37,9 @@
 #define CLIENT_SHALLOW (1u << 18)
 #define HIDDEN_REF (1u << 19)
 
+#define ALL_FLAGS (THEY_HAVE | OUR_REF | WANTED | COMMON_KNOWN | SHALLOW | \
+   NOT_SHALLOW | CLIENT_SHALLOW | HIDDEN_REF)
+
 static timestamp_t oldest_have;
 
 static int deepen_relative;
@@ -1393,6 +1396,8 @@ int upload_pack_v2(struct repository *r, struct 
argv_array *keys,
enum fetch_state state = FETCH_PROCESS_ARGS;
struct upload_pack_data data;
 
+   clear_object_flags(ALL_FLAGS);
+
git_config(upload_pack_config, NULL);
 
upload_pack_data_init();
-- 
2.19.0.271.gfe8321ec05.dirty



Re: On overriding make variables from the environment...

2018-10-16 Thread Jonathan Nieder
Hi,

SZEDER Gábor wrote:

> Our Makefile has lines like these:
>
>   CFLAGS = -g -O2 -Wall
>   CC = cc
>   AR = ar
>   SPATCH = spatch
>
> Note the use of '=', not '?='.
[...]
> I'm not sure what to do.  I'm fine with updating our 'ci/' scripts to
> explicitly respect CC in the environment (either by running 'make
> CC=$CC' or by writing $CC into 'config.mak').  Or I could update our
> Makefile to use '?=' for specific variables, but:

That's a good question.  I don't have a strong opinion myself, so I
tend to trust larger projects like Linux to have thought this through
more, and they use 'CC = cc' as well.  So I'd lean toward the updating
'ci/' scripts approach, to do something like

make ${CC:+"CC=$CC"} ...

(or the equivalent multi-line construction).

That also has the bonus of being explicit.

Just my two cents,
Jonathan


[RFC] revision: Add --sticky-default option

2018-10-16 Thread Andreas Gruenbacher
Hi,

here's a long-overdue update of my proposal from August 29:

  [RFC] revision: Don't let ^ cancel out the default 

Does this look more acceptable that my first shot?

Thanks,
Andreas

--

Some commands like 'log' default to HEAD if no other revisions are
specified on the command line or otherwise.  Currently, excludes
(^) cancel out that default, so when a command only excludes
revisions (e.g., 'git log ^origin/master'), it will produce no output.

With the --sticky-default option, the default becomes more "sticky" and
is no longer canceled out by excludes.

This is useful in wrappers that exclude certain revisions: for example,
a simple alias l='git log --sticky-default ^origin/master' will show the
revisions between origin/master and HEAD when invoked without arguments,
and 'l foo' will show the revisions between origin/master and foo.

Signed-off-by: Andreas Gruenbacher 
---
 revision.c | 31 +++
 revision.h |  1 +
 t/t4202-log.sh |  6 ++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index e18bd530e..6c93ec74b 100644
--- a/revision.c
+++ b/revision.c
@@ -1159,6 +1159,25 @@ static void add_rev_cmdline(struct rev_info *revs,
info->nr++;
 }
 
+static int has_revisions(struct rev_info *revs)
+{
+   struct rev_cmdline_info *info = >cmdline;
+
+   return info->nr != 0;
+}
+
+static int has_interesting_revisions(struct rev_info *revs)
+{
+   struct rev_cmdline_info *info = >cmdline;
+   unsigned int n;
+
+   for (n = 0; n < info->nr; n++) {
+   if (!(info->rev[n].flags & UNINTERESTING))
+   return 1;
+   }
+   return 0;
+}
+
 static void add_rev_cmdline_list(struct rev_info *revs,
 struct commit_list *commit_list,
 int whence,
@@ -2132,6 +2151,8 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
} else if (!strcmp(arg, "--children")) {
revs->children.name = "children";
revs->limited = 1;
+   } else if (!strcmp(arg, "--sticky-default")) {
+   revs->sticky_default = 1;
} else if (!strcmp(arg, "--ignore-missing")) {
revs->ignore_missing = 1;
} else if (!strcmp(arg, "--exclude-promisor-objects")) {
@@ -2319,7 +2340,8 @@ static void NORETURN diagnose_missing_default(const char 
*def)
  */
 int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct 
setup_revision_opt *opt)
 {
-   int i, flags, left, seen_dashdash, got_rev_arg = 0, revarg_opt;
+   int i, flags, left, seen_dashdash, revarg_opt;
+   int cancel_default;
struct argv_array prune_data = ARGV_ARRAY_INIT;
const char *submodule = NULL;
 
@@ -2401,8 +2423,6 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
argv_array_pushv(_data, argv + i);
break;
}
-   else
-   got_rev_arg = 1;
}
 
if (prune_data.argc) {
@@ -2431,7 +2451,10 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
opt->tweak(revs, opt);
if (revs->show_merge)
prepare_show_merge(revs);
-   if (revs->def && !revs->pending.nr && !revs->rev_input_given && 
!got_rev_arg) {
+   cancel_default = revs->sticky_default ?
+has_interesting_revisions(revs) :
+has_revisions(revs);
+   if (revs->def && !revs->rev_input_given && !cancel_default) {
struct object_id oid;
struct object *object;
struct object_context oc;
diff --git a/revision.h b/revision.h
index 2b30ac270..570fa1a6d 100644
--- a/revision.h
+++ b/revision.h
@@ -92,6 +92,7 @@ struct rev_info {
 
unsigned int early_output;
 
+   unsigned intsticky_default:1;
unsigned intignore_missing:1,
ignore_missing_links:1;
 
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 153a50615..9517a65da 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -213,6 +213,12 @@ test_expect_success 'git show  leaves list of 
commits as given' '
test_cmp expect actual
 '
 
+printf "sixth\nfifth\n" > expect
+test_expect_success '--sticky-default ^' '
+   git log --pretty="tformat:%s" --sticky-default ^HEAD~2 > actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'setup case sensitivity tests' '
echo case >one &&
test_tick &&
-- 
2.17.2



Re: What's cooking in git.git (Oct 2018, #02; Sat, 13)

2018-10-16 Thread Josh Steadmon
On 2018.10.12 23:53, Junio C Hamano wrote:
> * js/remote-archive-v2 (2018-09-28) 4 commits
>   (merged to 'next' on 2018-10-12 at 5f34377f60)
>  + archive: allow archive over HTTP(S) with proto v2
>  + archive: implement protocol v2 archive command
>  + archive: use packet_reader for communications
>  + archive: follow test standards around assertions
> 
>  The original implementation of "git archive --remote" more or less
>  bypassed the transport layer and did not work over http(s).  The
>  version 2 of the protocol is defined to allow going over http(s) as
>  well as Git native transport.
> 
>  Will merge to 'master'.

The first two patches (test cleanup and packet_reader refactor) are OK,
but the latter two will break the archive command when an old client
attempts to talk to a new server (due to the version advertisement
problem noted in [1]). Sorry that I didn't catch that these had made it
into next.

[1]: https://public-inbox.org/git/20180927223314.ga230...@google.com/#t


Re: problem with not being able to enforce git content filters

2018-10-16 Thread Ævar Arnfjörð Bjarmason


On Tue, Oct 16 2018, Stas Bekman wrote:

> When a person has a stripped out notebook checked out, when another
> person commits un-stripped out notebook, it leads to: invalid `git
> status` reports, `git pull` breaks, `git stash` doesn't work, since it
> tries to stash using the filters, and `git pull' can never succeed
> because it thinks that it'll overwrite the local changes, but `git diff`
> returns no changes.

Planting a flag here. Let's get to this later.

> So the only solution when this happens is to disable the filters, clean
> up the mess, re-enable the filters. Many people just make a new clone -
> ouch!
>
> And the biggest problem is that it affects all users who may have the
> filters enabled, e.g. because they worked on a PR, and not just devs -
> i.e. the repercussions are much bigger than just a few devs affected.
>
> We can't use server-side hooks to enforce this because the project is on
> github.

Ultimately git's a distributed system and we won't ever be able to
enforce that users in their local copies use filters, and they might
edit stuff e.g. in the GitHub UI directly, or with some other git
implementation.

So if you have such special needs maybe consider hosting your own setup
where you can have pre-receive hooks, or within GitHub e.g. enforce that
all things must go through merge requests, and have some robot that
checks that the content to be merged has gone through filters before
being approved.

> And the devs honestly try to do their best to remember to configure the
> filters, but for some reason they disappear for them, don't ask me why,
> I don't know. This is an open source project team, not a work place.

*Picks up flag*. For the purposes of us trying to understand this report
it would be really useful to boil what's happening down to some
step-by-step reproducible recipe.

I.e. with some dummy filter configured & not configured how does "git
pull" end up breaking, and in this case you're alluding to git simply
forgetting config, how would that happen?

> There needs to be a way for a project to define content filters w/o
> going via user's .gitconfig configuration, since due to git's security
> it can't be distributed to all users and must be enabled manually by
> each user, which is just not the right solution in this case.
>
> Of course, I'm open to other suggestions that may help this issue.
>
> "Tell your developers they must configure the filters" is not it - I
> tried it for a long time and in vain. If you look at our install
> instructions: https://github.com/fastai/fastai#developer-install
>
>   git clone https://github.com/fastai/fastai
>   cd fastai
>   tools/run-after-git-clone
>
> It already includes an instruction to run a script which enables the
> filters, but this doesn't seem to help (and no, it's not a problem with
> the script). The devs report that the configuration is there for a
> while, and then suddenly it is not there, I don't know why. Perhaps they
> make a new clone and forget to re-enable the filters, perhaps they
> disable them to clean up and forget to reenable them, I can't tell.

FWIW I tried following these on my Debian install and both on python2
and python3 I get either syntax errors or a combo of that and a missing
pathlib from that script. I don't know Python well enough to debug
it. Maybe that's part of the issue?

> The bottom line it sucks and I hope that you can help with offering a
> programmatic solution, rather than recommending creating a police
> department.

I think it would be great to have .gitconfig in-repo support, but a lot
of security & UI problems need to be surmounted before that would
happen, here's some previous ramblings of mine on that issue:
https://public-inbox.org/git/?q=87zi6eakkt.fsf%40evledraar.gmail.com

It now occurs to me that a rather minimal proof-of-concept version of
that would be:

 1. On repository clone, detect if HEAD:.gitconfig exists

 2. If it does, and we trust $(git config -f  -l)
enough, emit some advice() output saying the project suggests
setting config so-and-so, and that you could run the following one
liner(s) to set it if you agree.

 3. Once we have that we could eventually nudge our way towards
something like what I suggested in the linked threads above,
i.e. consuming some subset of that config directly from the repo's
HEAD:.gitconfig


Re: [PATCH] submodule helper: convert relative URL to absolute URL if needed

2018-10-16 Thread Jonathan Nieder
Stefan Beller wrote:

> Reported-by: Jaewoong Jung 
> Signed-off-by: Stefan Beller 
> ---
>  builtin/submodule--helper.c | 51 -
>  t/t7400-submodule-basic.sh  | 24 +
>  2 files changed, 58 insertions(+), 17 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks for your patient work.


[PATCH 3/3] pack-objects (mingw): initialize `packing_data` mutex in the correct spot

2018-10-16 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

In 9ac3f0e5b3e4 (pack-objects: fix performance issues on packing large
deltas, 2018-07-22), a mutex was introduced that is used to guard the
call to set the delta size. This commit even added code to initialize
it, but at an incorrect spot: in `init_threaded_search()`, while the
call to `oe_set_delta_size()` (and hence to `packing_data_lock()`) can
happen in the call chain `check_object()` <- `get_object_details()` <-
`prepare_pack()` <- `cmd_pack_objects()`, which is long before the
`prepare_pack()` function calls `ll_find_deltas()` (which initializes
the threaded search).

Another tell-tale that the mutex was initialized in an incorrect spot is
that the function to initialize it lives in builtin/, while the code
that uses the mutex is defined in a libgit.a header file.

Let's use a more appropriate function: `prepare_packing_data()`, which
not only lives in libgit.a, but *has* to be called before the
`packing_data` struct is used that contains that mutex.

This fixes https://github.com/git-for-windows/git/issues/1839.

Signed-off-by: Johannes Schindelin 
---
 builtin/pack-objects.c| 1 -
 pack-objects.c| 3 +++
 t/t5321-pack-large-objects.sh | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e6316d294d..e752cf9c7a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2363,7 +2363,6 @@ static void init_threaded_search(void)
pthread_mutex_init(_mutex, NULL);
pthread_mutex_init(_mutex, NULL);
pthread_cond_init(_cond, NULL);
-   pthread_mutex_init(_pack.lock, NULL);
old_try_to_free_routine = 
set_try_to_free_routine(try_to_free_from_threads);
 }
 
diff --git a/pack-objects.c b/pack-objects.c
index 7e624c30eb..b6cdbb0166 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -148,6 +148,9 @@ void prepare_packing_data(struct packing_data *pdata)
 1U << OE_SIZE_BITS);
pdata->oe_delta_size_limit = git_env_ulong("GIT_TEST_OE_DELTA_SIZE",
   1UL << OE_DELTA_SIZE_BITS);
+#ifndef NO_PTHREADS
+   pthread_mutex_init(>lock, NULL);
+#endif
 }
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
diff --git a/t/t5321-pack-large-objects.sh b/t/t5321-pack-large-objects.sh
index c36c66fbb4..a75eab87d3 100755
--- a/t/t5321-pack-large-objects.sh
+++ b/t/t5321-pack-large-objects.sh
@@ -24,7 +24,7 @@ test_expect_success 'setup' '
git index-pack --stdin 

[PATCH 0/3] Fix gc segfault

2018-10-16 Thread Jameson Miller via GitGitGadget
In 9ac3f0e (pack-objects: fix performance issues on packing large deltas,
2018-07-22), a mutex was introduced that is used to guard the call to set
the delta size. This commit added code to initialize it, but at an incorrect
spot: in init_threaded_search(), while the call to oe_set_delta_size() (and
hence to packing_data_lock()) can happen in the call chain check_object() <- 
get_object_details() <-prepare_pack() <- cmd_pack_objects(), which is long
before theprepare_pack() function calls ll_find_deltas() (which initializes
the threaded search).

Another tell-tale that the mutex was initialized in an incorrect spot is
that the function to initialize it lives in builtin/, while the code that
uses the mutex is defined in a libgit.a header file.

Let's use a more appropriate function: prepare_packing_data(), which not
only lives in libgit.a, but has to be called before thepacking_data struct
is used that contains that mutex.

Johannes Schindelin (3):
  Fix typo 'detla' -> 'delta'
  pack-objects (mingw): demonstrate a segmentation fault with large
deltas
  pack-objects (mingw): initialize `packing_data` mutex in the correct
spot

 builtin/pack-objects.c|  1 -
 pack-objects.c|  3 +++
 pack-objects.h|  2 +-
 t/t5321-pack-large-objects.sh | 32 
 4 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100755 t/t5321-pack-large-objects.sh


base-commit: 5a0cc8aca797dbd7d2be3b67458ff880ed45cddf
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-47%2Fjamill%2Ffix-gc-segfault-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-47/jamill/fix-gc-segfault-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/47
-- 
gitgitgadget


[PATCH 2/3] pack-objects (mingw): demonstrate a segmentation fault with large deltas

2018-10-16 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

There is a problem in the way 9ac3f0e5b3e4 (pack-objects: fix
performance issues on packing large deltas, 2018-07-22) initializes that
mutex in the `packing_data` struct. The problem manifests in a
segmentation fault on Windows, when a mutex (AKA critical section) is
accessed without being initialized. (With pthreads, you apparently do
not really have to initialize them?)

This was reported in https://github.com/git-for-windows/git/issues/1839.

Signed-off-by: Johannes Schindelin 
---
 t/t5321-pack-large-objects.sh | 32 
 1 file changed, 32 insertions(+)
 create mode 100755 t/t5321-pack-large-objects.sh

diff --git a/t/t5321-pack-large-objects.sh b/t/t5321-pack-large-objects.sh
new file mode 100755
index 00..c36c66fbb4
--- /dev/null
+++ b/t/t5321-pack-large-objects.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+#
+# Copyright (c) 2018 Johannes Schindelin
+#
+
+test_description='git pack-object with "large" deltas
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-pack.sh
+
+# Two similar-ish objects that we have computed deltas between.
+A=01d7713666f4de822776c7622c10f1b07de280dc
+B=e68fe8129b546b101aee9510c5328e7f21ca1d18
+
+test_expect_success 'setup' '
+   clear_packs &&
+   {
+   pack_header 2 &&
+   pack_obj $A $B &&
+   pack_obj $B
+   } >ab.pack &&
+   pack_trailer ab.pack &&
+   git index-pack --stdin 

[PATCH 1/3] Fix typo 'detla' -> 'delta'

2018-10-16 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Signed-off-by: Johannes Schindelin 
---
 pack-objects.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pack-objects.h b/pack-objects.h
index 2ca39cfcfe..86ee93feb4 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -377,7 +377,7 @@ static inline unsigned long oe_delta_size(struct 
packing_data *pack,
return e->delta_size_;
 
/*
-* pack->detla_size[] can't be NULL because oe_set_delta_size()
+* pack->delta_size[] can't be NULL because oe_set_delta_size()
 * must have been called when a new delta is saved with
 * oe_set_delta().
 * If oe_delta() returns NULL (i.e. default state, which means
-- 
gitgitgadget



Re: [PATCH v2 10/13] tests: include detailed trace logs with --write-junit-xml upon failure

2018-10-16 Thread Johannes Schindelin
Hi Gábor,

On Tue, 16 Oct 2018, SZEDER Gábor wrote:

> On Tue, Oct 16, 2018 at 03:02:38PM +0200, Johannes Schindelin wrote:
>
> > So I would suggest to go forward with my proposed strategy for the
> > moment, right up until the time when we have had the resources to fix
> > t5570, for starters.
> 
> I don't really understand what the occasional failures in t5570 have
> to do with the amount of information a CI system should gather about
> failures in general.

I see it plenty of times that too many CI failures essentially render
every developer numb.

If every 3rd CI run causes a failure, and seemingly every of these
failures indicates a mistake in the regression test, rather than a
regression, developers stop paying attention.

Which is the exact opposite of what I want to achieve here.

Ciao,
Dscho

problem with not being able to enforce git content filters

2018-10-16 Thread Stas Bekman
Hi,

TL;DR

Our open source project dev team has a continuous problem with git
content filters, because developers don't always have them configured.

We need a way for git to support content filters w/o using user's
.gitconfig. Otherwise it leads to an inconsistent behavior and messed up
git checkouts.

=

Full story:

We use a version of the nbstripout content filter
(https://github.com/kynan/nbstripout), which removes user-specific
information from the jupyter notebooks during commit. But the problem
would be the same with any one way clean filter.

First the setup:
https://github.com/kynan/nbstripout#manual-filter-installation

=
Set up a git filter using nbstripout as follows:

git config filter.nbstripout.clean '/path/to/nbstripout'
git config filter.nbstripout.smudge cat
git config filter.nbstripout.required true

Create a file .gitattributes or .git/info/attributes with:

*.ipynb filter=nbstripout

Apply the filter for git diff of *.ipynb files:

git config diff.ipynb.textconv '/path/to/nbstripout -t'

In file .gitattributes or .git/info/attributes add:

*.ipynb diff=ipynb

=

The problem is that it can't be enforced.

When it's not enforced, we end up with some devs using it and others
don't, or more often is the same dev sometimes doesn't have it configured.

When a person has a stripped out notebook checked out, when another
person commits un-stripped out notebook, it leads to: invalid `git
status` reports, `git pull` breaks, `git stash` doesn't work, since it
tries to stash using the filters, and `git pull' can never succeed
because it thinks that it'll overwrite the local changes, but `git diff`
returns no changes.

So the only solution when this happens is to disable the filters, clean
up the mess, re-enable the filters. Many people just make a new clone -
ouch!

And the biggest problem is that it affects all users who may have the
filters enabled, e.g. because they worked on a PR, and not just devs -
i.e. the repercussions are much bigger than just a few devs affected.

We can't use server-side hooks to enforce this because the project is on
github.

And the devs honestly try to do their best to remember to configure the
filters, but for some reason they disappear for them, don't ask me why,
I don't know. This is an open source project team, not a work place.

You can see some related complaints here:
https://github.com/kynan/nbstripout/issues/65#issuecomment-430346894
https://stackoverflow.com/questions/51883227/git-pull-stash-conflicts-with-a-git-filter
and I can find you a whole bunch more if you need more evidence.

==

Proposed solution:

There needs to be a way for a project to define content filters w/o
going via user's .gitconfig configuration, since due to git's security
it can't be distributed to all users and must be enabled manually by
each user, which is just not the right solution in this case.

Of course, I'm open to other suggestions that may help this issue.

"Tell your developers they must configure the filters" is not it - I
tried it for a long time and in vain. If you look at our install
instructions: https://github.com/fastai/fastai#developer-install

  git clone https://github.com/fastai/fastai
  cd fastai
  tools/run-after-git-clone

It already includes an instruction to run a script which enables the
filters, but this doesn't seem to help (and no, it's not a problem with
the script). The devs report that the configuration is there for a
while, and then suddenly it is not there, I don't know why. Perhaps they
make a new clone and forget to re-enable the filters, perhaps they
disable them to clean up and forget to reenable them, I can't tell.

The bottom line it sucks and I hope that you can help with offering a
programmatic solution, rather than recommending creating a police
department.

Thank you for listening.

-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


[PATCH v2 1/2] merge-recursive: improve auto-merging messages with path collisions

2018-10-16 Thread Elijah Newren
Each individual file involved in a rename could have also been modified
on both sides of history, meaning it may need to have content merges.
If two such files are renamed into the same location, then on top of the
two natural auto-merging messages we also have to two-way merge the
result, giving us messages that look like

  Auto-merging somefile.c (was somecase.c)
  Auto-merging somefile.c (was somefolder.c)
  Auto-merging somefile.c

However, despite the fact that I was the one who put the "(was %s)"
portions into the messages (and just a few months ago), I was still
initially confused when running into a rename/rename(2to1) case and
wondered if somefile.c had been merged three times.  Update this to
instead be:

  Auto-merging version of somefile.c from somecase.c
  Auto-merging version of somefile.c from someportfolio.c
  Auto-merging somefile.c

This is an admittedly long set of messages for a single path, but you
only get all three messages when dealing with the rare case of a
rename/rename(2to1) conflict where both sides of both original files
were also modified, in conflicting ways.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 5206d6cfb6..8a47e54e2f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1674,8 +1674,8 @@ static int handle_rename_rename_2to1(struct merge_options 
*o,
remove_file(o, 1, a->path, o->call_depth || 
would_lose_untracked(a->path));
remove_file(o, 1, b->path, o->call_depth || 
would_lose_untracked(b->path));
 
-   path_side_1_desc = xstrfmt("%s (was %s)", path, a->path);
-   path_side_2_desc = xstrfmt("%s (was %s)", path, b->path);
+   path_side_1_desc = xstrfmt("version of %s from %s", path, a->path);
+   path_side_2_desc = xstrfmt("version of %s from %s", path, b->path);
if (merge_mode_and_contents(o, a, c1, >ren1_other, path_side_1_desc,
o->branch1, o->branch2, _c1) ||
merge_mode_and_contents(o, b, >ren2_other, c2, path_side_2_desc,
-- 
2.19.1.280.g0c175526bf



[PATCH v2 0/2] More merge cleanups

2018-10-16 Thread Elijah Newren
This series adds a few more cleanups on top of en/merge-cleanup.
Changes since v1:
  - Removed two patches that will instead be included in a follow-on
series, as suggested by Junio.
  - Incorporated commit message cleanups (capitalization and indents)
made by Junio to the previous round.

Elijah Newren (2):
  merge-recursive: improve auto-merging messages with path collisions
  merge-recursive: avoid showing conflicts with merge branch before HEAD

 merge-recursive.c | 37 ---
 t/t6036-recursive-corner-cases.sh |  8 +++
 2 files changed, 38 insertions(+), 7 deletions(-)

-- 
2.19.1.280.g0c175526bf



[PATCH v2 2/2] merge-recursive: avoid showing conflicts with merge branch before HEAD

2018-10-16 Thread Elijah Newren
We want to load unmerged entries from HEAD into the index at stage 2 and
from MERGE_HEAD into stage 3.  Similarly, folks expect merge conflicts
to look like

 HEAD
content from our side

content from their side
 MERGE_HEAD

not

 MERGE_HEAD
content from their side

content from our side
 HEAD

The correct order usually comes naturally and for free, but with renames
we often have data in the form {rename_branch, other_branch}, and
working relative to the rename first (e.g. for rename/add) is more
convenient elsewhere in the code.  Address the slight impedance
mismatch by having some functions re-call themselves with flipped
arguments when the branch order is reversed.

Note that setup_rename_conflict_info() has one asymmetry in it, in
setting dst_entry1->processed=0 but not doing similarly for
dst_entry2->processed.  When dealing with rename/rename and similar
conflicts, we do not want the processing to happen twice, so the
desire to only set one of the entries to unprocessed is intentional.
So, while this change modifies which branch's entry will be marked as
unprocessed, that dovetails nicely with putting HEAD first so that we
get the index stage entries and conflict markers in the right order.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 33 ++-
 t/t6036-recursive-corner-cases.sh |  8 
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 8a47e54e2f..16980db7f9 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -228,7 +228,26 @@ static inline void setup_rename_conflict_info(enum 
rename_type rename_type,
  struct stage_data *src_entry1,
  struct stage_data *src_entry2)
 {
-   struct rename_conflict_info *ci = xcalloc(1, sizeof(struct 
rename_conflict_info));
+   struct rename_conflict_info *ci;
+
+   /*
+* When we have two renames involved, it's easiest to get the
+* correct things into stage 2 and 3, and to make sure that the
+* content merge puts HEAD before the other branch if we just
+* ensure that branch1 == o->branch1.  So, simply flip arguments
+* around if we don't have that.
+*/
+   if (dst_entry2 && branch1 != o->branch1) {
+   setup_rename_conflict_info(rename_type,
+  pair2,  pair1,
+  branch2,branch1,
+  dst_entry2, dst_entry1,
+  o,
+  src_entry2, src_entry1);
+   return;
+   }
+
+   ci = xcalloc(1, sizeof(struct rename_conflict_info));
ci->rename_type = rename_type;
ci->pair1 = pair1;
ci->branch1 = branch1;
@@ -1283,6 +1302,18 @@ static int merge_mode_and_contents(struct merge_options 
*o,
   const char *branch2,
   struct merge_file_info *result)
 {
+   if (o->branch1 != branch1) {
+   /*
+* It's weird getting a reverse merge with HEAD on the bottom
+* side of the conflict markers and the other branch on the
+* top.  Fix that.
+*/
+   return merge_mode_and_contents(o, one, b, a,
+  filename,
+  branch2, branch1,
+  extra_marker_size, result);
+   }
+
result->merge = 0;
result->clean = 1;
 
diff --git a/t/t6036-recursive-corner-cases.sh 
b/t/t6036-recursive-corner-cases.sh
index 59e52c5a09..e1cef58f2a 100755
--- a/t/t6036-recursive-corner-cases.sh
+++ b/t/t6036-recursive-corner-cases.sh
@@ -230,13 +230,13 @@ test_expect_success 'git detects differently handled 
merges conflict' '
:2:new_a :3:new_a &&
test_cmp expect actual &&
 
-   git cat-file -p B:new_a >ours &&
-   git cat-file -p C:new_a >theirs &&
+   git cat-file -p C:new_a >ours &&
+   git cat-file -p B:new_a >theirs &&
>empty &&
test_must_fail git merge-file \
-   -L "Temporary merge branch 2" \
-   -L "" \
-L "Temporary merge branch 1" \
+   -L "" \
+   -L "Temporary merge branch 2" \
ours empty theirs &&
sed -e "s/^\([<=>]\)/\1\1\1/" ours >expect &&
git cat-file -p :1:new_a >actual &&
-- 
2.19.1.280.g0c175526bf



Re: [PATCH v10 00/21] Convert "git stash" to C builtin

2018-10-16 Thread Thomas Gummerer
On 10/16, Johannes Schindelin wrote:
> Hi Thomas,
> 
> On Mon, 15 Oct 2018, Thomas Gummerer wrote:
> 
> >  2:  63f2e0e6f9 !  2:  2d45985676 strbuf.c: add `strbuf_join_argv()`
> > @@ -14,19 +14,17 @@
> > strbuf_setlen(sb, sb->len + sb2->len);
> >   }
> >   
> > -+const char *strbuf_join_argv(struct strbuf *buf,
> > -+   int argc, const char **argv, char delim)
> > ++void strbuf_join_argv(struct strbuf *buf,
> > ++int argc, const char **argv, char delim)
> 
> While the patch series does not use the return value, I have to ask
> whether it would really be useful to change it to return `void`. I could
> imagine that there may already be quite a few code paths that would love
> to use strbuf_join_argv(), *and* would benefit from the `const char *`
> return value.

Fair enough.  I did suggest changing the return type to void here, as
I found the API a bit odd compared to the rest of the strbuf API,
however after looking at this again I agree with you, and returning a
const char * here does seem more helpful.  Sorry about the confusion
Paul-Sebastian!

> In other words: just because the *current* patches do not make use of that
> quite convenient return value does not mean that we should remove that
> convenience.
>
> >  7:  a2abd1b4bd !  8:  974dbaa492 stash: convert apply to builtin
> > @@ -370,18 +370,20 @@
> >  +
> >  +  if (diff_tree_binary(, >w_commit)) {
> >  +  strbuf_release();
> > -+  return -1;
> > ++  return error(_("Could not generate diff 
> > %s^!."),
> > ++   
> > oid_to_hex(>w_commit));
> 
> Please start the argument of an `error()` call with a lower-case letter.

I think this comes from your fixup! commit ;) But I do agree, these should be
lower-case.

> >  +  }
> >  +
> >  +  ret = apply_cached();
> >  +  strbuf_release();
> >  +  if (ret)
> > -+  return -1;
> > ++  return error(_("Conflicts in index."
> > ++ "Try without --index."));
> 
> Same here.
> 
> >  +
> >  +  discard_cache();
> >  +  read_cache();
> >  +  if (write_cache_as_tree(_tree, 0, NULL))
> > -+  return -1;
> > ++  return error(_("Could not save index 
> > tree"));
> 
> And here.
> 
> > 15:  bd827be103 ! 15:  989db67e9a stash: convert create to builtin
> > @@ -119,7 +119,6 @@
> >  +static int check_changes(struct pathspec ps, int include_untracked)
> >  +{
> >  +  int result;
> > -+  int ret = 0;
> 
> I was curious about this change, and could not find it in the
> git-stash-v10 tag of https://github.com/ungps/git...

This line has been removed in v10, but did exist in v9, so
the git-stash-v10 should indeed not have this line.  I suggested
removing it in [*1*], because it breaks compilation with DEVELOPER=1
at this step.

> > 18:  1c501ad666 ! 18:  c90e30173a stash: convert save to builtin
> > @@ -72,8 +72,10 @@
> >  +   git_stash_helper_save_usage,
> >  +   PARSE_OPT_KEEP_DASHDASH);
> >  +
> > -+  if (argc)
> > -+  stash_msg = (char*) strbuf_join_argv(, argc, argv, 
> > ' ');
> > ++  if (argc) {
> > ++  strbuf_join_argv(, argc, argv, ' ');
> > ++  stash_msg = buf.buf;
> > ++  }
> 
> Aha! So there *was* a user of that return value. I really would prefer a
> non-void return value here.

Right, I'd argue we're mis-using the API here though.  do_push_stash
who we later pass stash_msg to takes ownership and later free's the
memory before returning.  This doesn't cause issues in the test suite
at the moment, because do_create_stash() doesn't always free stash_msg
before assigning a new value to the pointer, but would cause issues
when do_create_stash exits early.

Rather than the solution I proposed in I think it would be nicer to
use 'stash_msg = strbuf_detach(...)' above.

I'm still happy with the function returning buf->buf as const char *,
but I'm not sure we should use that return value here.

> > 19:  c4401b21db ! 19:  4360ea875d stash: convert `stash--helper.c` into 
> > `stash.c`
> > @@ -264,9 +320,9 @@
> >  -  argc = parse_options(argc, argv, prefix, options,
> >  -   git_stash_helper_create_usage,
> >  -   0);
> > -+  /* Startinf with argv[1], since argv[0] is "create" */
> > -+  stash_msg = (char*) strbuf_join_argv(_msg_buf, argc - 1,
> > -+

Re: [PATCH 17/19] submodule: use submodule repos for object lookup

2018-10-16 Thread Stefan Beller
On Thu, Oct 11, 2018 at 3:41 PM Jonathan Tan  wrote:
>
> > +/*
> > + * Initialize 'out' based on the provided submodule path.
> > + *
> > + * Unlike repo_submodule_init, this tolerates submodules not present
> > + * in .gitmodules. NEEDSWORK: The repo_submodule_init behavior is
> > + * preferrable. This function exists only to preserve historical behavior.
>
> What do you mean by "The repo_submodule_init behavior is preferable"? If
> we need to preserve historical behavior, then it seems that the most
> preferable one is something that meets our needs (like open_submodule()
> in this patch).
>
> > +static int open_submodule(struct repository *out, const char *path)
> > +{
> > + struct strbuf sb = STRBUF_INIT;
> > +
> > + if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) {
> > + strbuf_release();
> > + return -1;
> > + }
> > +
> > + out->submodule_prefix = xstrdup(path);
>
> Do we need to set submodule_prefix?
>
> > @@ -507,7 +533,7 @@ static void show_submodule_header(struct diff_options 
> > *o, const char *path,
> >   else if (is_null_oid(two))
> >   message = "(submodule deleted)";
> >
> > - if (add_submodule_odb(path)) {
> > + if (open_submodule(sub, path) < 0) {
>
> This function, as a side effect, writes the open repository to "sub" for
> its caller to use. I think it's better if its callers open "sub" and
> then pass it to show_submodule_header() if successful. If opening the
> submodule is expected to fail sometimes (like it seems here), then we
> can allow callers to also pass NULL, and document what happens when NULL
> is passed.

Thanks for the review of the whole series!

I have redone this series, addressing all your comments. I addressed
this comment differently than suggested, and put the submodule
repository argument at the end of the parameter list, such that it
goes with all the other arguments to be filled in.

I was about to resend the series, but test-merged with the other
submodule series in flight (origin/sb/submodule-recursive-fetch-gets-the-tip)
which had some conflicts that I can easily resolve by rebasing on top.

It conflicts a lot when merging to next, due to the latest patch
("Apply semantic patches from previous patches"), so I am not sure
how to proceed properly. Maybe we'd omit that patch and
run 'make coccicheck' on next to apply the semantic patches
there instead.


Re: [PATCH v2 06/13] Add a build definition for Azure DevOps

2018-10-16 Thread SZEDER Gábor
On Mon, Oct 15, 2018 at 03:12:06AM -0700, Johannes Schindelin via GitGitGadget 
wrote:
> diff --git a/azure-pipelines.yml b/azure-pipelines.yml
> new file mode 100644
> index 00..b5749121d2
> --- /dev/null
> +++ b/azure-pipelines.yml
> @@ -0,0 +1,319 @@
> +resources:
> +- repo: self
> +  fetchDepth: 1
> +
> +phases:
> +- phase: linux_clang
> +  displayName: linux-clang
> +  condition: succeeded()
> +  queue:
> +name: Hosted Ubuntu 1604
> +  steps:
> +  - bash: |
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || 
> ci/mount-fileshare.sh //gitfileshare.file.core.windows.net/test-cache 
> gitfileshare "$GITFILESHAREPWD" "$HOME/test-cache" || exit 1
> +
> +   sudo apt-get update &&
> +   sudo apt-get -y install git gcc make libssl-dev libcurl4-openssl-dev 
> libexpat-dev tcl tk gettext git-email zlib1g-dev apache2-bin &&
> +
> +   export CC=clang || exit 1
> +
> +   ci/install-dependencies.sh

I think you would want to 'exit 1' when this script fails.
This applies to other build jobs (erm, phases?) below as well.

> +   ci/run-build-and-tests.sh || {
> +   ci/print-test-failures.sh
> +   exit 1
> +   }
> +
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || sudo umount 
> "$HOME/test-cache" || exit 1
> +displayName: 'ci/run-build-and-tests.sh'
> +env:
> +  GITFILESHAREPWD: $(gitfileshare.pwd)
> +  - task: PublishTestResults@2
> +displayName: 'Publish Test Results **/TEST-*.xml'
> +inputs:
> +  mergeTestResults: true
> +  testRunTitle: 'linux-clang'
> +  platform: Linux
> +  publishRunAttachments: false
> +condition: succeededOrFailed()
> +
> +- phase: linux_gcc
> +  displayName: linux-gcc
> +  condition: succeeded()
> +  queue:
> +name: Hosted Ubuntu 1604
> +  steps:
> +  - bash: |
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || 
> ci/mount-fileshare.sh //gitfileshare.file.core.windows.net/test-cache 
> gitfileshare "$GITFILESHAREPWD" "$HOME/test-cache" || exit 1
> +
> +   sudo apt-get update &&
> +   sudo apt-get -y install git gcc make libssl-dev libcurl4-openssl-dev 
> libexpat-dev tcl tk gettext git-email zlib1g-dev apache2-bin || exit 1
> +

On Travis CI the Linux GCC build job uses gcc-8 instead of whatever
the default is in that old-ish Ubuntu LTS; see 37fa4b3c78 (travis-ci:
run gcc-8 on linux-gcc jobs, 2018-05-19).

> +   ci/install-dependencies.sh
> +   ci/run-build-and-tests.sh || {
> +   ci/print-test-failures.sh
> +   exit 1
> +   }
> +
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || sudo umount 
> "$HOME/test-cache" || exit 1
> +displayName: 'ci/run-build-and-tests.sh'
> +env:
> +  GITFILESHAREPWD: $(gitfileshare.pwd)
> +  - task: PublishTestResults@2
> +displayName: 'Publish Test Results **/TEST-*.xml'
> +inputs:
> +  mergeTestResults: true
> +  testRunTitle: 'linux-gcc'
> +  platform: Linux
> +  publishRunAttachments: false
> +condition: succeededOrFailed()
> +
> +- phase: osx_clang
> +  displayName: osx-clang
> +  condition: succeeded()
> +  queue:
> +name: Hosted macOS
> +  steps:
> +  - bash: |
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || 
> ci/mount-fileshare.sh //gitfileshare.file.core.windows.net/test-cache 
> gitfileshare "$GITFILESHAREPWD" "$HOME/test-cache" || exit 1
> +
> +   export CC=clang
> +
> +   ci/install-dependencies.sh
> +   ci/run-build-and-tests.sh || {
> +   ci/print-test-failures.sh
> +   exit 1
> +   }
> +
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || umount 
> "$HOME/test-cache" || exit 1
> +displayName: 'ci/run-build-and-tests.sh'
> +env:
> +  GITFILESHAREPWD: $(gitfileshare.pwd)
> +  - task: PublishTestResults@2
> +displayName: 'Publish Test Results **/TEST-*.xml'
> +inputs:
> +  mergeTestResults: true
> +  testRunTitle: 'osx-clang'
> +  platform: macOS
> +  publishRunAttachments: false
> +condition: succeededOrFailed()
> +
> +- phase: osx_gcc
> +  displayName: osx-gcc
> +  condition: succeeded()
> +  queue:
> +name: Hosted macOS
> +  steps:
> +  - bash: |
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || 
> ci/mount-fileshare.sh //gitfileshare.file.core.windows.net/test-cache 
> gitfileshare "$GITFILESHAREPWD" "$HOME/test-cache" || exit 1
> +

Here you should 'export CC=gcc', because on macOS 'cc' is 'clang' by
default.

Note, however, that setting 'CC' in the environment alone has no
effect on the build process, it will still use 'cc'.  Keep an eye on
where this thread will lead to:

  https://public-inbox.org/git/20181016184537.gn19...@szeder.dev/T/#u

> +   ci/install-dependencies.sh
> +   ci/run-build-and-tests.sh || {
> +   ci/print-test-failures.sh
> +   exit 1
> +   }
> +
> +   test "$GITFILESHAREPWD" = '$(gitfileshare.pwd)' || umount 
> "$HOME/test-cache" || exit 1
> +displayName: 

receive.denyCurrentBranch=updateInstead updates working tree even when receive.denyNonFastForwards rejects push

2018-10-16 Thread Rajesh Madamanchi
Hi, I am looking to report the below behavior when seems incorrect to
me when receive.denyCurrentBranch is set to updateInstead and
receive.denyNonFastForwards is set to true. Below are the steps to
reproduce the scenario. Please excuse my ignorance if I'm missing
something fundamental.

Step 1 - Setup remote repository (remote host):
git config --global receive.denyCurrentBranch updateInstead
git config --global receive.denyNonFastForwards true
mkdir /tmp/hello
cd /tmp/hello
git init
echo hello > hello.txt
git add . && git commit -m "hello.txt"

Step 2 - Create 2 Clones (local host):
git clone ssh://REMOTEIP/tmp/hello /tmp/hello1
git clone ssh://REMOTEIP/tmp/hello /tmp/hello2

Step 3 - Push a commit from Clone 1
cd /tmp/hello1
echo hello1 > hello1.txt
git add . && git commit -m "hello1.txt"
git push
--> at this point server working tree contains hello1.txt which is expected

Step 4: Try to force push a commit from Clone 2
cd /tmp/hello2
echo hello2 > hello2.txt
git add . && git commit -m "hello2.txt"
git push
--> Remote rejects with message that remote contains work i do not have locally
git push --force
--> Remote rejects again with error: denying non-fast-forward
refs/heads/master (you should pull first)
--> At this point, since the push is rejected, I expect the servers
working tree to not contain any rejected changes. BUT the servers
working tree got updated to delete hello1.txt and create hello2.txt.
Push rejected but not really.

I also noticed the same behavior (incorrect) when the update hook
rejects changes on the server (but not the pre-receive hook).

Thank you


On overriding make variables from the environment...

2018-10-16 Thread SZEDER Gábor
Our Makefile has lines like these:

  CFLAGS = -g -O2 -Wall
  CC = cc
  AR = ar
  SPATCH = spatch

Note the use of '=', not '?='.  This means that these variables can be
overridden from the command line, i.e. 'make CC=gcc-X.Y' will build
with that particular GCC version, but not from the environment, i.e.
'CC=gcc-X.Y make' will still build with 'cc'.

This can be confusing for developers who come from other projects
where they used to run 'CC=whatever make'.

And our build jobs on Travis CI are badly affected by this.  We have
dedicated build jobs to build Git with GCC and Clang both on Linux and
OSX from the very beginning (522354d70f (Add Travis CI support,
2015-11-27)).  But guess how Travis CI specifies which compiler to
use!  With 'export CC=gcc' and 'export CC=clang', respectively.
Consequently, our Clang Linux build job has always used gcc, because
that's where 'cc' points at on Linux by default, while the GCC OSX
build job has always used Clang.  Oh, well.  Furthermore, see
37fa4b3c78 (travis-ci: run gcc-8 on linux-gcc jobs, 2018-05-19), where
Duy added an 'export CC=gcc-8' in an attempt to use a more modern
compiler, but this had no effect either.

I'm not sure what to do.  I'm fine with updating our 'ci/' scripts to
explicitly respect CC in the environment (either by running 'make
CC=$CC' or by writing $CC into 'config.mak').  Or I could update our
Makefile to use '?=' for specific variables, but:

  - I'm afraid to break somebody's setup relying on the current
behavior and CC having different values in the environment and in
'config.mak'.

  - Where to stop, IOW which variables should be set with '?='?
CFLAGS, LDFLAGS, CC, AR, ..., SPATCH, SPATCH_FLAGS?  Dunno.  We
already have 'STRIP ?= strip' and there are variables that are
checked explicitly (e.g. 'DEVELOPER=y make' works).

Note also that prior to b05701c5b4 (Make CFLAGS overridable from
make command line., 2005-08-06) our Makefile used 'CC?=gcc' as
well.



[PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-10-16 Thread Stefan Beller
This patch started as a refactoring to make 'get_next_submodule' more
readable, but upon doing so, I realized that "git fetch" of the submodule
actually doesn't need to be run in the submodules worktree. So let's run
it in its git dir instead.

That should pave the way towards fetching submodules that are currently
not checked out.

This patch leaks the cp->dir in get_next_submodule, as any further
callback in run_processes_parallel doesn't have access to the child
process any more. In an early iteration of this patch, the function
get_submodule_repo_for directly returned the string containing the
git directory, which would be a better design choice for this patch.

However the next patch both fixes the memory leak of cp->dir and also has
a use case for using the full repository handle of the submodule, so
it makes sense to introduce the get_submodule_repo_for here already.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 51 +++--
 t/t5526-fetch-submodules.sh |  7 -
 2 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/submodule.c b/submodule.c
index cbefe5f54d..30c06507e3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -495,6 +495,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1241,6 +1247,29 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+static struct repository *get_submodule_repo_for(struct repository *r,
+const struct submodule *sub)
+{
+   struct repository *ret = xmalloc(sizeof(*ret));
+
+   if (repo_submodule_init(ret, r, sub)) {
+   /*
+* No entry in .gitmodules? Technically not a submodule,
+* but historically we supported repositories that happen to be
+* in-place where a gitlink is. Keep supporting them.
+*/
+   struct strbuf gitdir = STRBUF_INIT;
+   strbuf_repo_worktree_path(, r, "%s/.git", sub->path);
+   if (repo_init(ret, gitdir.buf, NULL)) {
+   strbuf_release();
+   return NULL;
+   }
+   strbuf_release();
+   }
+
+   return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1248,12 +1277,11 @@ static int get_next_submodule(struct child_process *cp,
struct submodule_parallel_fetch *spf = data;
 
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-   struct strbuf submodule_path = STRBUF_INIT;
-   struct strbuf submodule_git_dir = STRBUF_INIT;
struct strbuf submodule_prefix = STRBUF_INIT;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
-   const char *git_dir, *default_argv;
+   const char *default_argv;
const struct submodule *submodule;
+   struct repository *repo;
struct submodule default_submodule = SUBMODULE_INIT;
 
if (!S_ISGITLINK(ce->ce_mode))
@@ -1288,16 +1316,12 @@ static int get_next_submodule(struct child_process *cp,
continue;
}
 
-   strbuf_repo_worktree_path(_path, spf->r, "%s", 
ce->name);
-   strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
-   git_dir = read_gitfile(submodule_git_dir.buf);
-   if (!git_dir)
-   git_dir = submodule_git_dir.buf;
-   if (is_directory(git_dir)) {
+   repo = get_submodule_repo_for(spf->r, submodule);
+   if (repo) {
child_process_init(cp);
-   cp->dir = strbuf_detach(_path, NULL);
-   prepare_submodule_repo_env(>env_array);
+   prepare_submodule_repo_env_in_gitdir(>env_array);
+   cp->dir = xstrdup(repo->gitdir);
cp->git_cmd = 1;
if (!spf->quiet)
strbuf_addf(err, "Fetching submodule %s%s\n",
@@ -1307,10 +1331,11 @@ static int get_next_submodule(struct child_process *cp,
argv_array_push(>args, default_argv);
argv_array_push(>args, 

[PATCH 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip

2018-10-16 Thread Stefan Beller
This is based on ao/submodule-wo-gitmodules-checked-out.

This resends origin/sb/submodule-recursive-fetch-gets-the-tip, resolving
the issues pointed out via 
origin/xxx/sb-submodule-recursive-fetch-gets-the-tip-in-pu
by basing this series on origin/ao/submodule-wo-gitmodules-checked-out

A range-diff below shows how picking a different base changed the patches,
apart from that no further adjustments have been made.

Thanks,
Stefan

Stefan Beller (9):
  sha1-array: provide oid_array_filter
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule.c: move global changed_submodule_names into fetch submodule
struct
  submodule.c: do not copy around submodule list
  repository: repo_submodule_init to take a submodule struct
  submodule: fetch in submodules git directory instead of in worktree
  fetch: retry fetching submodules if needed objects were not fetched
  builtin/fetch: check for submodule updates for non branch fetches

 Documentation/technical/api-oid-array.txt|   5 +
 builtin/fetch.c  |  14 +-
 builtin/grep.c   |  17 +-
 builtin/ls-files.c   |  12 +-
 repository.c |  27 +-
 repository.h |  11 +-
 sha1-array.c |  17 ++
 sha1-array.h |   3 +
 submodule.c  | 275 +++
 t/helper/test-submodule-nested-repo-config.c |   8 +-
 t/t5526-fetch-submodules.sh  |  23 +-
 11 files changed, 315 insertions(+), 97 deletions(-)

git range-diff origin/xxx/sb-submodule-recursive-fetch-gets-the-tip-in-pu...
[...]
585:  ac1f98a0df <   -:  -- doc: move git-rev-parse from porcelain to 
plumbing
586:  7cf1a0fbef =   1:  a035323c49 sha1-array: provide oid_array_filter
587:  01077381d0 =   2:  30ed20b4f0 submodule.c: fix indentation
588:  4b0cdf5899 =   3:  cd590ea88d submodule.c: sort changed_submodule_names 
before searching it
589:  78e5099ecc !   4:  ce959811ba submodule.c: move global 
changed_submodule_names into fetch submodule struct
@@ -12,7 +12,7 @@
 --- a/submodule.c
 +++ b/submodule.c
 @@
- #include "commit-reach.h"
+ #include "object-store.h"
  
  static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
 -static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
590:  d813f18bb3 =   5:  151f9a8ad4 submodule.c: do not copy around submodule 
list
591:  a077d63af7 !   6:  3a97743fa2 repository: repo_submodule_init to take a 
submodule struct
@@ -15,7 +15,6 @@
 Also move its documentation into the header file.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
 diff --git a/builtin/grep.c b/builtin/grep.c
 --- a/builtin/grep.c
@@ -31,12 +30,16 @@
 +
int hit;
  
-   if (!is_submodule_active(superproject, path))
+   /*
+@@
return 0;
+   }
  
--  if (repo_submodule_init(, superproject, path))
-+  if (repo_submodule_init(, superproject, sub))
+-  if (repo_submodule_init(, superproject, path)) {
++  if (repo_submodule_init(, superproject, sub)) {
+   grep_read_unlock();
return 0;
+   }
  
 -  repo_read_gitmodules();
 +  repo_read_gitmodules();
@@ -44,9 +47,9 @@
/*
 * NEEDSWORK: This adds the submodule's object directory to the list of
 @@
+* store is no longer global and instead is a member of the repository
 * object.
 */
-   grep_read_lock();
 -  add_to_alternates_memory(submodule.objects->objectdir);
 +  add_to_alternates_memory(subrepo.objects->objectdir);
grep_read_unlock();
@@ -100,19 +103,6 @@
  
  static void show_ce(struct repository *repo, struct dir_struct *dir,
 
-diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
 a/builtin/submodule--helper.c
-+++ b/builtin/submodule--helper.c
-@@
-   if (!sub)
-   BUG("We could get the submodule handle before?");
- 
--  if (repo_submodule_init(, the_repository, path))
-+  if (repo_submodule_init(, the_repository, sub))
-   die(_("could not get a repository handle for submodule '%s'"), 
path);
- 
-   if (!repo_config_get_string(, "core.worktree", )) {
-
 diff --git a/repository.c b/repository.c
 --- a/repository.c
 +++ b/repository.c
@@ -197,3 +187,32 @@
  void repo_clear(struct repository *repo);
  
  /*
+
+diff --git a/t/helper/test-submodule-nested-repo-config.c 
b/t/helper/test-submodule-nested-repo-config.c
+--- a/t/helper/test-submodule-nested-repo-config.c
 b/t/helper/test-submodule-nested-repo-config.c
+@@
+ 
+ int cmd__submodule_nested_repo_config(int argc, const char 

[PATCH 2/9] submodule.c: fix indentation

2018-10-16 Thread Stefan Beller
The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 2b7082b2db..e145ebbb16 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1258,7 +1258,8 @@ static int get_next_submodule(struct child_process *cp,
if (!submodule) {
const char *name = default_name_or_path(ce->name);
if (name) {
-   default_submodule.path = default_submodule.name 
= name;
+   default_submodule.path = name;
+   default_submodule.name = name;
submodule = _submodule;
}
}
@@ -1268,8 +1269,10 @@ static int get_next_submodule(struct child_process *cp,
default:
case RECURSE_SUBMODULES_DEFAULT:
case RECURSE_SUBMODULES_ON_DEMAND:
-   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
-submodule->name))
+   if (!submodule ||
+   !unsorted_string_list_lookup(
+   _submodule_names,
+   submodule->name))
continue;
default_argv = "on-demand";
break;
-- 
2.19.0



[PATCH 3/9] submodule.c: sort changed_submodule_names before searching it

2018-10-16 Thread Stefan Beller
We can string_list_insert() to maintain sorted-ness of the
list as we find new items, or we can string_list_append() to
build an unsorted list and sort it at the end just once.

As we do not rely on the sortedness while building the
list, we pick the "append and sort at the end" as it
has better worst case execution times.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index e145ebbb16..9fbfcfcfe1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1270,7 +1270,7 @@ static int get_next_submodule(struct child_process *cp,
case RECURSE_SUBMODULES_DEFAULT:
case RECURSE_SUBMODULES_ON_DEMAND:
if (!submodule ||
-   !unsorted_string_list_lookup(
+   !string_list_lookup(
_submodule_names,
submodule->name))
continue;
@@ -1364,6 +1364,7 @@ int fetch_populated_submodules(struct repository *r,
/* default value, "--submodule-prefix" and its value are added later */
 
calculate_changed_submodule_paths();
+   string_list_sort(_submodule_names);
run_processes_parallel(max_parallel_jobs,
   get_next_submodule,
   fetch_start_failure,
-- 
2.19.0



[PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches

2018-10-16 Thread Stefan Beller
Gerrit, the code review tool, has a different workflow than our mailing
list based approach. Usually users upload changes to a Gerrit server and
continuous integration and testing happens by bots. Sometimes however a
user wants to checkout a change locally and look at it locally. For this
use case, Gerrit offers a command line snippet to copy and paste to your
terminal, which looks like

  git fetch https:///gerrit refs/changes/ &&
  git checkout FETCH_HEAD

For Gerrit changes that contain changing submodule gitlinks, it would be
easy to extend both the fetch and checkout with the '--recurse-submodules'
flag, such that this command line snippet would produce the state of a
change locally.

However the functionality added in the previous patch, which would
ensure that we fetch the objects in the submodule that the gitlink pointed
at, only works for remote tracking branches so far, not for FETCH_HEAD.

Make sure that fetching a superproject to its FETCH_HEAD, also respects
the existence checks for objects in the submodule recursion.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 builtin/fetch.c | 5 -
 t/t5526-fetch-submodules.sh | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 95c44bf6ff..ea6ecd123e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -887,11 +887,14 @@ static int store_updated_refs(const char *raw_url, const 
char *remote_name,
rc |= update_local_ref(ref, what, rm, ,
   summary_width);
free(ref);
-   } else
+   } else {
+   check_for_new_submodule_commits(>old_oid);
format_display(, '*',
   *kind ? kind : "branch", NULL,
   *what ? what : "HEAD",
   "FETCH_HEAD", summary_width);
+   }
+
if (note.len) {
if (verbosity >= 0 && !shown_url) {
fprintf(stderr, _("From %.*s\n"),
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index af12c50e7d..a509eabb04 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -615,7 +615,7 @@ test_expect_success "fetch new commits on-demand when they 
are not reachable" '
git update-ref refs/changes/2 $D &&
(
cd downstream &&
-   git fetch --recurse-submodules --recurse-submodules-default 
on-demand origin refs/changes/2:refs/heads/my_branch &&
+   git fetch --recurse-submodules origin refs/changes/2 &&
git -C submodule cat-file -t $C &&
git checkout --recurse-submodules FETCH_HEAD
)
-- 
2.19.0



[PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched

2018-10-16 Thread Stefan Beller
Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C " (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

This works surprisingly well in some workflows (such as using submodules
as a third party library), while not so well in other scenarios, such
as in a Gerrit topic-based workflow, that can tie together changes
(potentially across repositories) on the server side. One of the parts
of such a Gerrit workflow is to download a change when wanting to examine
it, and you'd want to have its submodule changes that are in the same
topic downloaded as well. However these submodule changes reside in their
own repository in their own ref (refs/changes/).

Retry fetching a submodule by object name if the object id that the
superproject points to, cannot be found.

This retrying does not happen when the "git fetch" done at the
superproject is not storing the fetched results in remote
tracking branches (i.e. instead just recording them to
FETCH_HEAD) in this step. A later patch will fix this.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 builtin/fetch.c |   9 +-
 submodule.c | 185 ++--
 t/t5526-fetch-submodules.sh |  16 
 3 files changed, 177 insertions(+), 33 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..95c44bf6ff 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -700,8 +700,7 @@ static int update_local_ref(struct ref *ref,
what = _("[new ref]");
}
 
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF)
check_for_new_submodule_commits(>new_oid);
r = s_update_ref(msg, ref, 0);
format_display(display, r ? '!' : '*', what,
@@ -716,8 +715,7 @@ static int update_local_ref(struct ref *ref,
strbuf_add_unique_abbrev(, >object.oid, 
DEFAULT_ABBREV);
strbuf_addstr(, "..");
strbuf_add_unique_abbrev(, >new_oid, 
DEFAULT_ABBREV);
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF)
check_for_new_submodule_commits(>new_oid);
r = s_update_ref("fast-forward", ref, 1);
format_display(display, r ? '!' : ' ', quickref.buf,
@@ -731,8 +729,7 @@ static int update_local_ref(struct ref *ref,
strbuf_add_unique_abbrev(, >object.oid, 
DEFAULT_ABBREV);
strbuf_addstr(, "...");
strbuf_add_unique_abbrev(, >new_oid, 
DEFAULT_ABBREV);
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF)
check_for_new_submodule_commits(>new_oid);
r = s_update_ref("forced-update", ref, 1);
format_display(display, r ? '!' : '+', quickref.buf,
diff --git a/submodule.c b/submodule.c
index 30c06507e3..7246b776f3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1141,8 +1141,12 @@ struct submodule_parallel_fetch {
int result;
 
struct string_list changed_submodule_names;
+   struct get_next_submodule_task **retry;
+   int retry_nr, retry_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+ STRING_LIST_INIT_DUP, \
+ NULL, 0, 0}
 
 static void calculate_changed_submodule_paths(
struct submodule_parallel_fetch *spf)
@@ -1247,6 +1251,56 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+struct get_next_submodule_task {
+   struct repository *repo;
+   const struct submodule *sub;
+   unsigned free_sub : 1; /* Do we need to free the submodule? */
+   struct oid_array *commits;
+};
+
+static const struct submodule *get_default_submodule(const char *path)
+{
+   struct submodule *ret = NULL;
+   const char *name = default_name_or_path(path);
+
+   if (!name)
+   return NULL;
+
+   ret = 

[PATCH 6/9] repository: repo_submodule_init to take a submodule struct

2018-10-16 Thread Stefan Beller
When constructing a struct repository for a submodule for some revision
of the superproject where the submodule is not contained in the index,
it may not be present in the working tree currently either. In that
siutation giving a 'path' argument is not useful. Upgrade the
repo_submodule_init function to take a struct submodule instead.

While we are at it, overhaul the repo_submodule_init function by renaming
the submodule repository struct, which is to be initialized, to a name
that is not confused with the struct submodule as easily.

Also move its documentation into the header file.

Signed-off-by: Stefan Beller 
---
 builtin/grep.c   | 17 +++-
 builtin/ls-files.c   | 12 +
 repository.c | 27 
 repository.h | 11 ++--
 t/helper/test-submodule-nested-repo-config.c |  8 +++---
 5 files changed, 41 insertions(+), 34 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7da8fef31a..ba7634258a 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -418,7 +418,10 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
  const struct object_id *oid,
  const char *filename, const char *path)
 {
-   struct repository submodule;
+   struct repository subrepo;
+   const struct submodule *sub = submodule_from_path(superproject,
+ _oid, path);
+
int hit;
 
/*
@@ -434,12 +437,12 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
return 0;
}
 
-   if (repo_submodule_init(, superproject, path)) {
+   if (repo_submodule_init(, superproject, sub)) {
grep_read_unlock();
return 0;
}
 
-   repo_read_gitmodules();
+   repo_read_gitmodules();
 
/*
 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -451,7 +454,7 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
 * store is no longer global and instead is a member of the repository
 * object.
 */
-   add_to_alternates_memory(submodule.objects->objectdir);
+   add_to_alternates_memory(subrepo.objects->objectdir);
grep_read_unlock();
 
if (oid) {
@@ -476,14 +479,14 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
 
init_tree_desc(, data, size);
hit = grep_tree(opt, pathspec, , , base.len,
-   object->type == OBJ_COMMIT, );
+   object->type == OBJ_COMMIT, );
strbuf_release();
free(data);
} else {
-   hit = grep_cache(opt, , pathspec, 1);
+   hit = grep_cache(opt, , pathspec, 1);
}
 
-   repo_clear();
+   repo_clear();
return hit;
 }
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f9919a362..4d1649c1b3 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -206,17 +206,19 @@ static void show_files(struct repository *repo, struct 
dir_struct *dir);
 static void show_submodule(struct repository *superproject,
   struct dir_struct *dir, const char *path)
 {
-   struct repository submodule;
+   struct repository subrepo;
+   const struct submodule *sub = submodule_from_path(superproject,
+ _oid, path);
 
-   if (repo_submodule_init(, superproject, path))
+   if (repo_submodule_init(, superproject, sub))
return;
 
-   if (repo_read_index() < 0)
+   if (repo_read_index() < 0)
die("index file corrupt");
 
-   show_files(, dir);
+   show_files(, dir);
 
-   repo_clear();
+   repo_clear();
 }
 
 static void show_ce(struct repository *repo, struct dir_struct *dir,
diff --git a/repository.c b/repository.c
index 5dd1486718..aabe64ee5d 100644
--- a/repository.c
+++ b/repository.c
@@ -166,30 +166,23 @@ int repo_init(struct repository *repo,
return -1;
 }
 
-/*
- * Initialize 'submodule' as the submodule given by 'path' in parent repository
- * 'superproject'.
- * Return 0 upon success and a non-zero value upon failure.
- */
-int repo_submodule_init(struct repository *submodule,
+int repo_submodule_init(struct repository *subrepo,
struct repository *superproject,
-   const char *path)
+   const struct submodule *sub)
 {
-   const struct submodule *sub;
struct strbuf gitdir = STRBUF_INIT;
struct strbuf worktree = STRBUF_INIT;
int ret = 0;
 
-   sub = submodule_from_path(superproject, _oid, path);
if (!sub) {
ret = -1;
goto 

[PATCH 1/9] sha1-array: provide oid_array_filter

2018-10-16 Thread Stefan Beller
Helped-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 Documentation/technical/api-oid-array.txt |  5 +
 sha1-array.c  | 17 +
 sha1-array.h  |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/technical/api-oid-array.txt 
b/Documentation/technical/api-oid-array.txt
index 9febfb1d52..c97428c2c3 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt
@@ -48,6 +48,11 @@ Functions
is not sorted, this function has the side effect of sorting
it.
 
+`oid_array_filter`::
+   Apply the callback function `want` to each entry in the array,
+   retaining only the entries for which the function returns true.
+   Preserve the order of the entries that are retained.
+
 Examples
 
 
diff --git a/sha1-array.c b/sha1-array.c
index 265941fbf4..d505a004bb 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
}
return 0;
 }
+
+void oid_array_filter(struct oid_array *array,
+ for_each_oid_fn want,
+ void *cb_data)
+{
+   unsigned nr = array->nr, src, dst;
+   struct object_id *oids = array->oid;
+
+   for (src = dst = 0; src < nr; src++) {
+   if (want([src], cb_data)) {
+   if (src != dst)
+   oidcpy([dst], [src]);
+   dst++;
+   }
+   }
+   array->nr = dst;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf95017..55d016c4bf 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
 int oid_array_for_each_unique(struct oid_array *array,
  for_each_oid_fn fn,
  void *data);
+void oid_array_filter(struct oid_array *array,
+ for_each_oid_fn want,
+ void *cbdata);
 
 #endif /* SHA1_ARRAY_H */
-- 
2.19.0



[PATCH 4/9] submodule.c: move global changed_submodule_names into fetch submodule struct

2018-10-16 Thread Stefan Beller
The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 9fbfcfcfe1..6b4cee82bf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -24,7 +24,7 @@
 #include "object-store.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
+
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1124,7 +1124,22 @@ void check_for_new_submodule_commits(struct object_id 
*oid)
oid_array_append(_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(void)
+struct submodule_parallel_fetch {
+   int count;
+   struct argv_array args;
+   struct repository *r;
+   const char *prefix;
+   int command_line_option;
+   int default_option;
+   int quiet;
+   int result;
+
+   struct string_list changed_submodule_names;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
STRING_LIST_INIT_DUP }
+
+static void calculate_changed_submodule_paths(
+   struct submodule_parallel_fetch *spf)
 {
struct argv_array argv = ARGV_ARRAY_INIT;
struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1162,7 +1177,8 @@ static void calculate_changed_submodule_paths(void)
continue;
 
if (!submodule_has_commits(path, commits))
-   string_list_append(_submodule_names, 
name->string);
+   string_list_append(>changed_submodule_names,
+  name->string);
}
 
free_submodules_oids(_submodules);
@@ -1199,18 +1215,6 @@ int submodule_touches_in_range(struct object_id 
*excl_oid,
return ret;
 }
 
-struct submodule_parallel_fetch {
-   int count;
-   struct argv_array args;
-   struct repository *r;
-   const char *prefix;
-   int command_line_option;
-   int default_option;
-   int quiet;
-   int result;
-};
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
-
 static int get_fetch_recurse_config(const struct submodule *submodule,
struct submodule_parallel_fetch *spf)
 {
@@ -1271,7 +1275,7 @@ static int get_next_submodule(struct child_process *cp,
case RECURSE_SUBMODULES_ON_DEMAND:
if (!submodule ||
!string_list_lookup(
-   _submodule_names,
+   >changed_submodule_names,
submodule->name))
continue;
default_argv = "on-demand";
@@ -1363,8 +1367,8 @@ int fetch_populated_submodules(struct repository *r,
argv_array_push(, "--recurse-submodules-default");
/* default value, "--submodule-prefix" and its value are added later */
 
-   calculate_changed_submodule_paths();
-   string_list_sort(_submodule_names);
+   calculate_changed_submodule_paths();
+   string_list_sort(_submodule_names);
run_processes_parallel(max_parallel_jobs,
   get_next_submodule,
   fetch_start_failure,
@@ -1373,7 +1377,7 @@ int fetch_populated_submodules(struct repository *r,
 
argv_array_clear();
 out:
-   string_list_clear(_submodule_names, 1);
+   string_list_clear(_submodule_names, 1);
return spf.result;
 }
 
-- 
2.19.0



[PATCH 5/9] submodule.c: do not copy around submodule list

2018-10-16 Thread Stefan Beller
'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

By doing so we'll have access to the util pointer for longer that
contains the commits that we need to fetch, which will be
useful in a later patch.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6b4cee82bf..cbefe5f54d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1142,8 +1142,7 @@ static void calculate_changed_submodule_paths(
struct submodule_parallel_fetch *spf)
 {
struct argv_array argv = ARGV_ARRAY_INIT;
-   struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-   const struct string_list_item *name;
+   struct string_list_item *name;
 
/* No need to check if there are no submodules configured */
if (!submodule_from_path(the_repository, NULL, NULL))
@@ -1160,9 +1159,9 @@ static void calculate_changed_submodule_paths(
 * Collect all submodules (whether checked out or not) for which new
 * commits have been recorded upstream in "changed_submodule_names".
 */
-   collect_changed_submodules(_submodules, );
+   collect_changed_submodules(>changed_submodule_names, );
 
-   for_each_string_list_item(name, _submodules) {
+   for_each_string_list_item(name, >changed_submodule_names) {
struct oid_array *commits = name->util;
const struct submodule *submodule;
const char *path = NULL;
@@ -1176,12 +1175,14 @@ static void calculate_changed_submodule_paths(
if (!path)
continue;
 
-   if (!submodule_has_commits(path, commits))
-   string_list_append(>changed_submodule_names,
-  name->string);
+   if (submodule_has_commits(path, commits)) {
+   oid_array_clear(commits);
+   *name->string = '\0';
+   }
}
 
-   free_submodules_oids(_submodules);
+   string_list_remove_empty_items(>changed_submodule_names, 1);
+
argv_array_clear();
oid_array_clear(_tips_before_fetch);
oid_array_clear(_tips_after_fetch);
@@ -1377,7 +1378,7 @@ int fetch_populated_submodules(struct repository *r,
 
argv_array_clear();
 out:
-   string_list_clear(_submodule_names, 1);
+   free_submodules_oids(_submodule_names);
return spf.result;
 }
 
-- 
2.19.0



Re: [PATCH 2/4] merge-recursive: increase marker length with depth of recursion

2018-10-16 Thread Elijah Newren
On Mon, Oct 15, 2018 at 7:17 PM Junio C Hamano  wrote:
>
> Elijah Newren  writes:
>
> > Would you like me to edit the commit message to include this more
> > difficult case?
>
> Neither.  If the "marker length" change is required in a separate
> series that will build on top of the current 4-patch series, I think
> dropping this step from the current series and make it a part of the
> series that deals with rename/rename would make more sense.

Okay, I'll resubmit this series without this patch or associated
testcase, and include those in the later file collision series.


Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-10-16 Thread Jonathan Nieder
Hi Christian,

On Tue, Sep 25, 2018, Christian Couder wrote:

> In the cover letter there is a "Discussion" section which is about
> this, but I agree that it might not be very clear.
>
> The main issue that this patch series tries to solve is that
> extensions.partialclone config option limits the partial clone and
> promisor features to only one remote. One related issue is that it
> also prevents to have other kind of promisor/partial clone/odb
> remotes. By other kind I mean remotes that would not necessarily be
> git repos, but that could store objects (that's where ODB, for Object
> DataBase, comes from) and could provide those objects to Git through a
> helper (or driver) script or program.

Thanks for this explanation.  I took the opportunity to learn more
while you were in the bay area for the google summer of code mentor
summit and learned a little more, which was very helpful to me.

The broader picture is that this is meant to make Git natively handle
large blobs in a nicer way.  The design in this series has a few
components:

 1. Teaching partial clone to attempt to fetch missing objects from
multiple remotes instead of only one.  This is useful because you
can have a server that is nearby and cheaper to serve from (some
kind of local cache server) that you make requests to first before
falling back to the canonical source of objects.

 2. Simplifying the protocol for fetching missing objects so that it
can be satisfied by a lighter weight object storage system than
a full Git server.  The ODB helpers introduced in this series are
meant to speak such a simpler protocol since they are only used
for one-off requests of a collection of missing objects instead of
needing to understand refs, Git's negotiation, etc.

 3. (possibly, though not in this series) Making the criteria for what
objects can be missing more aggressive, so that I can "git add"
a large file and work with it using Git without even having a
second copy of that object in my local object store.

For (2), I would like to see us improve the remote helper
infrastructure instead of introducing a new ODB helper.  Remote
helpers are already permitted to fetch some objects without listing
refs --- perhaps we will want to

 i. split listing refs to a separate capability, so that a remote
helper can advertise that it doesn't support that.  (Alternatively
the remote could advertise that it has no refs.)

 ii. Use the "long-running process" mechanism to improve how Git
 communicates with a remote helper.

For (1), things get more tricky.  In an object store from a partial
clone today, we relax the ordinary "closure under reachability"
invariant but in a minor way.  We'll need to work out how this works
with multiple promisor remotes.

The idea today is that there are two kinds of packs: promisor packs
(from the promisor remote) and non-promisor packs.  Promisor packs are
allowed to have reachability edges (for example a tree->blob edge)
that point to a missing object, since the promisor remote has promised
that we will be able to access that object on demand.  Non-promisor
packs are also allowed to have reachability edges that point to a
missing object, as long as there is a reachability edge from an object
in a promisor pack to the same object (because of the same promise).
See "Handling Missing Objects" in Documentation/technical/partial-clone.txt
for more details.

To prevent older versions of Git from being confused by partial clone
repositories, they use the repositoryFormatVersion mechanism:

[core]
repositoryFormatVersion = 1
[extensions]
partialClone = ...

If we change the invariant, we will need to use a new extensions.* key
to ensure that versions of Git that are not aware of the new invariant
do not operate on the repository.

A promisor pack is indicated by there being a .promisor file next to
the usual .pack file.  Currently the .promisor file is empty.  The
previous idea was that once we want more metadata (e.g. for the sake of
multiple promisor remotes), we could write it in that file.  For
example, remotes could be associated to a  and the
.promisor file could indicate which  has promised to serve
requests for objects reachable from objects in this pack.

That will complicate the object access code as well, since currently
we only find who has promised an object during "git fsck" and similar
operations.  During everyday access we do not care which promisor
pack caused the object to be promised, since there is only one promisor
remote to fetch from anyway.

So much for the current setup.  For (1), I believe you are proposing to
still have only one effective , so it doesn't necessarily
require modifying the extensions.* configuration.  Instead, the idea is
that when trying to access an object, we would follow one of a list of
steps:

 1. First, check the local object store. If it's there, we're done.
 2. Second, try 

[PATCH] submodule helper: convert relative URL to absolute URL if needed

2018-10-16 Thread Stefan Beller
The submodule helper update_clone called by "git submodule update",
clones submodules if needed. As submodules used to have the URL indicating
if they were active, the step to resolve relative URLs was done in the
"submodule init" step. Nowadays submodules can be configured active without
calling an explicit init, e.g. via configuring submodule.active.

When trying to obtain submodules that are set active this way, we'll
fallback to the URL found in the .gitmodules, which may be relative to the
superproject, but we do not resolve it, yet:

git clone https://gerrit.googlesource.com/gerrit
cd gerrit && grep url .gitmodules
url = ../plugins/codemirror-editor
...
git config submodule.active .
git submodule update
fatal: repository '../plugins/codemirror-editor' does not exist
fatal: clone of '../plugins/codemirror-editor' into submodule path 
'/tmp/gerrit/plugins/codemirror-editor' failed
Failed to clone 'plugins/codemirror-editor'. Retry scheduled
[...]
fatal: clone of '../plugins/codemirror-editor' into submodule path 
'/tmp/gerrit/plugins/codemirror-editor' failed
Failed to clone 'plugins/codemirror-editor' a second time, aborting
[...]

To resolve the issue, factor out the function that resolves the relative
URLs in "git submodule init" (in the submodule helper in the init_submodule
function) and call it at the appropriate place in the update_clone helper.

Reported-by: Jaewoong Jung 
Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 51 -
 t/t7400-submodule-basic.sh  | 24 +
 2 files changed, 58 insertions(+), 17 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f6fb8991f3..13c2e4b556 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -584,6 +584,26 @@ static int module_foreach(int argc, const char **argv, 
const char *prefix)
return 0;
 }
 
+static char *compute_submodule_clone_url(const char *rel_url)
+{
+   char *remoteurl, *relurl;
+   char *remote = get_default_remote();
+   struct strbuf remotesb = STRBUF_INIT;
+
+   strbuf_addf(, "remote.%s.url", remote);
+   if (git_config_get_string(remotesb.buf, )) {
+   warning(_("could not look up configuration '%s'. Assuming this 
repository is its own authoritative upstream."), remotesb.buf);
+   remoteurl = xgetcwd();
+   }
+   relurl = relative_url(remoteurl, rel_url, NULL);
+
+   free(remote);
+   free(remoteurl);
+   strbuf_release();
+
+   return relurl;
+}
+
 struct init_cb {
const char *prefix;
unsigned int flags;
@@ -634,21 +654,9 @@ static void init_submodule(const char *path, const char 
*prefix,
/* Possibly a url relative to parent */
if (starts_with_dot_dot_slash(url) ||
starts_with_dot_slash(url)) {
-   char *remoteurl, *relurl;
-   char *remote = get_default_remote();
-   struct strbuf remotesb = STRBUF_INIT;
-   strbuf_addf(, "remote.%s.url", remote);
-   free(remote);
-
-   if (git_config_get_string(remotesb.buf, )) {
-   warning(_("could not lookup configuration '%s'. 
Assuming this repository is its own authoritative upstream."), remotesb.buf);
-   remoteurl = xgetcwd();
-   }
-   relurl = relative_url(remoteurl, url, NULL);
-   strbuf_release();
-   free(remoteurl);
-   free(url);
-   url = relurl;
+   char *oldurl = url;
+   url = compute_submodule_clone_url(oldurl);
+   free(oldurl);
}
 
if (git_config_set_gently(sb.buf, url))
@@ -1514,6 +1522,7 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
struct strbuf sb = STRBUF_INIT;
const char *displaypath = NULL;
int needs_cloning = 0;
+   int need_free_url = 0;
 
if (ce_stage(ce)) {
if (suc->recursive_prefix)
@@ -1562,8 +1571,14 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
 
strbuf_reset();
strbuf_addf(, "submodule.%s.url", sub->name);
-   if (repo_config_get_string_const(the_repository, sb.buf, ))
-   url = sub->url;
+   if (repo_config_get_string_const(the_repository, sb.buf, )) {
+   if (starts_with_dot_slash(sub->url) ||
+   starts_with_dot_dot_slash(sub->url)) {
+   url = compute_submodule_clone_url(sub->url);
+   need_free_url = 1;
+   } else
+   url = sub->url;
+   }
 
strbuf_reset();
strbuf_addf(, "%s/.git", 

Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-16 Thread Stefan Beller
On Tue, Oct 16, 2018 at 6:39 AM Phillip Wood  wrote:
> > If you mean "--color-moved-ws=no" (or "--no-color-moved-ws") as a
> > way to countermand an earlier --color-moved-ws= on the
> > command line, I fully agree that it is a good idea.
>
> Oh I assumed --no-color-moved-ws was allowed but it isn't it. Allowing
> --color-moved-ws=no as well would match what is allowed for
> --color-moved. I'll try and look at that.

Thanks for taking a look!
Stefan


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-16 Thread Duy Nguyen
On Tue, Oct 16, 2018 at 6:01 PM Derrick Stolee  wrote:
>
> On 10/16/2018 11:35 AM, Duy Nguyen wrote:
> > On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
> >  wrote:
> >> Since the commit-graph code wants to serialize the hash algorithm into
> >> the data store, specify a version number for each supported algorithm.
> >> Note that we don't use the values of the constants themselves, as they
> >> are internal and could change in the future.
> >>
> >> Signed-off-by: brian m. carlson 
> >> ---
> >>   commit-graph.c | 9 -
> >>   1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/commit-graph.c b/commit-graph.c
> >> index 7a28fbb03f..e587c21bb6 100644
> >> --- a/commit-graph.c
> >> +++ b/commit-graph.c
> >> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
> >>
> >>   static uint8_t oid_version(void)
> >>   {
> >> -   return 1;
> >> +   switch (hash_algo_by_ptr(the_hash_algo)) {
> >> +   case GIT_HASH_SHA1:
> >> +   return 1;
> >> +   case GIT_HASH_SHA256:
> >> +   return 2;
> > Should we just increase this field to uint32_t and store format_id
> > instead? That will keep oid version unique in all data formats.
> Both the commit-graph and multi-pack-index store a single byte for the
> hash version, so that ship has sailed (without incrementing the full
> file version number in each format).

And it's probably premature to add the oid version field when multiple
hash support has not been fully realized. Now we have different ways
of storing hash id and need separate mappings.

I would go for incrementing file version. Otherwise maybe we just
update format_id to be one byte instead, and the way of storing hash
version in commit-graph will be used everywhere.

> It may be good to make this method accessible to both formats. I'm not
> sure if Brian's branch is built on top of the multi-pack-index code.
> Probably best to see if ds/multi-pack-verify is in the history.
>
> Thanks,
> -Stolee



-- 
Duy


Re: [PATCH v2 10/13] tests: include detailed trace logs with --write-junit-xml upon failure

2018-10-16 Thread SZEDER Gábor
On Tue, Oct 16, 2018 at 03:02:38PM +0200, Johannes Schindelin wrote:
> Hi Gábor,
> 
> On Tue, 16 Oct 2018, SZEDER Gábor wrote:
> 
> > On Mon, Oct 15, 2018 at 03:12:12AM -0700, Johannes Schindelin via 
> > GitGitGadget wrote:
> > > From: Johannes Schindelin 
> > > 
> > > The JUnit XML format lends itself to be presented in a powerful UI,
> > > where you can drill down to the information you are interested in very
> > > quickly.
> > > 
> > > For test failures, this usually means that you want to see the detailed
> > > trace of the failing tests.
> > > 
> > > With Travis CI, we passed the `--verbose-log` option to get those
> > > traces. However, that seems excessive, as we do not need/use the logs in
> > 
> > As someone who has dug into a few occasional failures found by Travis
> > CI, I'd say that the output of '--verbose-log -x' is not excessive,
> > but downright essential.
> 
> I agree that the output is essential for drilling down into failures. This
> paragraph, however, talks about the general case: where there are *no*
> failures. See here:

But you don't know in advance whether there will be any failures or
not, so it only makes sense to run all tests with '--verbose-log -x'
by default, just in case a Heisenbug decides to make an appearance.

> > > almost all of those cases: only when a test fails do we have a way to
> > > include the trace.
> > > 
> > > So let's do something different when using Azure DevOps: let's run all
> > > the tests with `--quiet` first, and only if a failure is encountered,
> > > try to trace the commands as they are executed.
> > > 
> > > Of course, we cannot turn on `--verbose-log` after the fact. So let's
> > > just re-run the test with all the same options, adding `--verbose-log`.
> > > And then munging the output file into the JUnit XML on the fly.
> > > 
> > > Note: there is an off chance that re-running the test in verbose mode
> > > "fixes" the failures (and this does happen from time to time!). That is
> > > a possibility we should be able to live with.
> > 
> > Any CI system worth its salt should provide as much information about
> > any failures as possible, especially when it was lucky enough to
> > stumble upon a rare and hard to reproduce non-deterministic failure.
> 
> I would agree with you if more people started to pay attention to our CI
> failures. And if we had some sort of a development model where a CI
> failure would halt development on that particular topic until the failure
> is fixed, with the responsibility assigned to somebody to fix it.
> 
> This is not the case here, though. pu is broken for ages, at least on
> Windows, and even a *single* topic is enough to do that. And this is even
> worse with flakey tests. I cannot remember *how often* I saw CI failures
> in t5570-git-daemon.sh, for example. It is rare enough that it is obvious
> that this is a problem of the *regression test*, rather than a problem of
> the code that is to be tested.

Some occasional failures in t5570 are actually caused by issues in Git
on certain platforms:

  
https://public-inbox.org/git/CAM0VKj=mcs+cmogzf_xypeb+qzrfmumh52-pv_ndmza9x+r...@mail.gmail.com/T/#u

> So I would suggest to go forward with my proposed strategy for the moment,
> right up until the time when we have had the resources to fix t5570, for
> starters.

I don't really understand what the occasional failures in t5570 have
to do with the amount of information a CI system should gather about
failures in general.  Or how many people pay attention to it, or what
kind of development model we have, for that matter.  The way I see it
these are unrelated issues, and a CI system should always provide as
much information about failures as possible.  If only a few people pay
attention to it, then for the sake of those few.


> > > Ideally, we would label this as "Passed upon rerun", and Azure
> > > Pipelines even know about that outcome, but it is not available when
> > > using the JUnit XML format for now:
> > > https://github.com/Microsoft/azure-pipelines-agent/blob/master/src/Agent.Worker/TestResults/JunitResultReader.cs
> > > 
> > > Signed-off-by: Johannes Schindelin 
> > 



Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-16 Thread Derrick Stolee

On 10/16/2018 11:35 AM, Duy Nguyen wrote:

On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
 wrote:

Since the commit-graph code wants to serialize the hash algorithm into
the data store, specify a version number for each supported algorithm.
Note that we don't use the values of the constants themselves, as they
are internal and could change in the future.

Signed-off-by: brian m. carlson 
---
  commit-graph.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7a28fbb03f..e587c21bb6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)

  static uint8_t oid_version(void)
  {
-   return 1;
+   switch (hash_algo_by_ptr(the_hash_algo)) {
+   case GIT_HASH_SHA1:
+   return 1;
+   case GIT_HASH_SHA256:
+   return 2;

Should we just increase this field to uint32_t and store format_id
instead? That will keep oid version unique in all data formats.
Both the commit-graph and multi-pack-index store a single byte for the 
hash version, so that ship has sailed (without incrementing the full 
file version number in each format).


It may be good to make this method accessible to both formats. I'm not 
sure if Brian's branch is built on top of the multi-pack-index code. 
Probably best to see if ds/multi-pack-verify is in the history.


Thanks,
-Stolee


Re: [PATCH v2 04/13] cache: make hashcmp and hasheq work with larger hashes

2018-10-16 Thread Duy Nguyen
On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
 wrote:
> diff --git a/cache.h b/cache.h
> index a13d14ce0a..0b88c3a344 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1024,16 +1024,12 @@ extern const struct object_id null_oid;
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> /*
> -* This is a temporary optimization hack. By asserting the size here,
> -* we let the compiler know that it's always going to be 20, which 
> lets
> -* it turn this fixed-size memcmp into a few inline instructions.
> -*
> -* This will need to be extended or ripped out when we learn about
> -* hashes of different sizes.
> +* Teach the compiler that there are only two possibilities of hash 
> size
> +* here, so that it can optimize for this case as much as possible.
>  */
> -   if (the_hash_algo->rawsz != 20)
> -   BUG("hash size not yet supported by hashcmp");
> -   return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)

It's tangent. But performance is probably another good reason to
detach the_hash_algo from the_repository so we have one less
dereference to do. (the other good reason is these hash operations
should work in "no-repo" commands as well, where the_repository does
not really make sense).

> +   return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
> +   return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct 
> object_id *oid2)
> @@ -1043,7 +1039,13 @@ static inline int oidcmp(const struct object_id *oid1, 
> const struct object_id *o
>
>  static inline int hasheq(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> -   return !hashcmp(sha1, sha2);
> +   /*
> +* We write this here instead of deferring to hashcmp so that the
> +* compiler can properly inline it and avoid calling memcmp.
> +*/
> +   if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
> +   return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
> +   return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
>  }
>
>  static inline int oideq(const struct object_id *oid1, const struct object_id 
> *oid2)



-- 
Duy


Re: [PATCH v2 00/13] Base SHA-256 implementation

2018-10-16 Thread Duy Nguyen
On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
 wrote:
>
> This series provides an actual SHA-256 implementation and wires it up,
> along with some housekeeping patches to make it suitable for testing.
>
> New in this version is a patch which reverts the change to limit hashcmp
> to 20 bytes.  I've taken care to permit the compiler to inline as much
> as possible for efficiency, but it would be helpful to see what the
> performance impact of these changes is on real-life workflows, or with
> MSVC and other non-GCC and non-clang compilers.  The resulting change is
> uglier and more duplicative than I wanted, but oh, well.
>
> I didn't attempt to pull in the full complement of code from libtomcrypt
> to try to show the changes made, since that would have involved
> importing a significant quantity of code in order to make things work.
>
> I realize the increase to GIT_MAX_HEXSZ will result in an increase in
> memory usage, but we're going to need to do it at some point, and the
> sooner the code is in the codebase, the sooner people can play around
> with it and test it.
>
> This piece should be the final piece of preparatory work required for a
> fully functioning SHA-256-only Git.  Additional work should be able to
> come into the testsuite and codebase without needing to build on work in
> any series after this one.

Thanks again for keeping working on this. I'm still sick and can't
really do a deep review. With that in mind, the patches look good.
-- 
Duy


Re: [PATCH v2 12/13] hash: add an SHA-256 implementation using OpenSSL

2018-10-16 Thread Duy Nguyen
On Mon, Oct 15, 2018 at 4:22 AM brian m. carlson
 wrote:
>
> We already have OpenSSL routines available for SHA-1, so add routines
> for SHA-256 as well.

Since we have "hash-speed" tool now, it would be great to keep some
numbers of these hash implementations in the commit message (and maybe
sha1 as well just for comparison).

>
> Signed-off-by: brian m. carlson 
> ---
>  Makefile | 7 +++
>  hash.h   | 2 ++
>  2 files changed, 9 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 3d91555a81..3164e2aeee 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -183,6 +183,8 @@ all::
>  #
>  # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
>  #
> +# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
> +#
>  # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl 
> (Darwin).
>  #
>  # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto 
> (Darwin).
> @@ -1638,6 +1640,10 @@ endif
>  endif
>  endif
>
> +ifdef OPENSSL_SHA256
> +   EXTLIBS += $(LIB_4_CRYPTO)
> +   BASIC_CFLAGS += -DSHA256_OPENSSL
> +else
>  ifdef GCRYPT_SHA256
> BASIC_CFLAGS += -DSHA256_GCRYPT
> EXTLIBS += -lgcrypt
> @@ -1645,6 +1651,7 @@ else
> LIB_OBJS += sha256/block/sha256.o
> BASIC_CFLAGS += -DSHA256_BLK
>  endif
> +endif
>
>  ifdef SHA1_MAX_BLOCK_SIZE
> LIB_OBJS += compat/sha1-chunked.o
> diff --git a/hash.h b/hash.h
> index 9df562f2f6..9df06d56b4 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -17,6 +17,8 @@
>
>  #if defined(SHA256_GCRYPT)
>  #include "sha256/gcrypt.h"
> +#elif defined(SHA256_OPENSSL)
> +#include 
>  #else
>  #include "sha256/block/sha256.h"
>  #endif



-- 
Duy


Re: [PATCH v2 13/13] commit-graph: specify OID version for SHA-256

2018-10-16 Thread Duy Nguyen
On Mon, Oct 15, 2018 at 4:23 AM brian m. carlson
 wrote:
>
> Since the commit-graph code wants to serialize the hash algorithm into
> the data store, specify a version number for each supported algorithm.
> Note that we don't use the values of the constants themselves, as they
> are internal and could change in the future.
>
> Signed-off-by: brian m. carlson 
> ---
>  commit-graph.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 7a28fbb03f..e587c21bb6 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
>
>  static uint8_t oid_version(void)
>  {
> -   return 1;
> +   switch (hash_algo_by_ptr(the_hash_algo)) {
> +   case GIT_HASH_SHA1:
> +   return 1;
> +   case GIT_HASH_SHA256:
> +   return 2;

Should we just increase this field to uint32_t and store format_id
instead? That will keep oid version unique in all data formats.

> +   default:
> +   BUG("unknown hash algorithm");
> +   }
>  }
>
>  static struct commit_graph *alloc_commit_graph(void)
-- 
Duy


Re: [PATCH v2] branch: trivial style fix

2018-10-16 Thread Jeff King
On Tue, Oct 16, 2018 at 10:19:20PM +0800, Tao Qingyun wrote:

> Signed-off-by: Tao Qingyun 
> ---
>  builtin/branch.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/builtin/branch.c b/builtin/branch.c
> index c396c41533..0aa3dac27b 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -716,8 +716,7 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   print_columns(, colopts, NULL);
>   string_list_clear(, 0);
>   return 0;
> - }
> - else if (edit_description) {
> + } else if (edit_description) {
>   const char *branch_name;
>   struct strbuf branch_ref = STRBUF_INIT;

Yep, this looks reasonable. Normally I would complain about an empty
commit message, but there is really not much else to say. ;)

-Peff


Re: [PATCH] branch: trivial style fix

2018-10-16 Thread Jeff King
On Tue, Oct 16, 2018 at 04:27:44PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Oct 16, 2018 at 4:16 PM Tao Qingyun  wrote:
> >
> > >Sorry for the slow reply. For some reason I do not think your message
> > >here made it to the list (but I don't see anything obviously wrong with
> > >it).
> > Yes, I cann't send message to the list using my email clinet, I don't
> > know why. The only way I can make it is using `git send-email`(including
> > this message).
> 
> It's almost certainly because your message contains a HTML part. See
> if you can find how to disable that in your mailer and send plain-text
> only. The vger.kernel.org mailer just refuses all HTML E-Mail as a
> spam heuristic.

That was my guess, too, but the message that I got did not have such a
part. Weird.

-Peff


Re: [PATCH 2/3] http: add support for disabling SSL revocation checks in cURL

2018-10-16 Thread Jeff King
On Tue, Oct 16, 2018 at 02:25:57PM +0200, Johannes Schindelin wrote:

> > > That ">=" is hard to grok.  I think you meant it to be pronounced
> > > "requries at least", but that is not a common reading.  People more
> > > commonly pronounce it "is greater than or equal to".
> > 
> > This seemed oddly familiar:
> > 
> >   
> > https://public-inbox.org/git/8da9d436-88b9-7959-dd9c-65bdd376b...@mail.mipt.ru/
> > 
> > Since this one is clearly copied from there, it may be worth fixing the
> > original.
> 
> Good memory. I just integrated the patch here. It was not signed off, but
> it is too obvious to be protected under copyright, so I re-did it, adding
> a nice commit message.

Yay, thank you (I considered doing the same, but was scratching my head
over how to deal with authorship and signoff).

-Peff


Re: [PATCH v2 01/13] sha1-file: rename algorithm to "sha1"

2018-10-16 Thread Duy Nguyen
On Mon, Oct 15, 2018 at 4:21 AM brian m. carlson
 wrote:
>
> The transition plan anticipates us using a syntax such as "^{sha1}" for
> disambiguation.  Since this is a syntax some people will be typing a
> lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
> the dash doesn't create any ambiguity, but it does make it shorter and

"but" or "and"? I think both clauses are on the same side ... or did
you mean omitting the dash does create ambiguity?

> easier to type, especially for touch typists.  In addition, the
> transition plan already uses "sha1" in this context.
>
> Rename the name of SHA-1 implementation to "sha1".
>
> Note that this change creates no backwards compatibility concerns, since
> we haven't yet used this field in any serialized data formats.

But we're not going to use this _string_ in any data format either
because we'll stick to format_id field anyway, right?
-- 
Duy


Re: Ignored files being silently overwritten when switching branches

2018-10-16 Thread Duy Nguyen
On Tue, Oct 16, 2018 at 11:12 AM Ævar Arnfjörð Bjarmason
 wrote:
>
>
> On Tue, Oct 16 2018, Jeff King wrote:
>
> > On Mon, Oct 15, 2018 at 01:01:50PM +, Per Lundberg wrote:
> >
> >> Sorry if this question has been asked before; I skimmed through the list
> >> archives and the FAQ but couldn't immediately find it - please point me
> >> in the right direction if it has indeed been discussed before.
> >
> > It is a frequently asked question, but it doesn't seem to be in any FAQ
> > that I could find. The behavior you're seeing is intended. See this
> > message (and the rest of the thread) for discussion:
> >
> >   https://public-inbox.org/git/7viq39avay@alter.siamese.dyndns.org/
> >
> >> So my question is: is this by design or should this be considered a bug
> >> in git? Of course, it depends largely on what .gitignore is being used
> >> for - if we are talking about files which can easily be regenerated
> >> (build artifacts, node_modules folders etc.) I can totally understand
> >> the current behavior, but when dealing with more sensitive & important
> >> content it's a bit inconvenient.
> >
> > Basically: yes. It would be nice to have that "do not track this, but do
> > not trash it either" state for a file, but Git does not currently
> > support that.
>
> There's some patches in that thread that could be picked up by someone
> interested. I think the approach mentioned by Matthieu Moy here makes
> the most sense:
> https://public-inbox.org/git/vpqd3t9656k@bauges.imag.fr/
>
> I don't think the rationale mentioned by Junio in
> https://public-inbox.org/git/7v4oepaup7@alter.siamese.dyndns.org/ is
> very convincing.

Just fyi I also have some wip changes that add the forth "precious"
class in addition to tracked, untracked and ignored [1]. If someone
has time it could be another option to pick up.

[1] 
https://gitlab.com/pclouds/git/commit/0e7f7afa1879b055369ebd3f1224311c43c8a32b
-- 
Duy


  1   2   >