[PATCH v2] build: link with curl-defined linker flags

2018-11-02 Thread James Knight
Adjust the build process to rely on curl-config to populate linker
flags instead of populating them manually based on detected features.

Originally, a configure-invoked build would check for SSL support in the
target curl library. If enabled, NEEDS_SSL_WITH_CURL would be set and
used in the Makefile to append additional libraries to link against. For
systems building solely with make, the defines NEEDS_IDN_WITH_CURL
and NEEDS_SSL_WITH_CURL could be set to indirectly enable the respective
linker flags. Since both configure.ac and the Makefile already rely on
the curl-config utility to provide curl-related build information,
adjust the respective assets to populate the required linker flags using
that utility (unless explicitly configured otherwise).

Signed-off-by: James Knight 
---
Changes v1 -> v2:
 - Improved support for detecting curl linker flags when not using a
configure-based build (mentioned by Junio C Hamano).
 - Adding a description on how to explicitly use the CURL_LDFLAGS
define when not using configure (suggested by Junio C Hamano).

The original patch made a (bad) assumption that builds would always
invoke ./configure before attempting to build Git. To support a
make-invoked build, CURL_LDFLAGS can also be populated using the defined
curl-config utility. This change also comes with the assumption that,
since both ./configure and the Makefile use curl-config to determine
aspects of the build, it should also be fine to use the same utility to
provide the linker flags to link against (please indicate if this is
another bad assumption). With this change, the explicit configuration
of CURL_LDFLAGS inside config.mak.uname for Minix and NONSTOP_KERNEL
has been dropped.
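As a rough illustration (not the Makefile itself), the fallback behaves like the following shell logic; the stand-in curl-config created below is invented purely to make the example self-contained:

```shell
# Sketch of the CURL_LDFLAGS fallback the patch adds: use the
# builder-provided CURL_LDFLAGS if set, otherwise ask curl-config.
# The throwaway curl-config created here is a stand-in for the real
# one, so the demonstration does not depend on libcurl being installed.
demo=$(mktemp -d)
cat >"$demo/curl-config" <<'EOF'
#!/bin/sh
test "$1" = "--libs" && echo "-lcurl -lssl -lcrypto"
EOF
chmod +x "$demo/curl-config"
PATH=$demo:$PATH

unset CURL_LDFLAGS              # pretend the builder did not set it
CURL_CONFIG=curl-config
CURL_LIBCURL=${CURL_LDFLAGS:-$($CURL_CONFIG --libs)}
printf '%s\n' "$CURL_LIBCURL"   # flags reported by curl-config
```

A builder who wants full control can still run e.g. `make CURL_LDFLAGS=-lcurl` and bypass curl-config entirely.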
---
 Makefile | 30 +++---
 config.mak.uname |  3 ---
 configure.ac | 17 +++--
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/Makefile b/Makefile
index b08d5ea258c69a78745dfa73fe698c11d021858a..36da17dc1f9b1a70c9142604afe989f1eb8ee87f 100644
--- a/Makefile
+++ b/Makefile
@@ -59,6 +59,13 @@ all::
 # Define CURL_CONFIG to curl's configuration program that prints information
 # about the library (e.g., its version number).  The default is 'curl-config'.
 #
+# Define CURL_LDFLAGS to specify flags that you need to link when using libcurl,
+# if you do not want to rely on the libraries provided by CURL_CONFIG.  The
+# default value is a result of `curl-config --libs`.  An example value for
+# CURL_LDFLAGS is as follows:
+#
+# CURL_LDFLAGS=-lcurl
+#
 # Define NO_EXPAT if you do not have expat installed.  git-http-push is
 # not built, and you cannot push using http:// and https:// transports (dumb).
 #
@@ -183,10 +190,6 @@ all::
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
 #
-# Define NEEDS_SSL_WITH_CURL if you need -lssl with -lcurl (Minix).
-#
-# Define NEEDS_IDN_WITH_CURL if you need -lidn when using -lcurl (Minix).
-#
 # Define NEEDS_LIBICONV if linking with libc is not enough (Darwin).
 #
 # Define NEEDS_LIBINTL_BEFORE_LIBICONV if you need libintl before libiconv.
@@ -1305,20 +1308,17 @@ else
ifdef CURLDIR
# Try "-Wl,-rpath=$(CURLDIR)/$(lib)" in such a case.
BASIC_CFLAGS += -I$(CURLDIR)/include
-   CURL_LIBCURL = -L$(CURLDIR)/$(lib) $(CC_LD_DYNPATH)$(CURLDIR)/$(lib) -lcurl
+   CURL_LIBCURL = -L$(CURLDIR)/$(lib) $(CC_LD_DYNPATH)$(CURLDIR)/$(lib)
else
-   CURL_LIBCURL = -lcurl
-   endif
-   ifdef NEEDS_SSL_WITH_CURL
-   CURL_LIBCURL += -lssl
-   ifdef NEEDS_CRYPTO_WITH_SSL
-   CURL_LIBCURL += -lcrypto
-   endif
-   endif
-   ifdef NEEDS_IDN_WITH_CURL
-   CURL_LIBCURL += -lidn
+   CURL_LIBCURL =
endif
 
+ifdef CURL_LDFLAGS
+   CURL_LIBCURL += $(CURL_LDFLAGS)
+else
+   CURL_LIBCURL += $(shell $(CURL_CONFIG) --libs)
+endif
+
REMOTE_CURL_PRIMARY = git-remote-http$X
REMOTE_CURL_ALIASES = git-remote-https$X git-remote-ftp$X git-remote-ftps$X
REMOTE_CURL_NAMES = $(REMOTE_CURL_PRIMARY) $(REMOTE_CURL_ALIASES)
diff --git a/config.mak.uname b/config.mak.uname
index 8acdeb71fdab3b3e8e3c536110eb6de13f14e8ff..19e6633040b1db4a148d7b33c4e9d374fe7f87ba 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -431,8 +431,6 @@ ifeq ($(uname_S),Minix)
NO_NSEC = YesPlease
NEEDS_LIBGEN =
NEEDS_CRYPTO_WITH_SSL = YesPlease
-   NEEDS_IDN_WITH_CURL = YesPlease
-   NEEDS_SSL_WITH_CURL = YesPlease
NEEDS_RESOLV =
NO_HSTRERROR = YesPlease
NO_MMAP = YesPlease
@@ -458,7 +456,6 @@ ifeq ($(uname_S),NONSTOP_KERNEL)
# Missdetected, hence commented out, see below.
#NO_CURL = YesPlease
# Added manually, see above.
-   NEEDS_SSL_WITH_CURL = YesPlease
HAVE_LIBCHARSET_H = YesPlease
HAVE_STRINGS_H = YesPlease
NEEDS_LIBICONV = YesPlease

[PATCH] git-worktree.txt: correct linkgit command name

2018-11-02 Thread Nguyễn Thái Ngọc Duy
Noticed-by: SZEDER Gábor 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 On top of nd/per-worktree-ref-iteration

 Documentation/git-worktree.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index 9117e4fb50..69d55f1350 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -230,7 +230,7 @@ GIT_COMMON_DIR/worktrees/foo/HEAD and
 GIT_COMMON_DIR/worktrees/bar/refs/bisect/bad.
 
 To access refs, it's best not to look inside GIT_DIR directly. Instead
-use commands such as linkgit:git-revparse[1] or linkgit:git-update-ref[1]
+use commands such as linkgit:git-rev-parse[1] or linkgit:git-update-ref[1]
 which will handle refs correctly.
 
 DETAILS
-- 
2.19.1.1005.gac84295441



Re: [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()

2018-11-02 Thread Duy Nguyen
On Sat, Nov 3, 2018 at 1:38 AM Junio C Hamano  wrote:
>
> Duy Nguyen  writes:
>
> > On Fri, Nov 2, 2018 at 2:32 PM Ben Peart  wrote:
> >>
> >> From: Ben Peart 
> >>
> >> During an "add", a call is made to run_diff_files() which calls
> >> check_remove() for each index-entry.  The preload_index() code distributes
> >> some of the costs across multiple threads.
> >
> > Instead of doing this site by site. How about we make read_cache()
> > always do multithread preload?
>
> I suspect that it would be a huge performance killer.
>
> Many codepaths do not even want to know if the working tree files
> have been modified, even though they need to know what's in the
> index.  Think "git commit-tree", "git diff --cached", etc.

Ah. I keep forgetting read_cache_preload is loading the index _and_
refreshing. I thought the two had some different semantics but failed
to see it last time.
-- 
Duy
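For readers unfamiliar with it: read_cache_preload() both loads the index and refreshes it, farming the per-entry lstat() calls out to worker threads. A rough sketch of that idea (all names here are invented for illustration; this is not Git's actual API):

```python
# Illustrative sketch of preload_index()'s idea: the expensive part of
# "refreshing" is one lstat() per index entry, and those calls are
# independent of each other, so they can be spread across threads.
# This is not Git's code; all names are invented for the example.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def stat_entry(path):
    """Return (path, mtime), or (path, None) if the file is missing."""
    try:
        return path, os.lstat(path).st_mtime
    except FileNotFoundError:
        return path, None

def preload_index(paths, threads=4):
    """lstat() every path using a small thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(stat_entry, paths))

tmp = tempfile.mkdtemp()
tracked = os.path.join(tmp, "tracked.txt")
with open(tracked, "w") as f:
    f.write("content\n")

stats = preload_index([tracked, os.path.join(tmp, "missing.txt")])
```

Plain index reading has no per-file work to parallelize, which is why making read_cache() itself multithreaded would not pay off for commands that never look at the working tree.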


Re: Git Test Coverage Report (Friday, Nov 2)

2018-11-02 Thread Junio C Hamano
Derrick Stolee  writes:

> Uncovered code in 'next' not in 'master'
> 
>
> pretty.c
> 4de9394dcb 1264) if (c->signature_check.primary_key_fingerprint)
> 4de9394dcb 1265) strbuf_addstr(sb, c->signature_check.primary_key_fingerprint);
> 4de9394dcb 1266) break;

Perhaps a patch along this line can be appended to the
mg/gpg-fingerprint topic that ends at 4de9394d ("gpg-interface.c:
obtain primary key fingerprint as well", 2018-10-22) to cover this
entry in the report.  

I do not know how involved it would be to set up a new test case
that demonstrates a case where %GF and %GP are different, but if it
is very involved perhaps it is not worth adding such a case.

 t/t7510-signed-commit.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 19ccae2869..9ecafedcc4 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -176,8 +176,9 @@ test_expect_success GPG 'show good signature with custom format' '
13B6F51ECDDE430D
C O Mitter 
73D758744BE721698EC54E8713B6F51ECDDE430D
+   73D758744BE721698EC54E8713B6F51ECDDE430D
EOF
-   git log -1 --format="%G?%n%GK%n%GS%n%GF" sixth-signed >actual &&
+   git log -1 --format="%G?%n%GK%n%GS%n%GF%n%GP" sixth-signed >actual &&
test_cmp expect actual
 '
 


Git Test Coverage Report (Friday, Nov 2)

2018-11-02 Thread Derrick Stolee
Here is the coverage report for today. Some builds were timing out, so I 
removed the tests with number 9000 or more from the build [1]. Hopefully 
this is a temporary measure.


Thanks,

-Stolee

[1] https://dev.azure.com/git/git/_build/results?buildId=250&_a=summary=logs


---

pu: 44234e885f450a57812806cefe0d4aa3f43f2800
jch: 35594a3ecc09908facb9790ba3817ad08d2af8e1
next: 96ac06677ac950c97d331f0ef4892125027eff39
master: d582ea202b626dcc6c3b01e1e11a296d9badd730
master@{1}: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2

Uncovered code in 'pu' not in 'jch'
--

builtin/blame.c
44234e885f builtin/blame.c    200) repo_unuse_commit_buffer(the_repository, commit, message);
74e8221b52 builtin/blame.c    924) blame_date_width = sizeof("Thu Oct 19 16:00");

74e8221b52 builtin/blame.c    925) break;

builtin/describe.c
44234e885f builtin/describe.c 257) repo_parse_commit(the_repository, p);

builtin/fsck.c
255a564e58 builtin/fsck.c 622) fprintf_ln(stderr, _("Checking %s link"), head_ref_name);

255a564e58 builtin/fsck.c 627) return error(_("invalid %s"), head_ref_name);

builtin/grep.c
76e9bdc437 builtin/grep.c  429) grep_read_unlock();

builtin/pack-objects.c
44234e885f builtin/pack-objects.c 2832) if (!repo_has_object_file(the_repository, >oid) && is_promisor_object(>oid))


builtin/reflog.c
c9ef0d95eb builtin/reflog.c 585) all_worktrees = 0;
c9ef0d95eb builtin/reflog.c 621) continue;

date.c
74e8221b52  113) die("Timestamp too large for this system: %"PRItime, time);
74e8221b52  216) if (tm->tm_mon == human_tm->tm_mon) {
74e8221b52  217) if (tm->tm_mday > human_tm->tm_mday) {
74e8221b52  219) } else if (tm->tm_mday == human_tm->tm_mday) {
74e8221b52  220) hide.date = hide.wday = 1;
74e8221b52  221) } else if (tm->tm_mday + 5 > human_tm->tm_mday) {
74e8221b52  223) hide.date = 1;
74e8221b52  231) gettimeofday(, NULL);
74e8221b52  232) show_date_relative(time, tz, , buf);
74e8221b52  233) return;
74e8221b52  246) hide.seconds = 1;
74e8221b52  247) hide.tz |= !hide.date;
74e8221b52  248) hide.wday = hide.time = !hide.year;
74e8221b52  262) strbuf_rtrim(buf);
74e8221b52  287) gettimeofday(, NULL);
74e8221b52  290) human_tz = local_time_tzoffset(now.tv_sec, _tm);
74e8221b52  886) static int auto_date_style(void)
74e8221b52  888) return (isatty(1) || pager_in_use()) ? DATE_HUMAN : DATE_NORMAL;

74e8221b52  909) return DATE_HUMAN;
74e8221b52  911) return auto_date_style();

fast-import.c
44234e885f 2935) buf = repo_read_object_file(the_repository, oid, , 
);

44234e885f 3041) buf = repo_read_object_file(the_repository, oid, ,

fsck.c
44234e885f  858) repo_unuse_commit_buffer(the_repository, commit, buffer);
44234e885f  878) repo_read_object_file(the_repository,
44234e885f  879)   >object.oid, , );

http-push.c
44234e885f 1635) if (!repo_has_object_file(the_repository, _oid))
44234e885f 1642) if (!repo_has_object_file(the_repository, 
_ref->old_oid))


merge-recursive.c
4cdc48e412 1585) return -1;
4cdc48e412 1588) return -1;
4cdc48e412 1594) return -1;
4cdc48e412 1596) if (update_file(o, 1, b_oid, b_mode, collide_path))
4cdc48e412 1597) return -1;
4cdc48e412 1664) return -1;
4cdc48e412 1667) return -1;
4cdc48e412 1670) return -1;
b58ae691c0 1703) return -1;
387361a6a7 1738) return -1;
387361a6a7 1786) return -1;
387361a6a7 1795) new_path = unique_path(o, a->path, ci->branch1);
387361a6a7 1796) output(o, 1, _("Refusing to lose untracked file"
387361a6a7 1802) return -1;
387361a6a7 1805) return -1;
387361a6a7 1815) return -1;
387361a6a7 1831) return -1;
387361a6a7 1834) return -1;

negotiator/default.c
44234e885f  71) if (repo_parse_commit(the_repository, commit))

refs.c
3a3b9d8cde  657) return 0;

refs/files-backend.c
remote.c
879b6a9e6f 1140) return error(_("dst ref %s receives from more than one src."),


revision.c
44234e885f  726) if (repo_parse_commit(the_repository, p) < 0)

sequencer.c
44234e885f 1624) repo_unuse_commit_buffer(the_repository, head_commit,
44234e885f 3868) repo_unuse_commit_buffer(the_repository,

sha1-array.c
bba406749a 91) oidcpy([dst], [src]);

submodule-config.c
bcbc780d14 740) return CONFIG_INVALID_KEY;
45f5ef3d77 755) warning(_("Could not update .gitmodules entry %s"), key);

submodule.c
b303ef65e7  524) the_repository->submodule_prefix :
e2419f7e30 1378) strbuf_release();
7454fe5cb6 1501) struct get_next_submodule_task *task = task_cb;
7454fe5cb6 1505) get_next_submodule_task_release(task);
7454fe5cb6 1532) return 0;
7454fe5cb6 1536) goto out;
7454fe5cb6 1551) return 0;

tree.c
44234e885f 108) if (repo_parse_commit(the_repository, commit))

worktree.c
3a3b9d8cde 495) return -1;
3a3b9d8cde 508) return -1;
3a3b9d8cde 517) return -1;
ab3e1f78ae 537) break;

wrapper.c
5efde212fc  70) die("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)",
5efde212fc  73) error("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)",


Commits introducing uncovered code:
Ævar Arnfjörð Bjarmason  879b6a9e6: i18n: remote.c: mark 

Re: [PATCH] diff: differentiate error handling in parse_color_moved_ws

2018-11-02 Thread Junio C Hamano
Stefan Beller  writes:

>  
> -static int parse_color_moved_ws(const char *arg)
> +static unsigned parse_color_moved_ws(const char *arg)
>  {
>   int ret = 0;
>   struct string_list l = STRING_LIST_INIT_DUP;
> @@ -312,15 +312,19 @@ static int parse_color_moved_ws(const char *arg)
>   ret |= XDF_IGNORE_WHITESPACE;
>   else if (!strcmp(sb.buf, "allow-indentation-change"))
>   ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
> - else
> + else {
> + ret |= COLOR_MOVED_WS_ERROR;
> + error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
> + }
> ...  
>   } else if (skip_prefix(arg, "--color-moved-ws=", )) {
> - options->color_moved_ws_handling = parse_color_moved_ws(arg);
> + unsigned cm = parse_color_moved_ws(arg);
> + if (cm & COLOR_MOVED_WS_ERROR)
> + die("bad --color-moved-ws argument: %s", arg);
> + options->color_moved_ws_handling = cm;

Excellent.

Will queue.  Perhaps a test or two can follow to ensure a bad value
from config does not kill while a command line does?

Thanks.
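The shape of the fix — a parser that returns a bit set with one bit reserved as an error marker, so each caller can choose between dying (bad command line) and merely warning (bad config) — can be sketched as follows; the flag names and values below are invented for illustration, not Git's:

```python
# Sketch of the error-bit pattern from the patch above: instead of
# aborting inside the parser, fold an error flag into the returned
# bitset and let each caller decide how severe the error is.
# All names and values here are illustrative, not Git's actual flags.
IGNORE_WS    = 1 << 0
ALLOW_INDENT = 1 << 1
WS_ERROR     = 1 << 2   # set when an unknown mode was seen

def parse_color_moved_ws(arg):
    ret = 0
    for tok in arg.split(","):
        tok = tok.strip()
        if tok == "ignore-all-space":
            ret |= IGNORE_WS
        elif tok == "allow-indentation-change":
            ret |= ALLOW_INDENT
        else:
            ret |= WS_ERROR   # record the problem, keep parsing
    return ret

# A command-line caller would die when WS_ERROR is set; a config
# caller would only warn and ignore the unknown mode.
```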


Re: [RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"

2018-11-02 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Although I'm on the fence with the approach in 1/5. Should this be a
> giant getopt switch statement like that in a helper script?
>
> An alternative would be to write out a shell file similar to
> GIT-BUILD-OPTIONS and source that from this thing. I don't know,
> what do you all think?

Not really.  Why do we iterate over these in a shell loop, rather
than having make figure them out, just like we do when we "loop
over the source files and turn them into object files" without using
a shell loop?  What's so special about enumerating the installation
targets and iterating over the enumeration to perform an action on
each of them?  I think that is the first question we should be
asking before patch 1/5, which already assumes that it has been
decided that it must be done with a shell loop.

I think "first install 'git' itself, and then make these
other things derived from it" should let $(MAKE) install
things in parallel just like it can naturally do many things
in parallel, and the dependency rule to do so should not be
so bad, I suspect.

This is a tangent, but I have long been wishing that somebody would
notice that output during install and (dist)clean without V=1 is so
different from the normal targets and do something about it, and
hoped that that somebody finally turned out to be you doing so in
this series X-<.

> I'd like to say it's ready, but I've spotted some fallout:

I still have not recovered from the trauma I suffered post 1.6.0
era, so I would rather *not* engage in a long discussion like this
one (it is a long thread; reserve a solid hour to read it through if
you are interested),

https://public-inbox.org/git/alpine.lfd.1.10.0808261435470.3...@nehalem.linux-foundation.org/

which would be needed to defend the choice, if we decide to omit
installing the git-foo on disk in a released version.

I personally have no objection to offering a knob that can be used to
force installation of symlinks without falling back to other
methods.  I think it would be ideal to do so without special casing
symbolic links---rather, it would be ideal if it were a single knob
INSTALL_GIT_FOO_METHOD=(symlinks|hardlinks|copies) that says "I want
them to be installed as (symlinks|hardlinks|copies), no fallbacks".
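For reference, the cascading behaviour being debated boils down to the chain below, run here in a throwaway directory with invented file names (the real loops live in the Makefile / install_programs):

```shell
# Sketch of the "symlink, else hardlink, else copy" install chain.
# Everything happens in a temporary directory; names are invented.
tmp=$(mktemp -d)
printf '#!/bin/sh\necho git\n' >"$tmp/git"
mkdir -p "$tmp/libexec"
p=git-init
rm -f "$tmp/libexec/$p" &&
ln -s "../git" "$tmp/libexec/$p" 2>/dev/null ||
ln "$tmp/git" "$tmp/libexec/$p" 2>/dev/null ||
cp "$tmp/git" "$tmp/libexec/$p" || exit 1
```

An INSTALL_GIT_FOO_METHOD-style knob would pick exactly one branch of this chain and fail loudly, instead of cascading through the alternatives.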

> Ævar Arnfjörð Bjarmason (5):
>   Makefile: move long inline shell loops in "install" into helper
>   Makefile: conform some of the code to our coding standards
>   Makefile: stop hiding failures during "install"
>   Makefile: add NO_INSTALL_SYMLINKS_FALLBACK switch
>   Makefile: Add a NO_INSTALL_BUILTIN_EXECDIR_ALIASES flag
>
>  Makefile |  65 +++--
>  install_programs | 124 +++
>  2 files changed, 151 insertions(+), 38 deletions(-)
>  create mode 100755 install_programs


Re: [PATCH 1/1] poll: use GetTickCount64() to avoid wrap-around issues

2018-11-02 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Sixt  writes:
>
>>> Yep, correct on all counts. I'm in favor of changing the commit message to
>>> only say that this patch removes Warning C28159.
>>
>> How about this fixup instead?
>
> Isn't that already in 'next'?  I didn't check, though.

Well, it turns out that I already prepared one but had not pushed it
out yet.  I'll eject this topic and rebuild the integration branches,
and wait until Dscho says something, to avoid having to redo the
integration cycle again.

Thanks.


Re: [PATCH 1/1] poll: use GetTickCount64() to avoid wrap-around issues

2018-11-02 Thread Junio C Hamano
Johannes Sixt  writes:

>> Yep, correct on all counts. I'm in favor of changing the commit message to
>> only say that this patch removes Warning C28159.
>
> How about this fixup instead?

Isn't that already in 'next'?  I didn't check, though.



Re: [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()

2018-11-02 Thread Junio C Hamano
Duy Nguyen  writes:

> On Fri, Nov 2, 2018 at 2:32 PM Ben Peart  wrote:
>>
>> From: Ben Peart 
>>
>> During an "add", a call is made to run_diff_files() which calls
>> check_remove() for each index-entry.  The preload_index() code distributes
>> some of the costs across multiple threads.
>
> Instead of doing this site by site. How about we make read_cache()
> always do multithread preload?

I suspect that it would be a huge performance killer. 

Many codepaths do not even want to know if the working tree files
have been modified, even though they need to know what's in the
index.  Think "git commit-tree", "git diff --cached", etc.




Re: [PATCH 3/9] diff: avoid generating unused hunk header lines

2018-11-02 Thread Junio C Hamano
Jeff King  writes:

>> to have *some* names there, for the sake of a
>> simply described coding style without many exceptions
>> (especially those exceptions that rely on judgement).
>
> Fair enough.

For the record, there aren't "many" exceptions to the rule that was
suggested earlier: if you refer to parameters by name in the comment
to explain the interface, name them and use these names [*1*].

> Squashable patch is below; it goes on 34c829621e (diff: avoid generating
> unused hunk header lines, 2018-11-02).
>
> Junio, let me know if you'd prefer a re-send of the series, but it
> doesn't look necessary otherwise (so far).

Giving them proper names would serve as a readily usable sample for
those who write new hunk-line callbacks that do not ignore them, so
what you wrote is good.  Let me squash it in.

Thanks.  If this were to add a bunch of "unused$n" names to the
parameters to this no-op function, only to "have *some* names
there", I would have said that it is not even worth the time to
discuss it, though.

> diff --git a/xdiff-interface.h b/xdiff-interface.h
> index 7b0ccbdd1d..8af245eed9 100644
> --- a/xdiff-interface.h
> +++ b/xdiff-interface.h
> @@ -39,7 +39,9 @@ extern int git_xmerge_style;
>   * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
>   * one just sends the hunk line to the line_fn callback).
>   */
> -void discard_hunk_line(void *, long, long, long, long, const char *, long);
> +void discard_hunk_line(void *priv,
> +long ob, long on, long nb, long nn,
> +const char *func, long funclen);
>  
>  /*
>   * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
>
> -Peff

[Footnote]

*1* You may say that the "if you refer to parameters by name" part
relies on judgment, but I think it is a good thing---it makes poor
judgment stand out.

When describing a function that takes two parameters of the same
type, for example, you could explain the interface as "take two ints
and return true if first int is smaller than the second int" in
order to leave the parameters unnamed, but if you gave such a
description, it is so obvious that you are _avoiding_ names.  Not
using names when you do not have to is good, but nobody with half an
intelligence would think it is good to give a bad description just
to avoid using names.


[RFC/PATCH 0/5] stop installing old libexec aliases like "git-init"

2018-11-02 Thread Ævar Arnfjörð Bjarmason
I think up to patch 4 here should be near a state that's ready for
inclusion.

Although I'm on the fence with the approach in 1/5. Should this be a
giant getopt switch statement like that in a helper script? An
alternative would be to write out a shell file similar to
GIT-BUILD-OPTIONS and source that from this thing. I don't know, what
do you all think?

The idea with 4/5 was to make this symlink mode the default in
config.mak.uname and have a blacklist of systems like Windows that
couldn't deal with it.

Since my ad874608d8 ("Makefile: optionally symlink libexec/git-core
binaries to bin/git", 2018-03-13) I see that e.g. Debian and GitLab
have started shipping with the INSTALL_SYMLINKS flag, so making that
unconditional is the next logical step.

The 5th one is more radical. See
https://public-inbox.org/git/87woyfdkoi@evledraar.gmail.com/ from
back in March for context.

I'd like to say it's ready, but I've spotted some fallout:

 * Help like "git ninit" suggesting "git init" doesn't work, this is
   because load_command_list() in help.c doesn't look out our
   in-memory idea of builtins, it reads the libexecdir, so if we don't
   have the programs there it doesn't know about it.

 * GIT_TEST_INSTALLED breaks entirely under this, as early as the
   heuristic for "are we built?" being "do we have git-init in
   libexecdir?". I tried a bit to make this work, but there's a lot of
   dependencies there.

 * We still (and this is also true of my ad874608d8) hardlink
   everything in the build dir via a different part of the Makefile,
   ideally we should do exactly the same thing there so also normal
   tests and not just GIT_TEST_INSTALLED (if that worked) would test
   in the same mode.

   I gave making that work a bit of a try and gave up in the Makefile
   jungle.

Ævar Arnfjörð Bjarmason (5):
  Makefile: move long inline shell loops in "install" into helper
  Makefile: conform some of the code to our coding standards
  Makefile: stop hiding failures during "install"
  Makefile: add NO_INSTALL_SYMLINKS_FALLBACK switch
  Makefile: Add a NO_INSTALL_BUILTIN_EXECDIR_ALIASES flag

 Makefile |  65 +++--
 install_programs | 124 +++
 2 files changed, 151 insertions(+), 38 deletions(-)
 create mode 100755 install_programs

-- 
2.19.1.930.g4563a0d9d0



[RFC/PATCH 2/5] Makefile: conform some of the code to our coding standards

2018-11-02 Thread Ævar Arnfjörð Bjarmason
This code is still very much unlike our usual style since it was
lifted from the Makefile, but we can at least make some of it use the
usual style and line spacing.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 install_programs | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/install_programs b/install_programs
index e287108112..dcd25f 100755
--- a/install_programs
+++ b/install_programs
@@ -50,17 +50,21 @@ do
esac
shift
 done &&
-{ test "$bindir/" = "$execdir/" ||
-  for p in $list_bindir_standalone; do
-   $RM "$execdir/$p" &&
-   test -n "$INSTALL_SYMLINKS" &&
-   ln -s "$destdir_from_execdir/$bindir_relative/$p" "$execdir/$p" ||
-   { test -z "$NO_INSTALL_HARDLINKS$NO_CROSS_DIRECTORY_HARDLINKS" &&
- ln "$bindir/$p" "$execdir/$p" 2>/dev/null ||
- cp "$bindir/$p" "$execdir/$p" || exit; }
-  done;
-} &&
-for p in $list_bindir_git_dashed; do
+
+if test "$bindir/" != "$execdir/"
+then
+   for p in $list_bindir_standalone; do
+   $RM "$execdir/$p" &&
+   test -n "$INSTALL_SYMLINKS" &&
+   ln -s "$destdir_from_execdir/$bindir_relative/$p" "$execdir/$p" ||
+   { test -z "$NO_INSTALL_HARDLINKS$NO_CROSS_DIRECTORY_HARDLINKS" &&
+ ln "$bindir/$p" "$execdir/$p" 2>/dev/null ||
+ cp "$bindir/$p" "$execdir/$p" || exit; }
+   done
+fi &&
+
+for p in $list_bindir_git_dashed
+do
$RM "$bindir/$p" &&
test -n "$INSTALL_SYMLINKS" &&
ln -s "git$X" "$bindir/$p" ||
@@ -69,6 +73,7 @@ for p in $list_bindir_git_dashed; do
  ln -s "git$X" "$bindir/$p" 2>/dev/null ||
  cp "$bindir/git$X" "$bindir/$p" || exit; }
 done &&
+
 for p in $list_execdir_git_dashed; do
$RM "$execdir/$p" &&
test -n "$INSTALL_SYMLINKS" &&
@@ -78,6 +83,7 @@ for p in $list_execdir_git_dashed; do
  ln -s "git$X" "$execdir/$p" 2>/dev/null ||
  cp "$execdir/git$X" "$execdir/$p" || exit; }
 done &&
+
 for p in $list_execdir_curl_aliases; do
$RM "$execdir/$p" &&
test -n "$INSTALL_SYMLINKS" &&
-- 
2.19.1.930.g4563a0d9d0



[RFC/PATCH 5/5] Makefile: Add a NO_INSTALL_BUILTIN_EXECDIR_ALIASES flag

2018-11-02 Thread Ævar Arnfjörð Bjarmason
Back when git was initially written the likes of "git-add", "git-init"
etc. were installed in the user's $PATH. A few years later everything,
with a few exceptions like git-upload-pack and git-receive-pack, was
expected to be invoked as "git $cmd".

Now something like a decade later we're still installing these old
commands in gitexecdir. This is so someone with a shellscript that
still targets e.g. "git-init" can add $(git --exec-path) to their
$PATH and not have to change their script.

Let's add an option to break this backwards compatibility. Now with
NO_INSTALL_BUILTIN_EXECDIR_ALIASES=YesPlease there are only 3 programs
in the bindir that are hardlinked to "git" (receive-pack,
upload-archive & upload-pack), and 3 in the
gitexecdir (git-remote-{ftp,ftps,https} linked to git-remote-http).

There's no cross-directory links anymore, so the
"NO_CROSS_DIRECTORY_HARDLINKS" flag becomes redundant under this new
option.

1. https://public-inbox.org/git/87woyfdkoi@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile |  8 
 install_programs | 36 +---
 2 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index 07c8b74353..a849a7b6d1 100644
--- a/Makefile
+++ b/Makefile
@@ -346,6 +346,13 @@ all::
 # INSTALL_SYMLINKS if you'd prefer not to have the install procedure
# fallback on hardlinking or copying if "ln -s" fails.
 #
+# Define NO_INSTALL_BUILTIN_EXECDIR_ALIASES if you'd like to skip
+# installing legacy commands such as "git-init" and "git-add" in the
+# gitexecdir. Unless you're on a system where "which git-init" is
+# expected to return something, set this. Users have been expected to
+# use the likes of "git init" for ages now; these programs were only
+# provided for legacy compatibility.
+#
 # Define NO_CROSS_DIRECTORY_HARDLINKS if you plan to distribute the installed
# programs as a tar, where bin/ and libexec/ might be on different file systems.
 #
@@ -2823,6 +2830,7 @@ endif
--flag-no-install-hardlinks="$(NO_INSTALL_HARDLINKS)" \

--flag-no-cross-directory-hardlinks="$(NO_CROSS_DIRECTORY_HARDLINKS)" \

--flag-no-install-symlinks-fallback="$(NO_INSTALL_SYMLINKS_FALLBACK)" \
+   --flag-no-install-builtin-execdir-aliases="$(NO_INSTALL_BUILTIN_EXECDIR_ALIASES)" \
--list-bindir-standalone="git$X $(filter $(install_bindir_programs),$(ALL_PROGRAMS))" \
--list-bindir-git-dashed="$(filter $(install_bindir_programs),$(BUILT_INS))" \
--list-execdir-git-dashed="$(BUILT_INS)" \
diff --git a/install_programs b/install_programs
index 51e08019dd..8d89cd9984 100755
--- a/install_programs
+++ b/install_programs
@@ -33,6 +33,9 @@ do
--flag-no-install-symlinks-fallback=*)

NO_INSTALL_SYMLINKS_FALLBACK="${1#--flag-no-install-symlinks-fallback=}"
;;
+   --flag-no-install-builtin-execdir-aliases=*)
+   NO_INSTALL_BUILTIN_EXECDIR_ALIASES="${1#--flag-no-install-builtin-execdir-aliases=}"
+   ;;
--list-bindir-standalone=*)
list_bindir_standalone="${1#--list-bindir-standalone=}"
;;
@@ -54,7 +57,7 @@ do
shift
 done &&
 
-if test "$bindir/" != "$execdir/"
+if test "$bindir/" != "$execdir/" -a -z "$NO_INSTALL_BUILTIN_EXECDIR_ALIASES"
 then
for p in $list_bindir_standalone; do
$RM "$execdir/$p" &&
@@ -87,20 +90,23 @@ do
fi
 done &&
 
-for p in $list_execdir_git_dashed; do
-   $RM "$execdir/$p" &&
-   if test -n "$INSTALL_SYMLINKS" -a -n "$NO_INSTALL_SYMLINKS_FALLBACK"
-   then
-   ln -s "$destdir_from_execdir/$bindir_relative/git$X" "$execdir/$p"
-   else
-   test -n "$INSTALL_SYMLINKS" &&
-   ln -s "$destdir_from_execdir/$bindir_relative/git$X" "$execdir/$p" ||
-   { test -z "$NO_INSTALL_HARDLINKS" &&
- ln "$execdir/git$X" "$execdir/$p" ||
- ln -s "git$X" "$execdir/$p" ||
- cp "$execdir/git$X" "$execdir/$p" || exit; }
-   fi
-done &&
+if test -z "$NO_INSTALL_BUILTIN_EXECDIR_ALIASES"
+then
+   for p in $list_execdir_git_dashed; do
+   $RM "$execdir/$p" &&
+   if test -n "$INSTALL_SYMLINKS" -a -n "$NO_INSTALL_SYMLINKS_FALLBACK"
+   then
+   ln -s "$destdir_from_execdir/$bindir_relative/git$X" "$execdir/$p"
+   else
+   test -n "$INSTALL_SYMLINKS" &&
+   ln -s "$destdir_from_execdir/$bindir_relative/git$X" "$execdir/$p" ||
+   { test -z "$NO_INSTALL_HARDLINKS" &&
+ ln "$execdir/git$X" "$execdir/$p" ||
+ ln -s "git$X" "$execdir/$p" ||
+ cp "$execdir/git$X" "$execdir/$p" || exit; }
+   fi
+   done
+fi &&
 
 for p in $list_execdir_curl_aliases; do
  

[RFC/PATCH 3/5] Makefile: stop hiding failures during "install"

2018-11-02 Thread Ævar Arnfjörð Bjarmason
Change the fallback mechanism, where we try to create hardlinks and
ultimately fall back on a plain copy, to emit the errors it encounters
instead of hiding them away and silently falling back to copying.

Hiding these errors dates back to 3e073dc561 ("Makefile: always
provide a fallback when hardlinks fail", 2008-08-25) when the existing
"hardlink or copy" logic was amended to hide the errors.

At that time "make install" hadn't yet been taught any of the
NO_*_HARDLINK options, that happened later in 3426e34fed ("Add
NO_CROSS_DIRECTORY_HARDLINKS support to the Makefile", 2009-05-11) and
was finally finished to roughly the current form in
70de5e65e8 ("Makefile: NO_INSTALL_HARDLINKS", 2012-05-02).

If someone is building git in an environment where hard linking
fails, they can now specify some combination of
NO_INSTALL_HARDLINKS=YesPlease and
NO_CROSS_DIRECTORY_HARDLINKS=YesPlease; it no longer makes sense to
fall back to copying implicitly, much less silently.

This change leaves no way to not get errors spewed if we're trying and
failing to e.g. make symlinks and having to fall back on "cp". I think
that's OK.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 install_programs | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/install_programs b/install_programs
index dcd25f..367e9a6cdf 100755
--- a/install_programs
+++ b/install_programs
@@ -58,7 +58,7 @@ then
test -n "$INSTALL_SYMLINKS" &&
ln -s "$destdir_from_execdir/$bindir_relative/$p" "$execdir/$p" ||
{ test -z "$NO_INSTALL_HARDLINKS$NO_CROSS_DIRECTORY_HARDLINKS" &&
- ln "$bindir/$p" "$execdir/$p" 2>/dev/null ||
+ ln "$bindir/$p" "$execdir/$p" ||
  cp "$bindir/$p" "$execdir/$p" || exit; }
done
 fi &&
@@ -69,8 +69,8 @@ do
test -n "$INSTALL_SYMLINKS" &&
ln -s "git$X" "$bindir/$p" ||
{ test -z "$NO_INSTALL_HARDLINKS" &&
- ln "$bindir/git$X" "$bindir/$p" 2>/dev/null ||
- ln -s "git$X" "$bindir/$p" 2>/dev/null ||
+ ln "$bindir/git$X" "$bindir/$p" ||
+ ln -s "git$X" "$bindir/$p" ||
  cp "$bindir/git$X" "$bindir/$p" || exit; }
 done &&
 
@@ -79,8 +79,8 @@ for p in $list_execdir_git_dashed; do
test -n "$INSTALL_SYMLINKS" &&
ln -s "$destdir_from_execdir/$bindir_relative/git$X" "$execdir/$p" ||
{ test -z "$NO_INSTALL_HARDLINKS" &&
- ln "$execdir/git$X" "$execdir/$p" 2>/dev/null ||
- ln -s "git$X" "$execdir/$p" 2>/dev/null ||
+ ln "$execdir/git$X" "$execdir/$p" ||
+ ln -s "git$X" "$execdir/$p" ||
  cp "$execdir/git$X" "$execdir/$p" || exit; }
 done &&
 
@@ -89,7 +89,7 @@ for p in $list_execdir_curl_aliases; do
test -n "$INSTALL_SYMLINKS" &&
ln -s "git-remote-http$X" "$execdir/$p" ||
{ test -z "$NO_INSTALL_HARDLINKS" &&
- ln "$execdir/git-remote-http$X" "$execdir/$p" 2>/dev/null ||
- ln -s "git-remote-http$X" "$execdir/$p" 2>/dev/null ||
+ ln "$execdir/git-remote-http$X" "$execdir/$p" ||
+ ln -s "git-remote-http$X" "$execdir/$p" ||
  cp "$execdir/git-remote-http$X" "$execdir/$p" || exit; }
 done
-- 
2.19.1.930.g4563a0d9d0



[RFC/PATCH 4/5] Makefile: add NO_INSTALL_SYMLINKS_FALLBACK switch

2018-11-02 Thread Ævar Arnfjörð Bjarmason
Add a switch for use in conjunction with the INSTALL_SYMLINKS flag
added in ad874608d8 ("Makefile: optionally symlink libexec/git-core
binaries to bin/git", 2018-03-13).

Now it's possible to install git with:

INSTALL_SYMLINKS=YesPlease NO_INSTALL_SYMLINKS_FALLBACK=YesPlease

And know for sure that there isn't going to be any silent fallback on
hardlinks or copying of files if symlinking fails.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile |  5 
 install_programs | 69 
 2 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/Makefile b/Makefile
index aa6ca1fa68..07c8b74353 100644
--- a/Makefile
+++ b/Makefile
@@ -342,6 +342,10 @@ all::
 # within the same directory in some cases, INSTALL_SYMLINKS will
 # always symlink to the final target directly.
 #
+# Define NO_INSTALL_SYMLINKS_FALLBACK in conjunction with
+# INSTALL_SYMLINKS if you'd prefer not to have the install procedure
+# fall back on hardlinking or copying if "ln -s" fails.
+#
 # Define NO_CROSS_DIRECTORY_HARDLINKS if you plan to distribute the installed
 # programs as a tar, where bin/ and libexec/ might be on different file 
systems.
 #
@@ -2818,6 +2822,7 @@ endif
--flag-install-symlinks="$(INSTALL_SYMLINKS)" \
--flag-no-install-hardlinks="$(NO_INSTALL_HARDLINKS)" \

--flag-no-cross-directory-hardlinks="$(NO_CROSS_DIRECTORY_HARDLINKS)" \
+   
--flag-no-install-symlinks-fallback="$(NO_INSTALL_SYMLINKS_FALLBACK)" \
--list-bindir-standalone="git$X $(filter 
$(install_bindir_programs),$(ALL_PROGRAMS))" \
--list-bindir-git-dashed="$(filter 
$(install_bindir_programs),$(BUILT_INS))" \
--list-execdir-git-dashed="$(BUILT_INS)" \
diff --git a/install_programs b/install_programs
index 367e9a6cdf..51e08019dd 100755
--- a/install_programs
+++ b/install_programs
@@ -30,6 +30,9 @@ do
--flag-no-cross-directory-hardlinks=*)

NO_CROSS_DIRECTORY_HARDLINKS="${1#--flag-no-cross-directory-hardlinks=}"
;;
+   --flag-no-install-symlinks-fallback=*)
+   
NO_INSTALL_SYMLINKS_FALLBACK="${1#--flag-no-install-symlinks-fallback=}"
+   ;;
--list-bindir-standalone=*)
list_bindir_standalone="${1#--list-bindir-standalone=}"
;;
@@ -55,41 +58,61 @@ if test "$bindir/" != "$execdir/"
 then
for p in $list_bindir_standalone; do
$RM "$execdir/$p" &&
-   test -n "$INSTALL_SYMLINKS" &&
-   ln -s "$destdir_from_execdir/$bindir_relative/$p" "$execdir/$p" 
||
-   { test -z "$NO_INSTALL_HARDLINKS$NO_CROSS_DIRECTORY_HARDLINKS" 
&&
- ln "$bindir/$p" "$execdir/$p" ||
- cp "$bindir/$p" "$execdir/$p" || exit; }
+   if test -n "$INSTALL_SYMLINKS" -a -n 
"$NO_INSTALL_SYMLINKS_FALLBACK"
+   then
+   ln -s "$destdir_from_execdir/$bindir_relative/$p" 
"$execdir/$p"
+   else
+   test -n "$INSTALL_SYMLINKS" &&
+   ln -s "$destdir_from_execdir/$bindir_relative/$p" 
"$execdir/$p" ||
+   { test -z 
"$NO_INSTALL_HARDLINKS$NO_CROSS_DIRECTORY_HARDLINKS" &&
+ ln "$bindir/$p" "$execdir/$p" ||
+ cp "$bindir/$p" "$execdir/$p" || exit; }
+   fi
done
 fi &&
 
 for p in $list_bindir_git_dashed
 do
$RM "$bindir/$p" &&
-   test -n "$INSTALL_SYMLINKS" &&
-   ln -s "git$X" "$bindir/$p" ||
-   { test -z "$NO_INSTALL_HARDLINKS" &&
- ln "$bindir/git$X" "$bindir/$p" ||
- ln -s "git$X" "$bindir/$p" ||
- cp "$bindir/git$X" "$bindir/$p" || exit; }
+   if test -n "$INSTALL_SYMLINKS" -a -n "$NO_INSTALL_SYMLINKS_FALLBACK"
+   then
+   ln -s "git$X" "$bindir/$p"
+   else
+   test -n "$INSTALL_SYMLINKS" &&
+   ln -s "git$X" "$bindir/$p" ||
+   { test -z "$NO_INSTALL_HARDLINKS" &&
+ ln "$bindir/git$X" "$bindir/$p" ||
+ ln -s "git$X" "$bindir/$p" ||
+ cp "$bindir/git$X" "$bindir/$p" || exit; }
+   fi
 done &&
 
 for p in $list_execdir_git_dashed; do
$RM "$execdir/$p" &&
-   test -n "$INSTALL_SYMLINKS" &&
-   ln -s "$destdir_from_execdir/$bindir_relative/git$X" "$execdir/$p" ||
-   { test -z "$NO_INSTALL_HARDLINKS" &&
- ln "$execdir/git$X" "$execdir/$p" ||
- ln -s "git$X" "$execdir/$p" ||
- cp "$execdir/git$X" "$execdir/$p" || exit; }
+   if test -n "$INSTALL_SYMLINKS" -a -n "$NO_INSTALL_SYMLINKS_FALLBACK"
+   then
+   ln -s "$destdir_from_execdir/$bindir_relative/git$X" 
"$execdir/$p"
+   else
+   test -n "$INSTALL_SYMLINKS" &&
+   ln -s "$destdir_from_execdir/$bindir_relative/git$X" 
"$execdir/$p" ||
+   { 

[RFC/PATCH 1/5] Makefile: move long inline shell loops in "install" into helper

2018-11-02 Thread Ævar Arnfjörð Bjarmason
Move a 37 line for-loop mess out of "install" and into a helper
script. This started out fairly innocent but over the years has grown
into a hard-to-maintain monster, and my recent ad874608d8 ("Makefile:
optionally symlink libexec/git-core binaries to bin/git", 2018-03-13)
certainly didn't help.

The shell code is ported pretty much as-is (with getopts added), it'll
be fixed & prettified in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile | 52 
 install_programs | 89 
 2 files changed, 103 insertions(+), 38 deletions(-)
 create mode 100755 install_programs

diff --git a/Makefile b/Makefile
index bbfbb4292d..aa6ca1fa68 100644
--- a/Makefile
+++ b/Makefile
@@ -2808,44 +2808,20 @@ endif
bindir=$$(cd '$(DESTDIR_SQ)$(bindir_SQ)' && pwd) && \
execdir=$$(cd '$(DESTDIR_SQ)$(gitexec_instdir_SQ)' && pwd) && \
destdir_from_execdir_SQ=$$(echo '$(gitexecdir_relative_SQ)' | sed -e 
's|[^/][^/]*|..|g') && \
-   { test "$$bindir/" = "$$execdir/" || \
- for p in git$X $(filter $(install_bindir_programs),$(ALL_PROGRAMS)); 
do \
-   $(RM) "$$execdir/$$p" && \
-   test -n "$(INSTALL_SYMLINKS)" && \
-   ln -s "$$destdir_from_execdir_SQ/$(bindir_relative_SQ)/$$p" 
"$$execdir/$$p" || \
-   { test -z 
"$(NO_INSTALL_HARDLINKS)$(NO_CROSS_DIRECTORY_HARDLINKS)" && \
- ln "$$bindir/$$p" "$$execdir/$$p" 2>/dev/null || \
- cp "$$bindir/$$p" "$$execdir/$$p" || exit; } \
- done; \
-   } && \
-   for p in $(filter $(install_bindir_programs),$(BUILT_INS)); do \
-   $(RM) "$$bindir/$$p" && \
-   test -n "$(INSTALL_SYMLINKS)" && \
-   ln -s "git$X" "$$bindir/$$p" || \
-   { test -z "$(NO_INSTALL_HARDLINKS)" && \
- ln "$$bindir/git$X" "$$bindir/$$p" 2>/dev/null || \
- ln -s "git$X" "$$bindir/$$p" 2>/dev/null || \
- cp "$$bindir/git$X" "$$bindir/$$p" || exit; } \
-   done && \
-   for p in $(BUILT_INS); do \
-   $(RM) "$$execdir/$$p" && \
-   test -n "$(INSTALL_SYMLINKS)" && \
-   ln -s "$$destdir_from_execdir_SQ/$(bindir_relative_SQ)/git$X" 
"$$execdir/$$p" || \
-   { test -z "$(NO_INSTALL_HARDLINKS)" && \
- ln "$$execdir/git$X" "$$execdir/$$p" 2>/dev/null || \
- ln -s "git$X" "$$execdir/$$p" 2>/dev/null || \
- cp "$$execdir/git$X" "$$execdir/$$p" || exit; } \
-   done && \
-   remote_curl_aliases="$(REMOTE_CURL_ALIASES)" && \
-   for p in $$remote_curl_aliases; do \
-   $(RM) "$$execdir/$$p" && \
-   test -n "$(INSTALL_SYMLINKS)" && \
-   ln -s "git-remote-http$X" "$$execdir/$$p" || \
-   { test -z "$(NO_INSTALL_HARDLINKS)" && \
- ln "$$execdir/git-remote-http$X" "$$execdir/$$p" 2>/dev/null 
|| \
- ln -s "git-remote-http$X" "$$execdir/$$p" 2>/dev/null || \
- cp "$$execdir/git-remote-http$X" "$$execdir/$$p" || exit; } \
-   done && \
+   ./install_programs \
+   --X="$$X" \
+   --RM="$(RM)" \
+   --bindir="$$bindir" \
+   --bindir-relative="$(bindir_relative_SQ)" \
+   --execdir="$$execdir" \
+   --destdir-from-execdir="$$destdir_from_execdir_SQ" \
+   --flag-install-symlinks="$(INSTALL_SYMLINKS)" \
+   --flag-no-install-hardlinks="$(NO_INSTALL_HARDLINKS)" \
+   
--flag-no-cross-directory-hardlinks="$(NO_CROSS_DIRECTORY_HARDLINKS)" \
+   --list-bindir-standalone="git$X $(filter 
$(install_bindir_programs),$(ALL_PROGRAMS))" \
+   --list-bindir-git-dashed="$(filter 
$(install_bindir_programs),$(BUILT_INS))" \
+   --list-execdir-git-dashed="$(BUILT_INS)" \
+   --list-execdir-curl-aliases="$(REMOTE_CURL_ALIASES)" && \
./check_bindir "z$$bindir" "z$$execdir" "$$bindir/git-add$X"
 
 .PHONY: install-gitweb install-doc install-man install-man-perl install-html 
install-info install-pdf
diff --git a/install_programs b/install_programs
new file mode 100755
index 00..e287108112
--- /dev/null
+++ b/install_programs
@@ -0,0 +1,89 @@
+#!/bin/sh
+
+while test $# != 0
+do
+   case "$1" in
+   --X=*)
+   X="${1#--X=}"
+   ;;
+   --RM=*)
+   RM="${1#--RM=}"
+   ;;
+   --bindir=*)
+   bindir="${1#--bindir=}"
+   ;;
+   --bindir-relative=*)
+   bindir_relative="${1#--bindir-relative=}"
+   ;;
+   --execdir=*)
+   execdir="${1#--execdir=}"
+   ;;
+   --destdir-from-execdir=*)
+   destdir_from_execdir="${1#--destdir-from-execdir=}"
+   ;;
+   --flag-install-symlinks=*)
+  

[PATCH] diff: differentiate error handling in parse_color_moved_ws

2018-11-02 Thread Stefan Beller
As we check command line options more strictly and allow configuration
variables to be parsed more leniently, we need to take different
actions based on whether an unknown value is given on the command line
or in the config.

Move the die() call out of parse_color_moved_ws into the parsing
of command line options. As the function returns a bit field, change
its signature to return an unsigned instead of an int; add a new bit
to signal errors. Once the error is signaled, we discard the other
bits, such that it doesn't matter if the error bit overlaps with any
other bit.

Signed-off-by: Stefan Beller 
---

This is a fresh attempt to cleanup the sloppy part that was mentioned
in https://public-inbox.org/git/xmqqa7nkf6o4@gitster-ct.c.googlers.com/

Another thing to follow up is to have color-moved-ws imply color-moved.

Thanks,
Stefan


 diff.c | 21 ++---
 diff.h |  3 ++-
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8647db3d30..f21f8b0332 100644
--- a/diff.c
+++ b/diff.c
@@ -291,7 +291,7 @@ static int parse_color_moved(const char *arg)
return error(_("color moved setting must be one of 'no', 
'default', 'blocks', 'zebra', 'dimmed-zebra', 'plain'"));
 }
 
-static int parse_color_moved_ws(const char *arg)
+static unsigned parse_color_moved_ws(const char *arg)
 {
int ret = 0;
struct string_list l = STRING_LIST_INIT_DUP;
@@ -312,15 +312,19 @@ static int parse_color_moved_ws(const char *arg)
ret |= XDF_IGNORE_WHITESPACE;
else if (!strcmp(sb.buf, "allow-indentation-change"))
ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
-   else
+   else {
+   ret |= COLOR_MOVED_WS_ERROR;
error(_("ignoring unknown color-moved-ws mode '%s'"), 
sb.buf);
+   }
 
	strbuf_release(&sb);
}
 
if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
-   (ret & XDF_WHITESPACE_FLAGS))
-   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   (ret & XDF_WHITESPACE_FLAGS)) {
+   error(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   ret |= COLOR_MOVED_WS_ERROR;
+   }
 
	string_list_clear(&l, 0);
 
@@ -341,8 +345,8 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
return 0;
}
if (!strcmp(var, "diff.colormovedws")) {
-   int cm = parse_color_moved_ws(value);
-   if (cm < 0)
+   unsigned cm = parse_color_moved_ws(value);
+   if (cm & COLOR_MOVED_WS_ERROR)
return -1;
diff_color_moved_ws_default = cm;
return 0;
@@ -5035,7 +5039,10 @@ int diff_opt_parse(struct diff_options *options,
die("bad --color-moved argument: %s", arg);
options->color_moved = cm;
	} else if (skip_prefix(arg, "--color-moved-ws=", &arg)) {
-   options->color_moved_ws_handling = parse_color_moved_ws(arg);
+   unsigned cm = parse_color_moved_ws(arg);
+   if (cm & COLOR_MOVED_WS_ERROR)
+   die("bad --color-moved-ws argument: %s", arg);
+   options->color_moved_ws_handling = cm;
} else if (skip_to_optional_arg_default(arg, "--color-words", 
>word_regex, NULL)) {
options->use_color = 1;
options->word_diff = DIFF_WORDS_COLOR;
diff --git a/diff.h b/diff.h
index ce5e8a8183..9e8061ca29 100644
--- a/diff.h
+++ b/diff.h
@@ -225,7 +225,8 @@ struct diff_options {
 
/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
#define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
-   int color_moved_ws_handling;
+   #define COLOR_MOVED_WS_ERROR (1<<0)
+   unsigned color_moved_ws_handling;
 
struct repository *repo;
 };
-- 
2.19.1.930.g4563a0d9d0-goog



Re: Using --word-diff breaks --color-moved

2018-11-02 Thread Stefan Beller
On Thu, Nov 1, 2018 at 6:19 PM james harvey  wrote:

> > This sounds like you are asking for two things:
> > (1) make color-moved work with words (somehow)
> > (2) allow the user to fine tune the heuristics for a block,
> > such that default=zebra would still work.
>
> I was asking for #1.

I currently have no time to look into that,
but you're welcome to do so. :-)
I'd be happy to review the patches!

Thanks,
Stefan


Re: [PATCH 3/9] diff: avoid generating unused hunk header lines

2018-11-02 Thread Jeff King
On Fri, Nov 02, 2018 at 12:50:05PM -0700, Stefan Beller wrote:

> > +/*
> > + * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
> > + * one just sends the hunk line to the line_fn callback).
> > + */
> > +void discard_hunk_line(void *, long, long, long, long, const char *, long);
> 
> Recently we had the discussion on style and naming things.
> On the one hand I don't know what these 4 different longs do,
> so I'd wish for some descriptive variable names in here.
> On the other hand the docs explain clearly why I don't need
> to care (a no-op ignores all of the parameters, no need
> to take care of their order)

Right, I actually did have the same thought while writing it. And ended
up following that line of reasoning (since it's just a placeholder, it
doesn't matter).  But I'm not opposed to putting in the names.

> So to revive that discussion, I would strongly prefer
> to have *some* names there, for the sake of a
> simply described coding style without many exceptions
> (especially those exceptions that rely on judgement).

Fair enough.

Squashable patch is below; it goes on 34c829621e (diff: avoid generating
unused hunk header lines, 2018-11-02).

Junio, let me know if you'd prefer a re-send of the series, but it
doesn't look necessary otherwise (so far).

> Apart from that, I read the whole series, and found
> it a pleasant read.

Thanks!

diff --git a/xdiff-interface.h b/xdiff-interface.h
index 7b0ccbdd1d..8af245eed9 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -39,7 +39,9 @@ extern int git_xmerge_style;
  * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
  * one just sends the hunk line to the line_fn callback).
  */
-void discard_hunk_line(void *, long, long, long, long, const char *, long);
+void discard_hunk_line(void *priv,
+  long ob, long on, long nb, long nn,
+  const char *func, long funclen);
 
 /*
  * Compare the strings l1 with l2 which are of size s1 and s2 respectively.

-Peff


Re: [PATCH v4 00/10] Improve path collision conflict resolutions

2018-11-02 Thread Elijah Newren
On Fri, Nov 2, 2018 at 12:09 PM Derrick Stolee  wrote:
>
> On 11/2/2018 2:53 PM, Elijah Newren wrote:
> > Major question:
> >* You'll note that I edited the last two patches to mark them as RFC.
> >  To be honest, I'm not sure what to do with these.  They improve code
> >  coverage of new code, but the same gaps existed in the old code;
> >  they only show up in the coverage-diff because I essentially moved
> >  code around to a new improved function.  Since the new code doesn't
> >  really add new abilities but rather just shifts the handling of
> >  these conflicts to a common function, they shouldn't need any more
> >  testcases than previously and modifying the existing patches thus
> >  feels slightly misleading.  That line of thought leads me to believe
> >  that perhaps putting them in a separate combined patch of their own
> >  with a decent commit message is the right way to go.  On the other
> >  hand, squashing them to the commits they're marked as fixups for
> >  shows which logical part of the code the tests are related to, which
> >  seems like a good thing.  So what's the right way to handle these?
>
> I appreciate the effort you made to improve test coverage! It's
> unfortunate that this portion wasn't covered earlier, because we could
> have broken it and not noticed until a release.

Yes, I agree...except that I think we might not have noticed until a
couple releases down the road; these things tend to not come up a lot
in practice.  (Which may make it even more important to pay attention
to code coverage.)

> I think making them separate commits is fine, and the comment on the
> test case is helpful. The fact that you only had to change the commit
> timestamps in order to get the coverage makes me reexamine the code and
> realize that maybe the "right" thing to do is to reduce our code clones.
> (This is also how I was looking at the wrong block of the patch when
> talking about it not being covered.) I'll look at the patch and see if I
> can contribute a concrete code suggestion there.

Yeah, I had the same feeling, _again_, while re-looking at the tests
and code as well.  I think the history of how we got here goes
something like this:

  * there is some fairly simple code to handle these rename/rename
(1to2 and 2to1) cases, with logic for handling each side being a
near-copy of each other.
  * someone does some analysis about trying to remove duplication and
notes that there are 3-4 pieces that change; adding logic to change
out those pieces and rewrite it as a loop adds some complexity, which
isn't worth it given the simple code
  * additional issues are discovered, such as D/F conflicts or
inappropriate handling of untracked or dirty files, and due to
merge-recursive's bad design[1], the fixes have to be sprinkled
*everywhere* throughout the whole code base.  Lather, rinse, repeat a
few times.
  * Now, although the original analysis of removing the duplication
was correct given the amount of code that existed at the time, the
weights have changed as new code was added to both codepaths.  But the
original analysis is long since forgotten, the code is more complex,
and we have to think about whether fixing it now is worth it if we're
going to rewrite it all anyway to fix that fundamental design flaw[2].

[1] https://public-inbox.org/git/xmqqd147kpdm@gitster.mtv.corp.google.com/
[2] https://public-inbox.org/git/xmqqk1ydkbx0@gitster.mtv.corp.google.com/


> Aside: I hope that I am not being annoying by poking around with the
> test coverage reports. It does give me a way to target my review
> efforts, especially into changes that touch areas outside my expertise
> (like this one).

Not annoying at all; I think it's a very valuable thing you're doing.
And I think these tests make things better (there have definitely been
cases in the past where a merge one way would work and a merge the
other way would have funny bugs).  I was just unsure about what the
best way to present it amongst the other changes was.


[PATCH] merge-recursive: combine error handling

2018-11-02 Thread Derrick Stolee
In handle_rename_rename_1to2(), we have duplicated error handling
around colliding paths. Specifically, when we want to write out
the file and there is a directory or untracked file in the way,
we need to create a temporary file to hold the contents. This has
some special output to alert the user, and this output is
duplicated for each side of the conflict.

Simplify the call by generating this new path in a helper
function.

Signed-off-by: Derrick Stolee 
---

Elijah,

Here is a patch that combines the logic to avoid code clones, and also
more easily covers code blocks. Of course, your additional test helps
the branch coverage.

Thanks,
-Stolee

 merge-recursive.c | 53 ---
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 5986b6..5e36bef162 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1709,6 +1709,27 @@ static int handle_rename_add(struct merge_options *o,
 ci->dst_entry1->stages[other_stage].mode);
 }
 
+static char *find_path_for_conflict(struct merge_options *o,
+   const char *path,
+   const char *branch1,
+   const char *branch2)
+{
+   char *new_path = NULL;
+   if (dir_in_way(path, !o->call_depth, 0)) {
+   new_path = unique_path(o, path, branch1);
+   output(o, 1, _("%s is a directory in %s adding "
+  "as %s instead"),
+  path, branch2, new_path);
+   } else if (would_lose_untracked(path)) {
+   new_path = unique_path(o, path, branch1);
+   output(o, 1, _("Refusing to lose untracked file"
+  " at %s; adding as %s instead"),
+  path, new_path);
+   }
+
+   return new_path;
+}
+
 static int handle_rename_rename_1to2(struct merge_options *o,
 struct rename_conflict_info *ci)
 {
@@ -1783,19 +1804,9 @@ static int handle_rename_rename_1to2(struct 
merge_options *o,
  &add->oid, add->mode) < 0)
return -1;
} else {
-   char *new_path = NULL;
-   if (dir_in_way(a->path, !o->call_depth, 0)) {
-   new_path = unique_path(o, a->path, ci->branch1);
-   output(o, 1, _("%s is a directory in %s adding "
-  "as %s instead"),
-  a->path, ci->branch2, new_path);
-   } else if (would_lose_untracked(a->path)) {
-   new_path = unique_path(o, a->path, ci->branch1);
-   output(o, 1, _("Refusing to lose untracked file"
-  " at %s; adding as %s instead"),
-  a->path, new_path);
-   }
-
+   char *new_path = find_path_for_conflict(o, a->path,
+   ci->branch1,
+   ci->branch2);
if (update_file(o, 0, &mfi.oid, mfi.mode, new_path ? 
new_path : a->path))
return -1;
free(new_path);
@@ -1812,19 +1823,9 @@ static int handle_rename_rename_1to2(struct 
merge_options *o,
  &mfi.oid, mfi.mode) < 0)
return -1;
} else {
-   char *new_path = NULL;
-   if (dir_in_way(b->path, !o->call_depth, 0)) {
-   new_path = unique_path(o, b->path, ci->branch2);
-   output(o, 1, _("%s is a directory in %s adding "
-  "as %s instead"),
-  b->path, ci->branch1, new_path);
-   } else if (would_lose_untracked(b->path)) {
-   new_path = unique_path(o, b->path, ci->branch2);
-   output(o, 1, _("Refusing to lose untracked file"
-  " at %s; adding as %s instead"),
-  b->path, new_path);
-   }
-
+   char *new_path = find_path_for_conflict(o, b->path,
+   ci->branch2,
+   ci->branch1);
if (update_file(o, 0, &mfi.oid, mfi.mode, new_path ? 
new_path : b->path))
return -1;
free(new_path);
-- 
2.19.1.1235.gbda564b1a2



Re: [PATCH 3/9] diff: avoid generating unused hunk header lines

2018-11-02 Thread Stefan Beller
> +/*
> + * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
> + * one just sends the hunk line to the line_fn callback).
> + */
> +void discard_hunk_line(void *, long, long, long, long, const char *, long);

Recently we had the discussion on style and naming things.
On the one hand I don't know what these 4 different longs do,
so I'd wish for some descriptive variable names in here.
On the other hand the docs explain clearly why I don't need
to care (a no-op ignores all of the parameters, no need
to take care of their order)

So to revive that discussion, I would strongly prefer
to have *some* names there, for the sake of a
simply described coding style without many exceptions
(especially those exceptions that rely on judgement).

Today I read [1], which describes the analog in the
mechanical world: To evolve and have more impact
you need tighter requirements on some parts. And
I would roughly translate that to our use case as
not having to worry about style (it's ironic I even type
out this email... if we could just run clang format or
some other tightly controlling formatter/linter, I'd be
much happier as our focus should be elsewhere,
such as UX or performance).

Apart from that, I read the whole series, and found
it a pleasant read.

Thanks,
Stefan

[1] https://www.nybooks.com/articles/2018/10/25/precision-accuracy-perfectionism/


Re: [PATCH v4 00/10] Improve path collision conflict resolutions

2018-11-02 Thread Derrick Stolee

On 11/2/2018 2:53 PM, Elijah Newren wrote:

Major question:
   * You'll note that I edited the last two patches to mark them as RFC.
 To be honest, I'm not sure what to do with these.  They improve code
 coverage of new code, but the same gaps existed in the old code;
 they only show up in the coverage-diff because I essentially moved
 code around to a new improved function.  Since the new code doesn't
 really add new abilities but rather just shifts the handling of
 these conflicts to a common function, they shouldn't need any more
 testcases than previously and modifying the existing patches thus
 feels slightly misleading.  That line of thought leads me to believe
 that perhaps putting them in a separate combined patch of their own
 with a decent commit message is the right way to go.  On the other
 hand, squashing them to the commits they're marked as fixups for
 shows which logical part of the code the tests are related to, which
 seems like a good thing.  So what's the right way to handle these?


I appreciate the effort you made to improve test coverage! It's 
unfortunate that this portion wasn't covered earlier, because we could 
have broken it and not noticed until a release.


I think making them separate commits is fine, and the comment on the 
test case is helpful. The fact that you only had to change the commit 
timestamps in order to get the coverage makes me reexamine the code and 
realize that maybe the "right" thing to do is to reduce our code clones. 
(This is also how I was looking at the wrong block of the patch when 
talking about it not being covered.) I'll look at the patch and see if I 
can contribute a concrete code suggestion there.


Aside: I hope that I am not being annoying by poking around with the 
test coverage reports. It does give me a way to target my review 
efforts, especially into changes that touch areas outside my expertise 
(like this one).


Thanks,

-Stolee



[PATCH v4 10/10] fixup! merge-recursive: improve rename/rename(1to2)/add[/add] handling

2018-11-02 Thread Elijah Newren
NOTE: This test added solely to improve code coverage of new code added in
this series.

Signed-off-by: Elijah Newren 
---
 t/t6043-merge-rename-directories.sh | 37 +
 1 file changed, 37 insertions(+)

diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-merge-rename-directories.sh
index 5c01a0c14a..62c564707b 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -3163,6 +3163,43 @@ test_expect_success '10c-check: Overwrite untracked with 
dir rename/rename(1to2)
)
 '
 
+test_expect_success '10c-check: Overwrite untracked with dir 
rename/rename(1to2), other direction' '
+   (
+   cd 10c &&
+
+   git reset --hard &&
+   git clean -fdqx &&
+
+   git checkout B^0 &&
+   mkdir y &&
+   echo important >y/c &&
+
+   test_must_fail git merge -s recursive A^0 >out 2>err &&
+   test_i18ngrep "CONFLICT (rename/rename)" out &&
+   test_i18ngrep "Refusing to lose untracked file at y/c; adding 
as y/c~HEAD instead" out &&
+
+   git ls-files -s >out &&
+   test_line_count = 6 out &&
+   git ls-files -u >out &&
+   test_line_count = 3 out &&
+   git ls-files -o >out &&
+   test_line_count = 3 out &&
+
+   git rev-parse >actual \
+   :0:y/a :0:y/b :0:x/d :1:x/c :3:w/c :2:y/c &&
+   git rev-parse >expect \
+O:z/a  O:z/b  O:x/d  O:x/c  O:x/c  O:x/c &&
+   test_cmp expect actual &&
+
+   git hash-object y/c~HEAD >actual &&
+   git rev-parse O:x/c >expect &&
+   test_cmp expect actual &&
+
+   echo important >expect &&
+   test_cmp expect y/c
+   )
+'
+
 # Testcase 10d, Delete untracked w/ dir rename/rename(2to1)
 #   Commit O: z/{a,b,c_1},x/{d,e,f_2}
 #   Commit A: y/{a,b},x/{d,e,f_2,wham_1} + untracked y/wham
-- 
2.19.0.232.gd14c2061fc



Re: submodule support in git-bundle

2018-11-02 Thread Stefan Beller
> > This offloading-to-CDN (or "mostly resumable clone" in the
> > sense that the communication with the server is minimal, and
> > you get most of your data via resumable http range-requests)
> > sounds like complete offtopic, but is one of the requirements
> > for the repo to submodule migration, hence I came to speak of it.
>
> Hm.. so what you're saying is, we could have a pack file that lists
> other (real) pack files and for the bundle case they are all in the
> same file. And "download from $THERE" in this case is "download at
> this file offset"? That might actually work.

We're conflating 2 things here.
This idea of CDN offloading has nothing to do with submodules, it's
just a general thing to improve the fetch protocol.
And the pointed-at file doesn't need to be a "real" packfile, as long
as the bytestream at the end looks like a real packfile. For example
the bytes to get from $THERE would not need to have a pack header
(or if it had, I would ask you to omit the first bytes containing the header)
as I can give the header myself.

The idea for submodules is more along the lines of having "just"
multiple pack files in the stream. For the bundle case we would
probably not have redirection to $THERE in there, as it should
be self contained completely (we don't know if the bundle recipient
can access $THERE in a timely manner).


> > Did you have other things in mind, on a higher level?
> > e.g. querying the bundle and creating submodule bundles
> > based off the superproject bundle? 'git bundle create' could
> > learn the --recurse-submodules option, which then produces
> > multiple bundle files without changing the file formats.
>
> This is probably the simplest way to support submodules.

Yep, that sounds simplest, but I think it makes for bad UX.
(Multiple files, need to be kept in some order and applied correctly)


> I just
> haven't really thought much about it (the problem just came up to me
> like 2 hours ago). Two problems with this are convenience (I don't
> want to handle multiple files) and submodule info (which pack should
> be unbundled on which submodule?). But I suppose if "git bundle"
> produces a tarball of these bundle files then you solve both.

The tarball makes it one file and would naturally provide some
order. It feels iffy, I'd rather have multiple packs in the bundle.

> But of course there may be other and better options like what you
> described above. If in long term we have "pack with hyperlinks" anyway
> for resumable clone and other fancy stuff then reusing the same
> mechanism for bundles makes sense, less maintenance burden.

I think of the hyperlinks in packs as an orthogonal feature, but closely
nearby in code and implementation, which is why I brought it up.


[PATCH v4 07/10] merge-recursive: use handle_file_collision for add/add conflicts

2018-11-02 Thread Elijah Newren
This results in no-net change of behavior, it simply ensures that all
file-collision conflict handling types are being handled the same by
calling the same function.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index ead6054a75..c78b347112 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3355,14 +3355,27 @@ static int process_entry(struct merge_options *o,
clean_merge = -1;
}
} else if (a_oid && b_oid) {
-   /* Case C: Added in both (check for same permissions) and */
-   /* case D: Modified in both, but differently. */
-   int is_dirty = 0; /* unpack_trees would have bailed if dirty */
-   clean_merge = handle_content_merge(o, path, is_dirty,
-  o_oid, o_mode,
-  a_oid, a_mode,
-  b_oid, b_mode,
-  NULL);
+   if (!o_oid) {
+   /* Case C: Added in both (check for same permissions) */
+   output(o, 1,
+  _("CONFLICT (add/add): Merge conflict in %s"),
+  path);
+   clean_merge = handle_file_collision(o,
+   path, NULL, NULL,
+   o->branch1,
+   o->branch2,
+   a_oid, a_mode,
+   b_oid, b_mode);
+   } else {
+   /* case D: Modified in both, but differently. */
+   int is_dirty = 0; /* unpack_trees would have bailed if dirty */
+   clean_merge = handle_content_merge(o, path,
+  is_dirty,
+  o_oid, o_mode,
+  a_oid, a_mode,
+  b_oid, b_mode,
+  NULL);
+   }
} else if (!o_oid && !a_oid && !b_oid) {
/*
 * this entry was deleted altogether. a_mode == 0 means
-- 
2.19.0.232.gd14c2061fc



[PATCH v4 04/10] merge-recursive: new function for better colliding conflict resolutions

2018-11-02 Thread Elijah Newren
There are three conflict types that represent two (possibly entirely
unrelated) files colliding at the same location:
  * add/add
  * rename/add
  * rename/rename(2to1)

These three conflict types already share more similarity than might be
immediately apparent from their description: (1) the handling of the
rename variants already involves removing any entries from the index
corresponding to the original file names[*], thus only leaving entries
in the index for the colliding path; (2) likewise, any trace of the
original file name in the working tree is also removed.  So, in all
three cases we're left with how to represent two colliding files in both
the index and the working copy.

[*] Technically, this isn't quite true because rename/rename(2to1)
conflicts in the recursive (o->call_depth > 0) case do an "unrename"
since about seven years ago.  But even in that case, Junio felt
compelled to explain that my decision to "unrename" wasn't necessarily
the only or right answer -- search for "Comment from Junio" in t6036 for
details.

My initial motivation for looking at these three conflict types was that
if the handling of these three conflict types is the same, at least in
the limited set of cases where a renamed file is unmodified on the side
of history where the file is not renamed, then a significant performance
improvement for rename detection during merges is possible.  However,
while that served as motivation to look at these three types of
conflicts, the actual goal of this new function is to try to improve the
handling for all three cases, not to merely make them the same as each
other in that special circumstance.

=== Handling the working tree ===

The previous behavior for these conflict types in regards to the
working tree (assuming the file collision occurs at 'foo') was:
  * add/add does a two-way merge of the two files and records it as 'foo'.
  * rename/rename(2to1) records the two different files into two new
uniquely named files (foo~HEAD and foo~$MERGE), while removing 'foo'
from the working tree.
  * rename/add records the two different files into two different
locations, recording the add at foo~$SIDE and, oddly, recording
the rename at foo (why is the rename more important than the add?)

So, the question for what to write to the working tree boils down to
whether the two colliding files should be two-way merged and recorded in
place, or recorded into separate files.  As per discussion on the git
mailing list, two-way merging was deemed to always be preferred, as that
makes these cases all more like content conflicts that users can handle
from within their favorite editor, IDE, or merge tool.  Note that since
renames already involve a content merge, rename/add and
rename/rename(2to1) conflicts could result in nested conflict markers.
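To illustrate the nested markers mentioned above, here is a toy sketch (hypothetical helper, not merge-recursive's actual code) of a conflicted two-way merge: when one colliding side already carries markers from an earlier content merge, the outer merge's markers wrap around them, which is why the outer markers must be longer to stay distinguishable:

```python
def two_way_merge(ours, theirs, label1, label2, marker_size=7):
    """Toy stand-in for a conflicted two-way file merge: wrap both
    versions in conflict markers of the requested length."""
    return "\n".join([
        "<" * marker_size + " " + label1,
        ours,
        "=" * marker_size,
        theirs,
        ">" * marker_size + " " + label2,
    ])

# The content merge of a renamed file conflicts, producing 7-char markers...
inner = two_way_merge("alpha", "beta", "HEAD", "branch", marker_size=7)
# ...then the collision two-way merges that result against the added file,
# using longer markers so the two nestings remain distinguishable.
outer = two_way_merge(inner, "added content", "HEAD", "branch", marker_size=9)
print(outer)
```

The outer block's 9-character markers enclose the inner 7-character ones, mirroring how rename/add and rename/rename(2to1) handling can yield nested conflict hunks.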

=== Handling of the index ===

For a typical rename, unpack_trees() would set up the index in the
following fashion:
   old_path  new_path
   stage1: 5ca1ab1e  
   stage2: f005ba11  
   stage3:   b0a710ad
And merge-recursive would rewrite this to
   new_path
   stage1: 5ca1ab1e
   stage2: f005ba11
   stage3: b0a710ad
Removing old_path from the index means the user won't have to `git rm
old_path` manually every time a renamed path has a content conflict.
It also means they can use `git checkout [--ours|--theirs|--conflict|-m]
new_path`, `git diff [--ours|--theirs]` and various other commands that
would be difficult otherwise.

This strategy becomes a problem when we have a rename/add or
rename/rename(2to1) conflict, however, because then we have only three
slots to store blob sha1s and we need either four or six.  Previously,
this was handled by continuing to delete old_path from the index, and
just outright ignoring any blob shas from old_path.  That had the
downside of deleting any trace of changes made to old_path on the other
side of history.  This function instead does a three-way content merge of
the renamed file, and stores the blob sha1 for that at either stage2 or
stage3 for new_path (depending on which side the rename came from).  That
has the advantage of bringing information about changes on both sides and
still allows for easy resolution (no need to git rm old_path, etc.), but
does have the downside that if the content merge had conflict markers,
then what we store in the index is the sha1 of a blob with conflict
markers.  While that is a downside, it seems less problematic than the
downsides of any obvious alternatives, and certainly makes more sense
than the previous handling.  Further, it has a precedent in that when we
do recursive merges, we may accept a file with conflict markers as the
resolution for the merge of the merge-bases, which will then show up in
the index of the outer merge at stage 1 if a conflict exists at the outer
level.
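As a small model of the stage-slot accounting described above (hypothetical helper, not git's actual index structures; the oids are the placeholder values from the table earlier, and for rename/add the stage-1 slot may well be empty): with only three slots available at new_path, the three-way content merge of the renamed file occupies the slot for the side the rename came from, rather than being discarded as before:

```python
def collided_index_entry(base_oid, merged_rename_oid, add_oid, rename_side):
    """Model the three stage slots for new_path after a rename/add
    collision: stage 1 = merge base, stages 2/3 = ours/theirs.  The
    content-merged renamed blob lands in the renaming side's slot,
    the added blob in the other side's slot."""
    if rename_side not in (2, 3):
        raise ValueError("rename_side must be stage 2 (ours) or 3 (theirs)")
    stages = {1: base_oid, 2: None, 3: None}
    stages[rename_side] = merged_rename_oid
    stages[5 - rename_side] = add_oid  # the other of stages 2/3
    return stages

# Rename came from our side: its merged content sits at stage 2.
entry = collided_index_entry("5ca1ab1e", "f005ba11", "b0a710ad", rename_side=2)
print(entry)
```

If the rename instead came from their side, the merged blob lands at stage 3 and the added blob at stage 2; either way no information from old_path is silently dropped.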

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 121 ++
 1 file changed, 121 insertions(+)

diff 

[PATCH v4 03/10] merge-recursive: increase marker length with depth of recursion

2018-11-02 Thread Elijah Newren
Later patches in this series will modify file collision conflict
handling (e.g. from rename/add and rename/rename(2to1) conflicts) so
that multiply nested conflict markers can arise even before considering
conflicts in the virtual merge base.  Including the virtual merge base
will provide a way to get triply (or higher) nested conflict markers.
This new way to get nested conflict markers will force the need for a
more general mechanism to extend the length of conflict markers in order
to differentiate between different nestings.

Along with this change to conflict marker length handling, we want to
make sure that we don't regress handling for other types of conflicts
with nested conflict markers.  Add a more involved testcase using
merge.conflictstyle=diff3, where not only does the virtual merge base
contain conflicts, but its virtual merge base does as well (i.e. a case
with triply nested conflict markers).  While there are multiple
reasonable ways to handle nested conflict markers in the virtual merge
base for this type of situation, the easiest approach that dovetails
well with the new needs for the file collision conflict handling is to
require that the length of the conflict markers increase with each
subsequent nesting.

Subsequent patches which change the rename/add and rename/rename(2to1)
conflict handling will modify the extra_marker_size flag appropriately
for their new needs.
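The marker-length rule this patch introduces can be sketched as simple arithmetic (a simulation only, not the ll-merge code itself): ll_merge() starts from git's default 7-character markers and adds whatever extra_marker_size the caller requests, and the later patches pass values derived from call_depth so each nesting level gets visibly longer markers:

```python
DEFAULT_CONFLICT_MARKER_SIZE = 7  # git's usual <<<<<<< length

def marker_size(extra_marker_size):
    """Mirror the adjustment in ll_merge(): the base marker length
    plus the caller-requested extra_marker_size."""
    return DEFAULT_CONFLICT_MARKER_SIZE + extra_marker_size

# e.g. the values the later file-collision patches pass for content merges:
for call_depth in (0, 1, 2):
    extra = 1 + call_depth * 2
    print(call_depth, "<" * marker_size(extra))
```

With extra_marker_size of 1 + call_depth * 2, each level of recursion lengthens the markers by two characters, so markers from different nestings never collide.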

Signed-off-by: Elijah Newren 
---
 ll-merge.c|   4 +-
 ll-merge.h|   1 +
 merge-recursive.c |  25 +++--
 t/t6036-recursive-corner-cases.sh | 151 ++
 4 files changed, 172 insertions(+), 9 deletions(-)

diff --git a/ll-merge.c b/ll-merge.c
index 0e2800f7bb..aabc1b5c2e 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -384,7 +384,9 @@ int ll_merge(mmbuffer_t *result_buf,
if (opts->virtual_ancestor) {
if (driver->recursive)
driver = find_ll_merge_driver(driver->recursive);
-   marker_size += 2;
+   }
+   if (opts->extra_marker_size) {
+   marker_size += opts->extra_marker_size;
}
return driver->fn(driver, result_buf, path, ancestor, ancestor_label,
  ours, our_label, theirs, their_label,
diff --git a/ll-merge.h b/ll-merge.h
index b72b19921e..5b4e158502 100644
--- a/ll-merge.h
+++ b/ll-merge.h
@@ -11,6 +11,7 @@ struct ll_merge_options {
unsigned virtual_ancestor : 1;
unsigned variant : 2;   /* favor ours, favor theirs, or union merge */
unsigned renormalize : 1;
+   unsigned extra_marker_size;
long xdl_opts;
 };
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 73b5710386..f795c92a69 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1058,7 +1058,8 @@ static int merge_3way(struct merge_options *o,
  const struct diff_filespec *a,
  const struct diff_filespec *b,
  const char *branch1,
- const char *branch2)
+ const char *branch2,
+ const int extra_marker_size)
 {
mmfile_t orig, src1, src2;
struct ll_merge_options ll_opts = {0};
@@ -1066,6 +1067,7 @@ static int merge_3way(struct merge_options *o,
int merge_status;
 
ll_opts.renormalize = o->renormalize;
+   ll_opts.extra_marker_size = extra_marker_size;
ll_opts.xdl_opts = o->xdl_opts;
 
if (o->call_depth) {
@@ -1300,6 +1302,7 @@ static int merge_mode_and_contents(struct merge_options *o,
   const char *filename,
   const char *branch1,
   const char *branch2,
+  const int extra_marker_size,
   struct merge_file_info *result)
 {
if (o->branch1 != branch1) {
@@ -1310,7 +1313,8 @@ static int merge_mode_and_contents(struct merge_options *o,
 */
return merge_mode_and_contents(o, one, b, a,
   filename,
-  branch2, branch1, result);
+  branch2, branch1,
+  extra_marker_size, result);
}
 
result->merge = 0;
@@ -1351,7 +1355,8 @@ static int merge_mode_and_contents(struct merge_options *o,
int ret = 0, merge_status;
 
merge_status = merge_3way(o, &result_buf, one, a, b,
- branch1, branch2);
+ branch1, branch2,
+ extra_marker_size);
 
if ((merge_status < 0) || !result_buf.ptr)
ret = err(o, _("Failed to execute internal merge"));
@@ -1640,7 

[PATCH v4 01/10] Add testcases for consistency in file collision conflict handling

2018-11-02 Thread Elijah Newren
Add testcases dealing with file collisions for the following types of
conflicts:
  * add/add
  * rename/add
  * rename/rename(2to1)

All these conflict types simplify down to two files "colliding"
and should thus be handled similarly.  This means that rename/add and
rename/rename(2to1) conflicts need to be modified to behave the same as
add/add conflicts currently do: the colliding files should be two-way
merged (instead of the current behavior of writing the two colliding
files out to separate temporary unique pathnames).  Add testcases which
check this; subsequent commits will fix the conflict handling to make
these tests pass.

Signed-off-by: Elijah Newren 
---
 t/t6042-merge-rename-corner-cases.sh | 162 +++
 1 file changed, 162 insertions(+)

diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
index b97aca7fa2..b6fed2cb9a 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -937,4 +937,166 @@ test_expect_failure 'mod6-check: chains of rename/rename(1to2) and rename/rename
)
 '
 
+test_conflicts_with_adds_and_renames() {
+   sideL=$1
+   sideR=$2
+   expect=$3
+
+   # Setup:
+   #  L
+   # / \
+   #   master   ?
+   # \ /
+   #  R
+   #
+   # Where:
+   #   Both L and R have files named 'three' which collide.  Each of
+   #   the colliding files could have been involved in a rename, in
+   #   which case there was a file named 'one' or 'two' that was
+   #   modified on the opposite side of history and renamed into the
+   #   collision on this side of history.
+   #
+   # Questions:
+   #   1) The index should contain both a stage 2 and stage 3 entry
+   #  for the colliding file.  Does it?
+   #   2) When renames are involved, the content merges are clean, so
+   #  the index should reflect the content merges, not merely the
+   #  version of the colliding file from the prior commit.  Does
+   #  it?
+   #   3) There should be a file in the worktree named 'three'
+   #  containing the two-way merged contents of the content-merged
+   #  versions of 'three' from each of the two colliding
+   #  files.  Is it present?
+   #   4) There should not be any three~* files in the working
+   #  tree
+   test_expect_success "setup simple $sideL/$sideR conflict" '
+   test_create_repo simple_${sideL}_${sideR} &&
+   (
+   cd simple_${sideL}_${sideR} &&
+
+   # Create some related files now
+   for i in $(test_seq 1 10)
+   do
+   echo Random base content line $i
+   done >file_v1 &&
+   cp file_v1 file_v2 &&
+   echo modification >>file_v2 &&
+
+   cp file_v1 file_v3 &&
+   echo more stuff >>file_v3 &&
+   cp file_v3 file_v4 &&
+   echo yet more stuff >>file_v4 &&
+
+   # Use a tag to record both these files for simple
+   # access, and clean out these untracked files
+   git tag file_v1 $(git hash-object -w file_v1) &&
+   git tag file_v2 $(git hash-object -w file_v2) &&
+   git tag file_v3 $(git hash-object -w file_v3) &&
+   git tag file_v4 $(git hash-object -w file_v4) &&
+   git clean -f &&
+
+   # Setup original commit (or merge-base), consisting of
+   # files named "one" and "two" if renames were involved.
+   touch irrelevant_file &&
+   git add irrelevant_file &&
+   if [ $sideL = "rename" ]
+   then
+   git show file_v1 >one &&
+   git add one
+   fi &&
+   if [ $sideR = "rename" ]
+   then
+   git show file_v3 >two &&
+   git add two
+   fi &&
+   test_tick && git commit -m initial &&
+
+   git branch L &&
+   git branch R &&
+
+   # Handle the left side
+   git checkout L &&
+   if [ $sideL = "rename" ]
+   then
+   git mv one three
+   else
+   git show file_v2 >three &&
+   git add three
+   fi &&
+   if [ $sideR = "rename" ]
+   then
+   git show file_v4 >two &&
+ 

[PATCH v4 06/10] merge-recursive: improve handling for rename/rename(2to1) conflicts

2018-11-02 Thread Elijah Newren
This makes the rename/rename(2to1) conflicts use the new
handle_file_collision() function.  Since that function was based
originally on the rename/rename(2to1) handling code, the main
differences here are in what was added.  In particular:

  * Instead of storing files at collide_path~HEAD and collide_path~MERGE,
the files are two-way merged and recorded at collide_path.

  * Instead of recording the version of the renamed file that existed
on the renamed side in the index (thus ignoring any changes that
were made to the file on the side of history without the rename),
we do a three-way content merge on the renamed path, then store
that at either stage 2 or stage 3.

  * Note that since the content merge for each rename may have conflicts,
and then we have to merge the two renamed files, we can end up with
nested conflict markers.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c| 104 ---
 t/t6036-recursive-corner-cases.sh|  12 +---
 t/t6042-merge-rename-corner-cases.sh |  38 ++
 t/t6043-merge-rename-directories.sh  |  83 -
 4 files changed, 89 insertions(+), 148 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 0805095168..ead6054a75 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -696,27 +696,6 @@ static int update_stages(struct merge_options *opt, const char *path,
return 0;
 }
 
-static int update_stages_for_stage_data(struct merge_options *opt,
-   const char *path,
-   const struct stage_data *stage_data)
-{
-   struct diff_filespec o, a, b;
-
-   o.mode = stage_data->stages[1].mode;
-   oidcpy(&o.oid, &stage_data->stages[1].oid);
-
-   a.mode = stage_data->stages[2].mode;
-   oidcpy(&a.oid, &stage_data->stages[2].oid);
-
-   b.mode = stage_data->stages[3].mode;
-   oidcpy(&b.oid, &stage_data->stages[3].oid);
-
-   return update_stages(opt, path,
-                        is_null_oid(&o.oid) ? NULL : &o,
-                        is_null_oid(&a.oid) ? NULL : &a,
-                        is_null_oid(&b.oid) ? NULL : &b);
-}
-
 static void update_entry(struct stage_data *entry,
 struct diff_filespec *o,
 struct diff_filespec *a,
@@ -1870,7 +1849,6 @@ static int handle_rename_rename_2to1(struct merge_options *o,
char *path_side_2_desc;
struct merge_file_info mfi_c1;
struct merge_file_info mfi_c2;
-   int ret;
 
output(o, 1, _("CONFLICT (rename/rename): "
   "Rename %s->%s in %s. "
@@ -1878,81 +1856,22 @@ static int handle_rename_rename_2to1(struct merge_options *o,
   a->path, c1->path, ci->branch1,
   b->path, c2->path, ci->branch2);
 
-   remove_file(o, 1, a->path, o->call_depth || would_lose_untracked(a->path));
-   remove_file(o, 1, b->path, o->call_depth || would_lose_untracked(b->path));
-
path_side_1_desc = xstrfmt("version of %s from %s", path, a->path);
path_side_2_desc = xstrfmt("version of %s from %s", path, b->path);
if (merge_mode_and_contents(o, a, c1, &ci->ren1_other, path_side_1_desc,
o->branch1, o->branch2,
-   o->call_depth * 2, &mfi_c1) ||
+   1 + o->call_depth * 2, &mfi_c1) ||
merge_mode_and_contents(o, b, &ci->ren2_other, c2, path_side_2_desc,
o->branch1, o->branch2,
-   o->call_depth * 2, &mfi_c2))
+   1 + o->call_depth * 2, &mfi_c2))
return -1;
free(path_side_1_desc);
free(path_side_2_desc);
 
-   if (o->call_depth) {
-   /*
-* If mfi_c1.clean && mfi_c2.clean, then it might make
-* sense to do a two-way merge of those results.  But, I
-* think in all cases, it makes sense to have the virtual
-* merge base just undo the renames; they can be detected
-* again later for the non-recursive merge.
-*/
-   remove_file(o, 0, path, 0);
-   ret = update_file(o, 0, &mfi_c1.oid, mfi_c1.mode, a->path);
-   if (!ret)
-   ret = update_file(o, 0, &mfi_c2.oid, mfi_c2.mode,
- b->path);
-   } else {
-   char *new_path1 = unique_path(o, path, ci->branch1);
-   char *new_path2 = unique_path(o, path, ci->branch2);
-   output(o, 1, _("Renaming %s to %s and %s to %s instead"),
-  a->path, new_path1, b->path, new_path2);
-   if (was_dirty(o, path))
-   output(o, 1, _("Refusing to lose dirty file at %s"),
-  path);
-   else if (would_lose_untracked(path))
-   /*
-* Only 

[RFC PATCH v4 09/10] fixup! merge-recursive: fix rename/add conflict handling

2018-11-02 Thread Elijah Newren
NOTE: This test added solely to improve code coverage of new code added in
this series.

Signed-off-by: Elijah Newren 
---
 t/t6036-recursive-corner-cases.sh | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/t/t6036-recursive-corner-cases.sh b/t/t6036-recursive-corner-cases.sh
index 99cddb05af..b7488b00c0 100755
--- a/t/t6036-recursive-corner-cases.sh
+++ b/t/t6036-recursive-corner-cases.sh
@@ -240,6 +240,57 @@ test_expect_success 'git detects differently handled merges conflict' '
)
 '
 
+# Repeat the above testcase with precisely the same setup, other than with
+# the two merge bases having different orderings of commit timestamps so
+# that they are reversed in the order they are provided to merge-recursive,
+# so that we can improve code coverage.
test_expect_success 'git detects differently handled merges conflict, swapped' '
+   (
+   cd rename-add &&
+
+   # Difference #1: Do cleanup from previous testrun
+   git reset --hard &&
+   git clean -fdqx &&
+
+   # Difference #2: Change commit timestamps
+   btime=$(git log --no-walk --date=raw --format=%cd B | awk "{print \$1}") &&
+   ctime=$(git log --no-walk --date=raw --format=%cd C | awk "{print \$1}") &&
+   newctime=$(($btime+1)) &&
+   git fast-export --no-data --all | sed -e s/$ctime/$newctime/ | git fast-import --force --quiet &&
+   # End of differences; rest is copy-paste of last test
+
+   git checkout D^0 &&
+   test_must_fail git merge -s recursive E^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 3 out &&
+   git ls-files -u >out &&
+   test_line_count = 3 out &&
+   git ls-files -o >out &&
+   test_line_count = 1 out &&
+
+   git rev-parse >expect   \
+   C:new_a  D:new_a  E:new_a &&
+   git rev-parse   >actual \
+   :1:new_a :2:new_a :3:new_a &&
+   test_cmp expect actual &&
+
+   # Test that the two-way merge in new_a is as expected
+   git cat-file -p D:new_a >ours &&
+   git cat-file -p E:new_a >theirs &&
+   >empty &&
+   test_must_fail git merge-file \
+   -L "HEAD" \
+   -L "" \
+   -L "E^0" \
+   ours empty theirs &&
+   sed -e "s/^\([<=>]\)/\1\1\1/" ours >expect &&
+   git hash-object new_a >actual &&
+   git hash-object ours  >expect &&
+   test_cmp expect actual
+   )
+'
+
 #
 # criss-cross + modify/delete:
 #
-- 
2.19.0.232.gd14c2061fc



[PATCH v4 08/10] merge-recursive: improve rename/rename(1to2)/add[/add] handling

2018-11-02 Thread Elijah Newren
When we have a rename/rename(1to2) conflict, each of the renames can
collide with a file addition.  Each of these rename/add conflicts suffered
from the same kinds of problems that normal rename/add suffered from.
Make the code use handle_file_collision() as well so that we get all the
same fixes and consistent behavior between the different conflict types.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c| 154 +--
 t/t6042-merge-rename-corner-cases.sh |  29 +++--
 t/t6043-merge-rename-directories.sh  |  24 +++--
 3 files changed, 113 insertions(+), 94 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index c78b347112..5986b6 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1709,80 +1709,17 @@ static int handle_rename_add(struct merge_options *o,
 ci->dst_entry1->stages[other_stage].mode);
 }
 
-static int handle_file(struct merge_options *o,
-   struct diff_filespec *rename,
-   int stage,
-   struct rename_conflict_info *ci)
-{
-   char *dst_name = rename->path;
-   struct stage_data *dst_entry;
-   const char *cur_branch, *other_branch;
-   struct diff_filespec other;
-   struct diff_filespec *add;
-   int ret;
-
-   if (stage == 2) {
-   dst_entry = ci->dst_entry1;
-   cur_branch = ci->branch1;
-   other_branch = ci->branch2;
-   } else {
-   dst_entry = ci->dst_entry2;
-   cur_branch = ci->branch2;
-   other_branch = ci->branch1;
-   }
-
-   add = filespec_from_entry(&other, dst_entry, stage ^ 1);
-   if (add) {
-   int ren_src_was_dirty = was_dirty(o, rename->path);
-   char *add_name = unique_path(o, rename->path, other_branch);
-   if (update_file(o, 0, &add->oid, add->mode, add_name))
-   return -1;
-
-   if (ren_src_was_dirty) {
-   output(o, 1, _("Refusing to lose dirty file at %s"),
-  rename->path);
-   }
-   /*
-* Because the double negatives somehow keep confusing me...
-*1) update_wd iff !ren_src_was_dirty.
-*2) no_wd iff !update_wd
-*3) so, no_wd == !!ren_src_was_dirty == ren_src_was_dirty
-*/
-   remove_file(o, 0, rename->path, ren_src_was_dirty);
-   dst_name = unique_path(o, rename->path, cur_branch);
-   } else {
-   if (dir_in_way(rename->path, !o->call_depth, 0)) {
-   dst_name = unique_path(o, rename->path, cur_branch);
-   output(o, 1, _("%s is a directory in %s adding as %s instead"),
-  rename->path, other_branch, dst_name);
-   } else if (!o->call_depth &&
-  would_lose_untracked(rename->path)) {
-   dst_name = unique_path(o, rename->path, cur_branch);
-   output(o, 1, _("Refusing to lose untracked file at %s; "
-  "adding as %s instead"),
-  rename->path, dst_name);
-   }
-   }
-   if ((ret = update_file(o, 0, &rename->oid, rename->mode, dst_name)))
-   ; /* fall through, do allow dst_name to be released */
-   else if (stage == 2)
-   ret = update_stages(o, rename->path, NULL, rename, add);
-   else
-   ret = update_stages(o, rename->path, NULL, add, rename);
-
-   if (dst_name != rename->path)
-   free(dst_name);
-
-   return ret;
-}
-
 static int handle_rename_rename_1to2(struct merge_options *o,
 struct rename_conflict_info *ci)
 {
/* One file was renamed in both branches, but to different names. */
+   struct merge_file_info mfi;
+   struct diff_filespec other;
+   struct diff_filespec *add;
struct diff_filespec *one = ci->pair1->one;
struct diff_filespec *a = ci->pair1->two;
struct diff_filespec *b = ci->pair2->two;
+   char *path_desc;
 
output(o, 1, _("CONFLICT (rename/rename): "
   "Rename \"%s\"->\"%s\" in branch \"%s\" "
@@ -1790,15 +1727,16 @@ static int handle_rename_rename_1to2(struct merge_options *o,
   one->path, a->path, ci->branch1,
   one->path, b->path, ci->branch2,
   o->call_depth ? _(" (left unresolved)") : "");
-   if (o->call_depth) {
-   struct merge_file_info mfi;
-   struct diff_filespec other;
-   struct diff_filespec *add;
-   if (merge_mode_and_contents(o, one, a, b, one->path,
-   ci->branch1, ci->branch2,
-   o->call_depth * 2, &mfi))
-   

[PATCH v4 02/10] t6036, t6042: testcases for rename collision of already conflicting files

2018-11-02 Thread Elijah Newren
When a single file is renamed, it can also be modified, yielding the
possibility of that renamed file having content conflicts.  If two
different such files are renamed into the same location, then two-way
merging those files may result in nested conflicts.  Add a testcase that
makes sure we get this case correct, and uses different lengths of
conflict markers to differentiate between the different nestings.

Also add another case with an extra (i.e. third) level of conflict
markers due to using merge.conflictstyle=diff3 and the virtual merge
base also having conflicts present.

Signed-off-by: Elijah Newren 
---
 t/t6036-recursive-corner-cases.sh| 194 +++
 t/t6042-merge-rename-corner-cases.sh | 118 
 2 files changed, 312 insertions(+)

diff --git a/t/t6036-recursive-corner-cases.sh b/t/t6036-recursive-corner-cases.sh
index e1cef58f2a..f229d7e47b 100755
--- a/t/t6036-recursive-corner-cases.sh
+++ b/t/t6036-recursive-corner-cases.sh
@@ -1402,4 +1402,198 @@ test_expect_failure 'check conflicting modes for regular file' '
)
 '
 
+# Setup:
+#          L1---L2
+#         /  \ /  \
+#   master    X    ?
+#         \  / \  /
+#          R1---R2
+#
+# Where:
+#   master has two files, named 'b' and 'a'
+#   branches L1 and R1 both modify each of the two files in conflicting ways
+#
+#   L2 is a merge of R1 into L1; more on it later.
+#   R2 is a merge of L1 into R1; more on it later.
+#
+#   X is an auto-generated merge-base used when merging L2 and R2.
+#   since X is a merge of L1 and R1, it has conflicting versions of each file
+#
+#   More about L2 and R2:
+# - both resolve the conflicts in 'b' and 'a' differently
+# - L2 renames 'b' to 'm'
+# - R2 renames 'a' to 'm'
+#
+#   In the end, in file 'm' we have four different conflicting files (from
+#   two versions of 'b' and two of 'a').  In addition, if
+#   merge.conflictstyle is diff3, then the base version also has
+#   conflict markers of its own, leading to a total of three levels of
+#   conflict markers.  This is a pretty weird corner case, but we just want
+#   to ensure that we handle it as well as practical.
+
+test_expect_success 'setup nested conflicts' '
+   test_create_repo nested_conflicts &&
+   (
+   cd nested_conflicts &&
+
+   # Create some related files now
+   for i in $(test_seq 1 10)
+   do
+   echo Random base content line $i
+   done >initial &&
+
+   cp initial b_L1 &&
+   cp initial b_R1 &&
+   cp initial b_L2 &&
+   cp initial b_R2 &&
+   cp initial a_L1 &&
+   cp initial a_R1 &&
+   cp initial a_L2 &&
+   cp initial a_R2 &&
+
+   test_write_lines b b_L1 >>b_L1 &&
+   test_write_lines b b_R1 >>b_R1 &&
+   test_write_lines b b_L2 >>b_L2 &&
+   test_write_lines b b_R2 >>b_R2 &&
+   test_write_lines a a_L1 >>a_L1 &&
+   test_write_lines a a_R1 >>a_R1 &&
+   test_write_lines a a_L2 >>a_L2 &&
+   test_write_lines a a_R2 >>a_R2 &&
+
+   # Setup original commit (or merge-base), consisting of
+   # files named "b" and "a"
+   cp initial b &&
+   cp initial a &&
+   echo b >>b &&
+   echo a >>a &&
+   git add b a &&
+   test_tick && git commit -m initial &&
+
+   git branch L &&
+   git branch R &&
+
+   # Handle the left side
+   git checkout L &&
+   mv -f b_L1 b &&
+   mv -f a_L1 a &&
+   git add b a &&
+   test_tick && git commit -m "version L1 of files" &&
+   git tag L1 &&
+
+   # Handle the right side
+   git checkout R &&
+   mv -f b_R1 b &&
+   mv -f a_R1 a &&
+   git add b a &&
+   test_tick && git commit -m "version R1 of files" &&
+   git tag R1 &&
+
+   # Create first merge on left side
+   git checkout L &&
+   test_must_fail git merge R1 &&
+   mv -f b_L2 b &&
+   mv -f a_L2 a &&
+   git add b a &&
+   git mv b m &&
+   test_tick && git commit -m "left merge, rename b->m" &&
+   git tag L2 &&
+
+   # Create first merge on right side
+   git checkout R &&
+   test_must_fail git merge L1 &&
+   mv -f b_R2 b &&
+   mv -f a_R2 a &&
+   git add b a &&
+   git mv a m &&
+   test_tick && git commit -m "right merge, rename a->m" &&
+   git tag R2
+   )
+'
+
+test_expect_failure 'check nested conflicts' '
+   (
+   cd nested_conflicts &&
+
+   git clean -f &&
+ 

[PATCH v4 00/10] Improve path collision conflict resolutions

2018-11-02 Thread Elijah Newren
This series depends on en/merge-cleanup-more and is built on that
series.  (It merges cleanly to master, next, and pu -- well, as long
as v3 of this series is excluded from pu, that is).

This series makes all the "file collision" conflict types be handled
consistently; making them all behave like add/add (as suggested by
Jonathan[1] and Junio[2]).  These types are:
  * add/add
  * rename/add
  * rename/rename(2to1)
  * each rename/add piece of a rename/rename(1to2)/add[/add] conflict

[1] 
https://public-inbox.org/git/20180312213521.gb58...@aiede.svl.corp.google.com/
[2] 
https://public-inbox.org/git/capc5davu8vv9rdgon8jixeo3ycdvqq38yszzc-cpo+aqcak...@mail.gmail.com

Changes since v3:
  * Fixed test names to be surrounded by single quotes instead of double
quotes, as suggested by Derrick.
  * Two more (RFC) patches add a couple testcases to cover previously
uncovered code, as pointed out by Derrick and his test coverage report.
  * Full range-diff below.

Major question:
  * You'll note that I edited the last two patches to mark them as RFC.
To be honest, I'm not sure what to do with these.  They improve code
coverage of new code, but the same gaps existed in the old code;
they only show up in the coverage-diff because I essentially moved
code around to a new improved function.  Since the new code doesn't
really add new abilities but rather just shifts the handling of
these conflicts to a common function, they shouldn't need any more
testcases than previously and modifying the existing patches thus
feels slightly misleading.  That line of thought leads me to believe
that perhaps putting them in a separate combined patch of their own
with a decent commit message is the right way to go.  On the other
hand, squashing them to the commits they're marked as fixups for
shows which logical part of the code the tests are related to, which
seems like a good thing.  So what's the right way to handle these?


 1:  1be9e213db !  1:  0fa67d6109 t6036, t6042: testcases for rename collision of already conflicting files
@@ -51,7 +51,7 @@
 +#   conflict markers.  This is a pretty weird corner case, but we just want
 +#   to ensure that we handle it as well as practical.
 +
-+test_expect_success "setup nested conflicts" '
++test_expect_success 'setup nested conflicts' '
 +  test_create_repo nested_conflicts &&
 +  (
 +  cd nested_conflicts &&
@@ -130,7 +130,7 @@
 +  )
 +'
 +
-+test_expect_failure "check nested conflicts" '
++test_expect_failure 'check nested conflicts' '
 +  (
 +  cd nested_conflicts &&
 +
@@ -241,7 +241,7 @@
 +#
 +#   So, we have four different conflicting files that all end up at path
 +#   'three'.
-+test_expect_success "setup nested conflicts from rename/rename(2to1)" '
++test_expect_success 'setup nested conflicts from rename/rename(2to1)' '
 +  test_create_repo nested_conflicts_from_rename_rename &&
 +  (
 +  cd nested_conflicts_from_rename_rename &&
@@ -294,7 +294,7 @@
 +  )
 +'
 +
-+test_expect_failure "check nested conflicts from rename/rename(2to1)" '
++test_expect_failure 'check nested conflicts from rename/rename(2to1)' '
 +  (
 +  cd nested_conflicts_from_rename_rename &&
 +
 2:  d3356ff525 !  2:  9f5f0105d0 merge-recursive: increase marker length with depth of recursion
@@ -179,7 +179,7 @@
 +#   nested conflict markers from X2 in the base version -- that means we
 +#   have three levels of conflict markers.  Can we distinguish all three?
 +
-+test_expect_success "setup virtual merge base with nested conflicts" '
++test_expect_success 'setup virtual merge base with nested conflicts' '
 +  test_create_repo virtual_merge_base_has_nested_conflicts &&
 +  (
 +  cd virtual_merge_base_has_nested_conflicts &&
@@ -241,7 +241,7 @@
 +  )
 +'
 +
-+test_expect_success "check virtual merge base with nested conflicts" '
++test_expect_success 'check virtual merge base with nested conflicts' '
 +  (
 +  cd virtual_merge_base_has_nested_conflicts &&
 +
 3:  aa68e3d675 =  3:  5922c40fa7 merge-recursive: new function for better 
colliding conflict resolutions
 4:  f046ba6362 =  4:  dcf88dd363 merge-recursive: fix rename/add conflict 
handling
 5:  37742bdefd !  5:  1d11288be4 merge-recursive: improve handling for 
rename/rename(2to1) conflicts
@@ -209,8 +209,8 @@
)
  '
  
--test_expect_failure "check nested conflicts" '
-+test_expect_success "check nested conflicts" '
+-test_expect_failure 'check nested conflicts' '
++test_expect_success 'check nested conflicts' '
(
cd nested_conflicts &&
  
@@ -290,8 +290,8 @@
)
  '
  
--test_expect_failure "check nested conflicts from rename/rename(2to1)" '

[PATCH v4 05/10] merge-recursive: fix rename/add conflict handling

2018-11-02 Thread Elijah Newren
This makes the rename/add conflict handling make use of the new
handle_file_collision() function, which fixes several bugs and improves
things for the rename/add case significantly.  Previously, rename/add
would:

  * Not leave any higher order stage entries in the index, making it
appear as if there were no conflict.
  * Would place the renamed file at the colliding path, and move the
added file elsewhere, which combined with the lack of higher order
stage entries felt really odd.  It's not clear to me why the
rename should take precedence over the add; if one should be moved
out of the way, they both probably should.
  * In the recursive case, it would do a two way merge of the added
file and the version of the renamed file on the renamed side,
completely excluding modifications to the renamed file on the
unrenamed side of history.

Use the new handle_file_collision() to fix all of these issues.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c| 137 +--
 t/t6036-recursive-corner-cases.sh|  24 ++---
 t/t6042-merge-rename-corner-cases.sh |   4 +-
 3 files changed, 101 insertions(+), 64 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 24d979022e..0805095168 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -186,6 +186,7 @@ static int oid_eq(const struct object_id *a, const struct 
object_id *b)
 enum rename_type {
RENAME_NORMAL = 0,
RENAME_VIA_DIR,
+   RENAME_ADD,
RENAME_DELETE,
RENAME_ONE_FILE_TO_ONE,
RENAME_ONE_FILE_TO_TWO,
@@ -228,6 +229,7 @@ static inline void setup_rename_conflict_info(enum 
rename_type rename_type,
  struct stage_data *src_entry1,
  struct stage_data *src_entry2)
 {
+   int ostage1 = 0, ostage2;
struct rename_conflict_info *ci;
 
/*
@@ -264,18 +266,22 @@ static inline void setup_rename_conflict_info(enum 
rename_type rename_type,
dst_entry2->rename_conflict_info = ci;
}
 
-   if (rename_type == RENAME_TWO_FILES_TO_ONE) {
-   /*
-* For each rename, there could have been
-* modifications on the side of history where that
-* file was not renamed.
-*/
-   int ostage1 = o->branch1 == branch1 ? 3 : 2;
-   int ostage2 = ostage1 ^ 1;
+   /*
+* For each rename, there could have been
+* modifications on the side of history where that
+* file was not renamed.
+*/
+   if (rename_type == RENAME_ADD ||
+   rename_type == RENAME_TWO_FILES_TO_ONE) {
+   ostage1 = o->branch1 == branch1 ? 3 : 2;
 
ci->ren1_other.path = pair1->one->path;
oidcpy(&ci->ren1_other.oid, &src_entry1->stages[ostage1].oid);
ci->ren1_other.mode = src_entry1->stages[ostage1].mode;
+   }
+
+   if (rename_type == RENAME_TWO_FILES_TO_ONE) {
+   ostage2 = ostage1 ^ 1;
 
ci->ren2_other.path = pair2->one->path;
oidcpy(&ci->ren2_other.oid, &src_entry2->stages[ostage2].oid);
@@ -1559,7 +1565,6 @@ static struct diff_filespec *filespec_from_entry(struct 
diff_filespec *target,
return target;
 }
 
-#if 0 // #if-0-ing avoids unused function warning; will make live in next 
commit
 static int handle_file_collision(struct merge_options *o,
 const char *collide_path,
 const char *prev_path1,
@@ -1575,6 +1580,20 @@ static int handle_file_collision(struct merge_options *o,
char *alt_path = NULL;
const char *update_path = collide_path;
 
+   /*
+* It's easiest to get the correct things into stage 2 and 3, and
+* to make sure that the content merge puts HEAD before the other
+* branch if we just ensure that branch1 == o->branch1.  So, simply
+* flip arguments around if we don't have that.
+*/
+   if (branch1 != o->branch1) {
+   return handle_file_collision(o, collide_path,
+prev_path2, prev_path1,
+branch2, branch1,
+b_oid, b_mode,
+a_oid, a_mode);
+   }
+
/*
 * In the recursive case, we just opt to undo renames
 */
@@ -1678,7 +1697,38 @@ static int handle_file_collision(struct merge_options *o,
 */
return mfi.clean;
 }
-#endif
+
+static int handle_rename_add(struct merge_options *o,
+struct rename_conflict_info *ci)
+{
+   /* a was renamed to c, and a separate c was added. */
+   struct diff_filespec *a = ci->pair1->one;
+   struct diff_filespec *c = ci->pair1->two;
+   char *path = c->path;
+   char 

Re: submodule support in git-bundle

2018-11-02 Thread Duy Nguyen
On Fri, Nov 2, 2018 at 6:09 PM Stefan Beller  wrote:
>
> On Fri, Nov 2, 2018 at 9:10 AM Duy Nguyen  wrote:
> >
> > I use git-bundle today and it occurs to me that if I want to use it to
> > transfer part of a history that involves submodule changes, things
> > aren't pretty. Has anybody given thought on how to do binary history
> > transfer that contains changes from submodules?
> >
> > Since .bundle files are basically .pack files, i'm not sure if it's
> > easy to bundle multiple pack files (one per repo)...
>
> That is a really good discussion starter!
>
> As bundles are modeled after the fetch protocol, I would
> redirect the discussion there.
>
> The new fetch protocol could support sending more than
> one pack, which could be for both the superproject as
> well as the relevant submodule updates (i.e. what is recorded
> in the superproject) based on a new capability.
>
> We at Google have given this idea some thought, but from a
> different angle: As you may know currently Android uses the
> repo tool, which we want to replace with Git's native submodules
> eventually. The repo tool tests for each repository to clone if
> there is a bundle file for that repository, such that instead of
> cloning the repo, the bundle can be downloaded and then
> a catch-up fetch can be performed. (This helps the Git servers
> as well as the client, the bundle can be hosted on a CDN,
> which is faster and cheaper than a git server for us).
>
> So we've given some thought on extending the packfiles in the
> fetch protocol to have some redirection to a CDN possible,
> i.e. instead of sending bytes as is, you get more or less a "todo"
> list, which might be
> (a) take the following bytes as is (current pack format)
> (b) download these other bytes from $THERE
> (possibly with a checksum)
> once the stream of bytes is assembled, it will look like a regular
> packfile with deltas etc.
>
> This offloading-to-CDN (or "mostly resumable clone" in the
> sense that the communication with the server is minimal, and
> you get most of your data via resumable http range-requests)
> sounds like complete offtopic, but is one of the requirements
> for the repo to submodule migration, hence I came to speak of it.

Hm.. so what you're saying is, we could have a pack file that lists
other (real) pack files and for the bundle case they are all in the
same file. And "download from $THERE" in this case is "download at
this file offset"? That might actually work.

> Did you have other things in mind, on a higher level?
> e.g. querying the bundle and creating submodule bundles
> based off the superproject bundle? 'git bundle create' could
> learn the --recurse-submodules option, which then produces
> multiple bundle files without changing the file formats.

This is probably the simplest way to support submodules. I just
haven't really thought much about it (the problem just came up to me
like 2 hours ago). Two problems with this are convenience (I don't
want to handle multiple files) and submodule info (which pack should
be unbundled on which submodule?). But I suppose if "git bundle"
produces a tarball of these bundle files then you solve both.

But of course there may be other and better options like what you
described above. If in long term we have "pack with hyperlinks" anyway
for resumable clone and other fancy stuff then reusing the same
mechanism for bundles makes sense, less maintenance burden.
-- 
Duy


Re: [RFE] Please add name and email to git credentials

2018-11-02 Thread Jeff King
On Fri, Nov 02, 2018 at 06:32:36PM +0100, Nicolas Mailhot wrote:

> > I did create the way git credential matches repo urls. And I do not
> > think your proposal is a good idea. The credential system is about
> > interacting with a remote URL, and creating a commit object is a local
> > operation. That mismatch leaks through when you work with multiple
> > remotes, since it is not clear which URL we would match against when
> > the operation does not involve a remote at all.
> 
> I don't think it's quite that simple. The id part of creating a commit
> object is not a local operation at all. You choose the id written in a
> commit for a specific remote target, it has no use locally, most of us
> do not need it to reach themselves.

I don't think that's true. You do not need to have a remote at all to
use Git.

But more importantly, the commit is not in any way tied to a specific
remote. You might have multiple remotes that are storing many of the
same objects. So even if you wanted to somehow assign a segment of
history to a remote, it is not an unambiguous mapping.

> So yes there is a leak but it’s built in the git commit logic itself.
> Ideally, git would defer resolving  in commits to when I push to a
> remote target. I'm sure you’re aware of all the workarounds git users do
> at push time, when they realize the commit  is not the good one.

Your second sentence is fundamentally at odds with how Git works and its
decentralized data model.

> And since the leak is built in the commit logic itself, there are no
> perfect solutions that do not involve revisiting how commit works.
> 
> So, unless someone wants to revisit git commit, we’re left with
> imperfect solutions, and git credentials is no worse than another. It
> certainly fixes my use case fine. You did good work in git credentials.

Sorry, I just don't agree with any of the logic here. That's not how
commits work in Git, and all of the solutions are not equally imperfect.

-Peff


Re: [RFC] Generation Number v2

2018-11-02 Thread Jakub Narebski
Derrick Stolee  writes:
> On 10/31/2018 8:54 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, Oct 30 2018, Junio C Hamano wrote:
>>> Derrick Stolee  writes:

 In contrast, maximum generation numbers and corrected commit
 dates both performed quite well. They are frequently the top
 two performing indexes, and rarely significantly different.

 The trade-off here now seems to be: which _property_ is more important,
 locally-computable or backwards-compatible?
>>>
>>> Nice summary.
>>>
>>> As I already said, I personally do not think being compatible with
>>> currently deployed clients is important at all (primarily because I
>>> still consider the whole thing experimental), and there is a clear
>>> way forward once we correct the mistake of not having a version
>>> number in the file format that tells the updated clients to ignore
>>> the generation numbers.  For longer term viability, we should pick
>>> something that is immutable, reproducible, computable with minimum
>>> input---all of which would lead to being incrementally computable, I
>>> would think.
>>
>> I think it depends on what we mean by backwards compatibility. None of
>> our docs are saying this is experimental right now, just that it's
>> opt-in like so many other git-config(1) options.
>>
>> So if we mean breaking backwards compatibility in that we'll write a new
>> file or clobber the existing one with a version older clients can't use
>> as an optimization, fine.
>>
>> But it would be bad to produce a hard error on older clients, but
>> avoiding that seems as easy as just creating a "commit-graph2" file in
>> .git/objects/info/.
>
> Well, we have a 1-byte version number following the "CGPH" header in
> the commit-graph file, and clients will ignore the commit-graph file
> if that number is not "1". My hope for backwards-compatibility was
> to avoid incrementing this value and instead use the unused 8th byte.

How?  Some of the considered new generation numbers were backwards
compatibile in the sense that for graph traversal the old code for
(minimum) generation numbers could be used.  But that is for reading.
If the old client tried to update new generation number using old
generation number algorithm (assuming locality), it would write some
chimaera that may or may not work.

> However, it appears that we are destined to increment that version
> number, anyway. Here is my list for what needs to be in the next
> version of the commit-graph file format:

As a reminder, here is how the commit-graph format looks now [1]

  HEADER:
4-byte signature:  CGPH
1-byte version number: 1
1-byte Hash Version:   1 = SHA-1
1-byte number (C) of "chunks": 3 or 4
1-byte (reserved for later use)
  CHUNK LOOKUP:
(C + 1) * 12 bytes listing the table of contents for the chunks
  CHUNK DATA:
OIDF
OIDL
CDAT
EDGE [optional, depends on graph structure]
  TRAILER:
checksum

[1]: Documentation/technical/commit-graph-format.txt


> 1. A four-byte hash version.

Why do you need a four-byte hash version?  255 possible hash functions
is not enough?

> 2. File incrementality (split commit-graph).

This would probably require a segment lookup part, and one chunk of each
type per segment (or even one chunk lookup table per segment).  It may
or may not need a generation number which can support segmenting (here
maximal generation numbers have the advantage); but you can assume that
if segment(A) > segment(B) then reach.ix(A) > reach.ix(B),
i.e. lexicographical-like sort.

> 3. Reachability Index versioning

I assume that you mean here versioning for the reachability index that
is stored in the CDAT section (if it is in a separate chunk, it should
be easy to version).

The easiest thing to do when encountering unknown or not supported
generation number (perhaps the commit-graph file was created in the past
by an earlier version of Git, perhaps it came from a server with newer Git, or
perhaps the same repository is sometimes accessed using newer and
sometimes older Git version), is to set commit->generation_number to
GENERATION_NUMBER_ZERO, as you wrote in [2].

We could encode backward-compatibility (for using) somewhat, but I
wonder how much it would be of use.

[2]: 
https://public-inbox.org/git/61a829ce-0d29-81c9-880e-7aef1bec9...@gmail.com/

> Most of these changes will happen in the file header. The chunks
> themselves don't need to change, but some chunks may be added that
> only make sense in v2 commit-graphs.

What I would like to see in the v2 commit-graph format would be some kind
of convention for naming / categorizing chunks, so that Git would be able
to handle unknown chunks gracefully.

In PNG format there are various types of chunks.  Quoting Wikipedia:

  Chunk types [in PNG format] are given a four-letter case sensitive
  ASCII type/name; compare FourCC. The case of the different letters in
  the name (bit 5 of the numeric value of the character) is a bit 

Re: [RFE] Please add name and email to git credentials

2018-11-02 Thread Nicolas Mailhot
Le vendredi 02 novembre 2018 à 12:51 -0400, Jeff King a écrit :

Hi,

Thank you for reading the RFE.

> I did create the way git credential matches repo urls. And I do not
> think your proposal is a good idea. The credential system is about
> interacting with a remote URL, and creating a commit object is a local
> operation. That mismatch leaks through when you work with multiple
> remotes, since it is not clear which URL we would match against when
> the operation does not involve a remote at all.

I don't think it's quite that simple. The id part of creating a commit
object is not a local operation at all. You choose the id written in a
commit for a specific remote target, it has no use locally, most of us
do not need it to reach themselves.

So yes there is a leak but it’s built in the git commit logic itself.
Ideally, git would defer resolving  in commits to when I push to a
remote target. I'm sure you’re aware of all the workarounds git users do
at push time, when they realize the commit  is not the good one.

And since the leak is built in the commit logic itself, there are no
perfect solutions that do not involve revisiting how commit works.

So, unless someone wants to revisit git commit, we’re left with
imperfect solutions, and git credentials is no worse than another. It
certainly fixes my use case fine. You did good work in git credentials.

Regards,

-- 
Nicolas Mailhot



Re: [PATCH v3 8/8] merge-recursive: improve rename/rename(1to2)/add[/add] handling

2018-11-02 Thread Derrick Stolee

On 11/2/2018 1:27 PM, Elijah Newren wrote:

On Thu, Nov 1, 2018 at 12:01 AM Elijah Newren  wrote:

On Wed, Oct 31, 2018 at 8:08 AM Derrick Stolee  wrote:

On 10/19/2018 3:31 PM, Elijah Newren wrote:

[snip]

+ char *new_path = NULL;
+ if (dir_in_way(b->path, !o->call_depth, 0)) {
+ new_path = unique_path(o, b->path, ci->branch2);
+ output(o, 1, _("%s is a directory in %s adding "
+"as %s instead"),
+b->path, ci->branch1, new_path);

I tried really hard, but failed to get a test to cover the block below.
I was able to
find that the "check handling of differently renamed file with D/F
conflicts" test
in t6022-merge-rename.sh covers the block above. Trying to tweak the
example using
untracked files seems to hit an error message from unpack-trees.c instead.


+ } else if (would_lose_untracked(b->path)) {
+ new_path = unique_path(o, b->path, ci->branch2);
+ output(o, 1, _("Refusing to lose untracked file"
+" at %s; adding as %s instead"),
+b->path, new_path);

So now I'm confused.  This block was not listed in your coverage
report[1].  And, in fact, I think this block IS covered by testcase
10c of t6043.  However, there is a very similar looking block about 30
lines up that is uncovered (and which was mentioned in your report):

 } else if (would_lose_untracked(a->path)) {
 new_path = unique_path(o, a->path, ci->branch1);
 output(o, 1, _("Refusing to lose untracked file"
" at %s; adding as %s instead"),
a->path, new_path);

covering it, I think, is just a matter of repeating the 10c test with
the merge repeated in the other direction (checkout B and merge A
instead of checking out A and merging B) -- and touching up the checks
accordingly.

However, now I'm wondering if I'm crazy.  Was it really the block you
had highlighted that you were seeing uncovered?


Trust the report (generated by computer) over me (generated by squinting 
at an email, trying to match line numbers).


Thanks,

-Stolee



Re: [PATCH v3 8/8] merge-recursive: improve rename/rename(1to2)/add[/add] handling

2018-11-02 Thread Elijah Newren
On Thu, Nov 1, 2018 at 12:01 AM Elijah Newren  wrote:
>
> On Wed, Oct 31, 2018 at 8:08 AM Derrick Stolee  wrote:
> >
> > On 10/19/2018 3:31 PM, Elijah Newren wrote:
> > > [snip]
> > >
> > > + char *new_path = NULL;
> > > + if (dir_in_way(b->path, !o->call_depth, 0)) {
> > > + new_path = unique_path(o, b->path, 
> > > ci->branch2);
> > > + output(o, 1, _("%s is a directory in %s 
> > > adding "
> > > +"as %s instead"),
> > > +b->path, ci->branch1, new_path);
> >
> > I tried really hard, but failed to get a test to cover the block below.
> > I was able to
> > find that the "check handling of differently renamed file with D/F
> > conflicts" test
> > in t6022-merge-rename.sh covers the block above. Trying to tweak the
> > example using
> > untracked files seems to hit an error message from unpack-trees.c instead.
> >
> > > + } else if (would_lose_untracked(b->path)) {
> > > + new_path = unique_path(o, b->path, 
> > > ci->branch2);
> > > + output(o, 1, _("Refusing to lose untracked 
> > > file"
> > > +" at %s; adding as %s 
> > > instead"),
> > > +b->path, new_path);
> >

So now I'm confused.  This block was not listed in your coverage
report[1].  And, in fact, I think this block IS covered by testcase
10c of t6043.  However, there is a very similar looking block about 30
lines up that is uncovered (and which was mentioned in your report):

} else if (would_lose_untracked(a->path)) {
new_path = unique_path(o, a->path, ci->branch1);
output(o, 1, _("Refusing to lose untracked file"
   " at %s; adding as %s instead"),
   a->path, new_path);

covering it, I think, is just a matter of repeating the 10c test with
the merge repeated in the other direction (checkout B and merge A
instead of checking out A and merging B) -- and touching up the checks
accordingly.

However, now I'm wondering if I'm crazy.  Was it really the block you
had highlighted that you were seeing uncovered?

Thanks,
Elijah

[1] https://public-inbox.org/git/62f0bcf6-aa73-c192-d804-e6d69cac1...@gmail.com/


Re: [PATCH 19/24] submodule: use submodule repos for object lookup

2018-11-02 Thread Derrick Stolee

On 11/2/2018 1:23 PM, Stefan Beller wrote:

On Fri, Nov 2, 2018 at 6:03 AM Derrick Stolee  wrote:

On 10/30/2018 6:08 PM, Stefan Beller wrote:

This converts the 'show_submodule_header' function to use
the repository API properly, such that the submodule objects
are not added to the main object store.

Signed-off-by: Stefan Beller 

A couple tests are broken in 'pu' when run with GIT_TEST_COMMIT_GRAPH=1,
including t4041-diff-submodule-option.sh. The failure bisects to this patch.

Here is a verbose output of the first failure in that script:


expecting success:
  git diff-index -p --submodule=log HEAD >actual &&
  cat >expected <<-EOF &&
  Submodule sm1 $head2..$head3 (rewind):
< Add foo3 ($added foo3)
< Add foo2 ($added foo2)
  EOF
  test_cmp expected actual

+ git diff-index -p --submodule=log HEAD
+ cat
+ test_cmp expected actual
+ diff -u expected actual
--- expected2018-11-02 12:58:43.429262380 +
+++ actual  2018-11-02 12:58:43.429262380 +
@@ -1,3 +1,5 @@
-Submodule sm1 30b9670..dafb207 (rewind):
+Submodule sm1 30b9670...dafb207:
 < Add foo3 (hinzugefügt foo3)
 < Add foo2 (hinzugefügt foo2)
+  > Add foo1 (hinzugefügt foo1)
+  < Add foo1 (hinzugefügt foo1)
error: last command exited with $?=1
not ok 9 - modified submodule(backward)

I've been looking into the patch below to see if there is an obvious
problem, but the best I can think is that open_submodule() creates an
alternate 'struct repository' and somehow the commit-graph feature is
interacting poorly with that struct.

Stefan, do you know what's going on?

Sure, see the last four patches of this series
https://public-inbox.org/git/20181030220817.61691-1-sbel...@google.com/
(to which you also reply to? Junio did not queue this one, yet).


Sorry! Got a bit mixed up looking at everything. I didn't realize that 
the current 'pu' didn't have your latest. Thanks!




Re: [PATCH 19/24] submodule: use submodule repos for object lookup

2018-11-02 Thread Stefan Beller
On Fri, Nov 2, 2018 at 6:03 AM Derrick Stolee  wrote:
>
> On 10/30/2018 6:08 PM, Stefan Beller wrote:
> > This converts the 'show_submodule_header' function to use
> > the repository API properly, such that the submodule objects
> > are not added to the main object store.
> >
> > Signed-off-by: Stefan Beller 
>
> A couple tests are broken in 'pu' when run with GIT_TEST_COMMIT_GRAPH=1,
> including t4041-diff-submodule-option.sh. The failure bisects to this patch.
>
> Here is a verbose output of the first failure in that script:
>
>
> expecting success:
>  git diff-index -p --submodule=log HEAD >actual &&
>  cat >expected <<-EOF &&
>  Submodule sm1 $head2..$head3 (rewind):
>< Add foo3 ($added foo3)
>< Add foo2 ($added foo2)
>  EOF
>  test_cmp expected actual
>
> + git diff-index -p --submodule=log HEAD
> + cat
> + test_cmp expected actual
> + diff -u expected actual
> --- expected2018-11-02 12:58:43.429262380 +
> +++ actual  2018-11-02 12:58:43.429262380 +
> @@ -1,3 +1,5 @@
> -Submodule sm1 30b9670..dafb207 (rewind):
> +Submodule sm1 30b9670...dafb207:
> < Add foo3 (hinzugefügt foo3)
> < Add foo2 (hinzugefügt foo2)
> +  > Add foo1 (hinzugefügt foo1)
> +  < Add foo1 (hinzugefügt foo1)
> error: last command exited with $?=1
> not ok 9 - modified submodule(backward)
>
> I've been looking into the patch below to see if there is an obvious
> problem, but the best I can think is that open_submodule() creates an
> alternate 'struct repository' and somehow the commit-graph feature is
> interacting poorly with that struct.
>
> Stefan, do you know what's going on?

Sure, see the last four patches of this series
https://public-inbox.org/git/20181030220817.61691-1-sbel...@google.com/
(to which you also reply to? Junio did not queue this one, yet).


Re: [PATCH 1/1] poll: use GetTickCount64() to avoid wrap-around issues

2018-11-02 Thread Steve Hoelzer
On Fri, Nov 2, 2018 at 11:43 AM Johannes Sixt  wrote:
>
> Am 02.11.18 um 15:47 schrieb Steve Hoelzer:
> > On Thu, Nov 1, 2018 at 5:22 AM Johannes Sixt  wrote:
> >>
> >> Am 31.10.18 um 22:11 schrieb Steve Hoelzer via GitGitGadget:
> >>> @@ -614,7 +618,7 @@ restart:
> >>>
> >>>  if (!rc && orig_timeout && timeout != INFTIM)
> >>>{
> >>> -  elapsed = GetTickCount() - start;
> >>> +  elapsed = (DWORD)(GetTickCount64() - start);
> >>
> >> AFAICS, this subtraction in the old code is the correct way to take
> >> account of wrap-arounds in the tick count. The new code truncates the 64
> >> bit difference to 32 bits; the result is exactly identical to a
> >> difference computed from truncated 32 bit values, which is what we had
> >> in the old code.
> >>
> >> IOW, there is no change in behavior. The statement "avoid wrap-around
> >> issues" in the subject line is not correct. The patch's only effect is
> >> that it removes Warning C28159.
> >>
> >> What is really needed is that all quantities in the calculations are
> >> promoted to ULONGLONG. Unless, of course, we agree that a timeout of
> >> more than 49 days cannot happen ;)
> >
> > Yep, correct on all counts. I'm in favor of changing the commit message to
> > only say that this patch removes Warning C28159.
>
> How about this fixup instead?
>
>  8< 
> squash! poll: use GetTickCount64() to avoid wrap-around issues
>
> The value of timeout starts as an int value, and for this reason it
> cannot overflow unsigned long long aka ULONGLONG. The unsigned version
> of this initial value is available in orig_timeout. The difference
> (orig_timeout - elapsed) cannot wrap around because it is protected by
> a conditional (as can be seen in the patch text). Hence, the ULONGLONG
> difference can only have values that are smaller than the initial
> timeout value and truncation to int cannot overflow.
>
> Signed-off-by: Johannes Sixt 
> ---
>  compat/poll/poll.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/compat/poll/poll.c b/compat/poll/poll.c
> index 4abbfcb6a4..4459408c7d 100644
> --- a/compat/poll/poll.c
> +++ b/compat/poll/poll.c
> @@ -452,7 +452,7 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>static HANDLE hEvent;
>WSANETWORKEVENTS ev;
>HANDLE h, handle_array[FD_SETSIZE + 2];
> -  DWORD ret, wait_timeout, nhandles, elapsed, orig_timeout = 0;
> +  DWORD ret, wait_timeout, nhandles, orig_timeout = 0;
>ULONGLONG start = 0;
>fd_set rfds, wfds, xfds;
>BOOL poll_again;
> @@ -618,8 +618,8 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>
>if (!rc && orig_timeout && timeout != INFTIM)
>  {
> -  elapsed = (DWORD)(GetTickCount64() - start);
> -  timeout = elapsed >= orig_timeout ? 0 : orig_timeout - elapsed;
> +  ULONGLONG elapsed = GetTickCount64() - start;
> +  timeout = elapsed >= orig_timeout ? 0 : (int)(orig_timeout - elapsed);
>  }
>
>if (!rc && timeout)
> --
> 2.19.1.406.g1aa3f475f3

I like it. This still removes the warning and avoids overflow issues.

Steve


Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-11-02 Thread Phillip Wood

Hi Alban

On 02/11/2018 16:26, Alban Gruin wrote:

Hi Phillip,

Le 02/11/2018 à 11:09, Phillip Wood a écrit :

+    struct todo_item *items = NULL,
+    base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
+
+    strbuf_addstr(buf, commands);
+    base_item.offset_in_buf = buf->len - commands_len - 1;
+    base_item.arg = buf->buf + base_item.offset_in_buf;


I think if the user gives --exec more than once on the command line then
commands will contain more than one exec command so this needs to parse
commands and create one todo_item for each command.



Ouch, you’re right.  Thanks for the heads up.


I haven't looked how difficult it would be but it might be best to
change the option parsing to pass an array of strings containing the
exec commands rather than one long string so we can just loop over the
array here.



It would be the best way to do so.  This string comes from git-rebase.sh
(or builtin/rebase.c) -- they format it this way before invoking
git-rebase--interactive.  So either I modify both of them (for this, I
would need to rebase my branch on master), or I can split this string in
builtin/rebase--interactive.c.  I prefer the first option, but maybe
changing the base of this series will not please Junio.


I think in the last 'what's cooking' email Junio said he was going to 
merge all the builtin/rebase.c stuff to master so there may not be a 
problem if you wait a couple of days.


Best Wishes

Phillip


Cheers,
Alban





Re: submodule support in git-bundle

2018-11-02 Thread Stefan Beller
On Fri, Nov 2, 2018 at 9:10 AM Duy Nguyen  wrote:
>
> I use git-bundle today and it occurs to me that if I want to use it to
> transfer part of a history that involves submodule changes, things
> aren't pretty. Has anybody given thought on how to do binary history
> transfer that contains changes from submodules?
>
> Since .bundle files are basically .pack files, i'm not sure if it's
> easy to bundle multiple pack files (one per repo)...

That is a really good discussion starter!

As bundles are modeled after the fetch protocol, I would
redirect the discussion there.

The new fetch protocol could support sending more than
one pack, which could be for both the superproject as
well as the relevant submodule updates (i.e. what is recorded
in the superproject) based on a new capability.

We at Google have given this idea some thought, but from a
different angle: As you may know currently Android uses the
repo tool, which we want to replace with Git's native submodules
eventually. The repo tool tests for each repository to clone if
there is a bundle file for that repository, such that instead of
cloning the repo, the bundle can be downloaded and then
a catch-up fetch can be performed. (This helps the Git servers
as well as the client, the bundle can be hosted on a CDN,
which is faster and cheaper than a git server for us).

So we've given some thought on extending the packfiles in the
fetch protocol to have some redirection to a CDN possible,
i.e. instead of sending bytes as is, you get more or less a "todo"
list, which might be
(a) take the following bytes as is (current pack format)
(b) download these other bytes from $THERE
(possibly with a checksum)
once the stream of bytes is assembled, it will look like a regular
packfile with deltas etc.

This offloading-to-CDN (or "mostly resumable clone" in the
sense that the communication with the server is minimal, and
you get most of your data via resumable http range-requests)
sounds like complete offtopic, but is one of the requirements
for the repo to submodule migration, hence I came to speak of it.

Did you have other things in mind, on a higher level?
For example, querying the bundle and creating submodule bundles
based on the superproject bundle? 'git bundle create' could
learn a --recurse-submodules option, which would then produce
multiple bundle files without changing the file formats.
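Until such an option exists, the multi-bundle part can be approximated with commands that exist today. The sketch below is self-contained (it builds a throwaway repo in a temp directory); with real submodules checked out, the commented 'git submodule foreach' line would produce one bundle per submodule:

```shell
# Hypothetical approximation of what a --recurse-submodules option could
# automate: one bundle per repository, no file-format changes.
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q super && cd super
git -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m 'superproject history'
git bundle create ../super.bundle --all
# With submodules checked out, each would get its own bundle, e.g.:
#   git submodule foreach 'git bundle create "$toplevel/$name.bundle" --all'
git bundle verify ../super.bundle >/dev/null 2>&1 && echo 'super.bundle ok'
```

Receivers would then clone from super.bundle and point each submodule at its matching bundle, which is roughly the catch-up flow the repo tool already uses.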

Stefan


Re: [RFE] Please add name and email to git credentials

2018-11-02 Thread Jeff King
On Fri, Nov 02, 2018 at 09:57:30AM +0100, Nicolas Mailhot wrote:

> Or are you arguing that having two separate mechanisms in git, to match
> config directives to repo urls, is some kind of improvement?

There already are multiple mechanisms (e.g., http.* config). So that
ship has sailed. ;)

> I didn't create or write or specify the way git credential matches repo
> urls. It already exists within git. If you have a problem with the
> matching logic git credential uses, why are you arguing with me instead
> of taking it up with the maintainers of this logic?

I did create the way git credential matches repo urls. And I do not
think your proposal is a good idea. The credential system is about
interacting with a remote URL, and creating a commit object is a local
operation. That mismatch leaks through when you work with multiple
remotes, since it is not clear which URL we would match against when the
operation does not involve a remote at all.

If you want to have per-repo name/email for creating commits, right now
your option is to have conditional config based on the local filesystem
path.
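For reference, that existing mechanism is the conditional include; a minimal sketch (the paths and the included file's contents are illustrative):

```ini
# ~/.gitconfig -- route identity by working-tree location
[includeIf "gitdir:~/src/work/"]
	path = ~/.gitconfig-work

# ~/.gitconfig-work -- only applies inside ~/src/work/
[user]
	name = Work Name
	email = work@example.com
```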

Elsewhere in this thread[1] Ævar suggested having conditional config
directives based on the presence of a remote. I'm a little hesitant
there because it requires the config parsing to depend on the value of
another key (introducing subtle timing dependencies). But aside from
bizarre corner cases (e.g., including config for remote.* via a
conditional include on the remote.* config), I think it would do what
you want and would be generally useful.

That seems like a more fruitful direction to me.

-Peff

[1] https://public-inbox.org/git/87wopxj5wr@evledraar.gmail.com/


Re: [PATCH 1/1] poll: use GetTickCount64() to avoid wrap-around issues

2018-11-02 Thread Johannes Sixt
Am 02.11.18 um 15:47 schrieb Steve Hoelzer:
> On Thu, Nov 1, 2018 at 5:22 AM Johannes Sixt  wrote:
>>
>> Am 31.10.18 um 22:11 schrieb Steve Hoelzer via GitGitGadget:
>>> @@ -614,7 +618,7 @@ restart:
>>>
>>>  if (!rc && orig_timeout && timeout != INFTIM)
>>>{
>>> -  elapsed = GetTickCount() - start;
>>> +  elapsed = (DWORD)(GetTickCount64() - start);
>>
>> AFAICS, this subtraction in the old code is the correct way to take
>> account of wrap-arounds in the tick count. The new code truncates the 64
>> bit difference to 32 bits; the result is exactly identical to a
>> difference computed from truncated 32 bit values, which is what we had
>> in the old code.
>>
>> IOW, there is no change in behavior. The statement "avoid wrap-around
>> issues" in the subject line is not correct. The patch's only effect is
>> that it removes Warning C28159.
>>
>> What is really needed is that all quantities in the calculations are
>> promoted to ULONGLONG. Unless, of course, we agree that a timeout of
>> more than 49 days cannot happen ;)
> 
> Yep, correct on all counts. I'm in favor of changing the commit message to
> only say that this patch removes Warning C28159.

How about this fixup instead?

 8< 
squash! poll: use GetTickCount64() to avoid wrap-around issues

The value of timeout starts as an int value, and for this reason it
cannot overflow unsigned long long aka ULONGLONG. The unsigned version
of this initial value is available in orig_timeout. The difference
(orig_timeout - elapsed) cannot wrap around because it is protected by
a conditional (as can be seen in the patch text). Hence, the ULONGLONG
difference can only have values that are smaller than the initial
timeout value and truncation to int cannot overflow.

Signed-off-by: Johannes Sixt 
---
 compat/poll/poll.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/compat/poll/poll.c b/compat/poll/poll.c
index 4abbfcb6a4..4459408c7d 100644
--- a/compat/poll/poll.c
+++ b/compat/poll/poll.c
@@ -452,7 +452,7 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
   static HANDLE hEvent;
   WSANETWORKEVENTS ev;
   HANDLE h, handle_array[FD_SETSIZE + 2];
-  DWORD ret, wait_timeout, nhandles, elapsed, orig_timeout = 0;
+  DWORD ret, wait_timeout, nhandles, orig_timeout = 0;
   ULONGLONG start = 0;
   fd_set rfds, wfds, xfds;
   BOOL poll_again;
@@ -618,8 +618,8 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
 
   if (!rc && orig_timeout && timeout != INFTIM)
 {
-  elapsed = (DWORD)(GetTickCount64() - start);
-  timeout = elapsed >= orig_timeout ? 0 : orig_timeout - elapsed;
+  ULONGLONG elapsed = GetTickCount64() - start;
+  timeout = elapsed >= orig_timeout ? 0 : (int)(orig_timeout - elapsed);
 }
 
   if (!rc && timeout)
-- 
2.19.1.406.g1aa3f475f3


Re: [PATCH v3] i18n: make GETTEXT_POISON a runtime option

2018-11-02 Thread SZEDER Gábor
On Thu, Nov 01, 2018 at 07:31:15PM +, Ævar Arnfjörð Bjarmason wrote:
> Change the GETTEXT_POISON compile-time + runtime GIT_GETTEXT_POISON
> test parameter to only be a GIT_TEST_GETTEXT_POISON=
> runtime parameter, to be consistent with other parameters documented
> in "Running tests with special setups" in t/README.
> 
> When I added GETTEXT_POISON in bb946bba76 ("i18n: add GETTEXT_POISON
> to simulate unfriendly translator", 2011-02-22) I was concerned with
> ensuring that the _() function would get constant folded if NO_GETTEXT
> was defined, and likewise that GETTEXT_POISON would be compiled out
> unless it was defined.
> 
> But as the benchmark in my [1] shows doing a one-off runtime
> getenv("GIT_TEST_[...]") is trivial, and since GETTEXT_POISON was
> originally added the GIT_TEST_* env variables have become the common
> idiom for turning on special test setups.
> 
> So change GETTEXT_POISON to work the same way. Now the
> GETTEXT_POISON=YesPlease compile-time option is gone, and running the
> tests with GIT_TEST_GETTEXT_POISON=[YesPlease|] can be toggled on/off
> without recompiling.
> 
> This allows for conditionally amending tests to test with/without
> poison, similar to what 859fdc0c3c ("commit-graph: define
> GIT_TEST_COMMIT_GRAPH", 2018-08-29) did for GIT_TEST_COMMIT_GRAPH. Do
> some of that, now we e.g. always run the t0205-gettext-poison.sh test.
> 
> I did enough there to remove the GETTEXT_POISON prerequisite, but its
> inverse C_LOCALE_OUTPUT is still around, and surely some tests using
> it can be converted to e.g. always set GIT_TEST_GETTEXT_POISON=.
> 
> Notes on the implementation:
> 
>  * We still compile a dedicated GETTEXT_POISON build in Travis CI.
>This is probably the wrong thing to do and should be followed-up
>with something similar to ae59a4e44f ("travis: run tests with
>GIT_TEST_SPLIT_INDEX", 2018-01-07) to re-use an existing test setup
>for running in the GIT_TEST_GETTEXT_POISON mode.

I'm of two minds about this.  Sure, now it's not a compile time
option, so we can spare a dedicated compilation, and sparing resources
is good, of course.

However, checking test failures in the 'linux-gcc' build job is always
a bit of a hassle, because it's not enough to look at the output of
the failed test, but I also have to check which one of the two test
runs failed (i.e. the "regular" or one with all the GIT_TEST_* knobs
turned on).  Adding a second test run with GIT_TEST_GETTEXT_POISON
enabled to another build job (e.g. 'linux-clang') would then bring
this hassle to that build job as well.

Furthermore, occasionally a build job is slower than usual (whatever
the reason might be), which can be an issue when running the test
suite twice, as it can exceed the time limit.

Anyway, we can think about this later.

In any case, GIT_TEST_GETTEXT_POISON definitely should not be enabled
in the same "catch-all" test run with all other GIT_TEST_* variables,
because it would mess up any translated error messages that might pop
up in a test failure, and then we wouldn't have any idea about what
went wrong.

>  * We now skip a test in t-basic.sh under
>GIT_TEST_GETTEXT_POISON=YesPlease that wasn't skipped before. This
>test relies on C locale output, but due to an edge case in how the
>previous implementation of GETTEXT_POISON worked (reading it from
>GIT-BUILD-OPTIONS) wasn't enabling poison correctly. Now it does,
>and needs to be skipped.
> 
>  * The getenv() function is not reentrant, so out of paranoia about
>code of the form:
> 
>printf(_("%s"), getenv("some-env"));
> 
>call use_gettext_poison() in our early setup in git_setup_gettext()
>so we populate the "poison_requested" variable in a codepath that
>won't suffer from that race condition.
> 
> See also [3] for more on the motivation behind this patch, and the
> history of the GETTEXT_POISON facility.
> 
> 1. https://public-inbox.org/git/871s8gd32p@evledraar.gmail.com/
> 2. https://public-inbox.org/git/87woq7b58k@evledraar.gmail.com/
> 3. https://public-inbox.org/git/878t2pd6yu@evledraar.gmail.com/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> 
> Now:
> 
>  * The new i18n helper is gone. We just use "test -n" semantics for
>$GIT_TEST_GETTEXT_POISON
> 
>  * We error out in the Makefile if you're still saying
>GETTEXT_POISON=YesPlease.
> 
>This makes more sense than just making it a synonym since now this
>also needs to be defined at runtime.

OK, I think erroring out is better than silently ignoring
GETTEXT_POISON=YesPlease.  However, the commit message only mentions
that the compile-time option is gone; perhaps the new erroring-out
behavior should be mentioned there as well.

>  * The caveat with avoiding test_lazy_prereq is gone (although there's
>still some unrelated bug there worth looking into).
> 
>  * We call use_gettext_poison() really early to avoid any reentrancy
>issue with getenv().

> diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
> 

Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-11-02 Thread Alban Gruin
Hi Phillip,

On 02/11/2018 at 11:09, Phillip Wood wrote:
>>>> +    struct todo_item *items = NULL,
>>>> +    base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
>>>> +
>>>> +    strbuf_addstr(buf, commands);
>>>> +    base_item.offset_in_buf = buf->len - commands_len - 1;
>>>> +    base_item.arg = buf->buf + base_item.offset_in_buf;
>>>
>>> I think if the user gives --exec more than once on the command line then
>>> commands will contain more than one exec command so this needs to parse
>>> commands and create one todo_item for each command.
>>>
>>
>> Ouch, you’re right.  Thanks for the heads up.
> 
> I haven't looked how difficult it would be but it might be best to
> change the option parsing to pass an array of strings containing the
> exec commands rather than one long string so we can just loop over the
> array here.
> 

It would be the best way to do so.  This string comes from git-rebase.sh
(or builtin/rebase.c) -- they format it this way before invoking
git-rebase--interactive.  So either I modify both of them (for this, I
would need to rebase my branch on master), or I can split this string in
builtin/rebase--interactive.c.  I prefer the first option, but maybe
changing the base of this series will not please Junio.

Cheers,
Alban



Git Slowness on Windows w/o Internet

2018-11-02 Thread Peter Kostyukov
Hello,

I wanted to bring to your attention an issue that we discovered on our
Windows Jenkins nodes with Git installed (git.exe). Our Jenkins
servers don't have Internet access. It appears that git.exe tries
to connect to various Cloudflare and Akamai CDN instances over the
Internet when it first runs, and it keeps trying to connect to these
CDNs on every git.exe execution until an attempt succeeds. See
the attached screenshot for details.

Enabling Internet access via a proxy fixes the issue, and git.exe
runs fast on subsequent invocations.

Is there any configuration setting that can disable this git's
behavior or is there any other workaround without allowing Internet
access? Otherwise, every git command run on a server without the
Internet takes about 30 seconds to complete.

Please advise.

Thanks,
Peter


Re: [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()

2018-11-02 Thread Ben Peart




On 11/2/2018 11:23 AM, Junio C Hamano wrote:

Ben Peart  writes:


From: Ben Peart 

During an "add", a call is made to run_diff_files() which calls
check_remove() for each index-entry.  The preload_index() code
distributes some of the costs across multiple threads.


Nice.  I peeked around and noticed that we already do this in
builtin_diff_index() before running run_diff_index() when !cached,
and builtin_diff_files(), of course.



Thanks for double checking!


Because the files checked are restricted to pathspec, adding individual
files makes no measurable impact but on a Windows repo with ~200K files,
'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.


;-)


diff --git a/builtin/add.c b/builtin/add.c
index ad49806ebf..f65c172299 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -445,11 +445,6 @@ int cmd_add(int argc, const char **argv, const char 
*prefix)
return 0;
}
  
-	if (read_cache() < 0)
-		die(_("index file corrupt"));
-
-	die_in_unpopulated_submodule(&the_index, prefix);
-
-
/*
 * Check the "pathspec '%s' did not match any files" block
 * below before enabling new magic.


It is not explained why this is not a mere s/read_cache/&_preload/
in the log message.  I can see it is because you wanted to make the
pathspec available to preload to further cut down the preloaded
paths, and I do not think it has any unintended (negative) side
effect to parse the pathspec before populating the in-core index,
but that would have been a good thing to mention in the proposed log
message.



That is correct.  parse_pathspec() was after read_cache() because it
_used_ to use the index to determine whether a pathspec is in a
submodule.  That was fixed by Brandon in Aug 2017 when he cleaned up
all the submodule config code, so it is now safe to move
read_cache_preload() after the call to parse_pathspec().


How about this for a revised commit message?



During an "add", a call is made to run_diff_files() which calls
check_remove() for each index-entry.  The preload_index() code 
distributes some of the costs across multiple threads.


Limit read_cache_preload() to pathspec, so that it doesn't process more 
entries than are needed and end up slowing things down instead of 
speeding them up.


Because the files checked are restricted to pathspec, adding individual
files makes no measurable impact but on a Windows repo with ~200K files,
'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.




@@ -459,6 +454,10 @@ int cmd_add(int argc, const char **argv, const char 
*prefix)
   PATHSPEC_SYMLINK_LEADING_PATH,
   prefix, argv);
  
+	if (read_cache_preload(&pathspec) < 0)
+		die(_("index file corrupt"));
+
+	die_in_unpopulated_submodule(&the_index, prefix);
 	die_path_inside_submodule(&the_index, &pathspec);
  
  	if (add_new_files) {


base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2


submodule support in git-bundle

2018-11-02 Thread Duy Nguyen
I use git-bundle today and it occurs to me that if I want to use it to
transfer part of a history that involves submodule changes, things
aren't pretty. Has anybody given thought on how to do binary history
transfer that contains changes from submodules?

Since .bundle files are basically .pack files, I'm not sure if it's
easy to bundle multiple pack files (one per repo)...
-- 
Duy


Re: Understanding pack format

2018-11-02 Thread Duy Nguyen
On Fri, Nov 2, 2018 at 7:19 AM Junio C Hamano  wrote:
>
> Farhan Khan  writes:
>
> > ...Where is this in the git code? That might
> > serve as a good guide.
>
> There are two major codepaths.  One is used at runtime, giving us
> random access into the packfile with the help with .idx file.  The
> other is used when receiving a new packstream to create an .idx
> file.

The third path is copying/reusing objects in
builtin/pack-objects.c::write_reuse_object(). Since it mostly involves
encoding the header of new objects in the pack, it could also be a good
starting point. Then you can move to write_no_reuse_object() and see
how the data is encoded, deltified or not (yeah, not parsed, but I
think it's more or less the same thing conceptually).
-- 
Duy


Re: Understanding pack format

2018-11-02 Thread Duy Nguyen
On Fri, Nov 2, 2018 at 6:26 AM Farhan Khan  wrote:
>
> Hi all,
>
> I am trying to understand the pack file format and have been reading
> the documentation, specifically https://git-scm.com/docs/pack-format
> (which is in git's own git repository as
> "Documentation/technical/pack-format.txt"). I see that the file starts
> with the "PACK" signature, followed by the 4 byte version and 4 byte
> number of objects. After this, the documentation speaks about
> Undeltified and Deltified representations. I understand conceptually
> what each is, but do not know specifically how git parses it out.

If by "it" you mean the deltified representations, I think it's
actually documented in pack-format.txt. If you prefer C over English,
look at patch-delta.c

-- 
Duy


Re: [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()

2018-11-02 Thread Duy Nguyen
On Fri, Nov 2, 2018 at 2:32 PM Ben Peart  wrote:
>
> From: Ben Peart 
>
> During an "add", a call is made to run_diff_files() which calls
> check_remove() for each index-entry.  The preload_index() code distributes
> some of the costs across multiple threads.

Instead of doing this site by site, how about we make read_cache()
always do a multithreaded preload?

The only downside I see is that preload may actually hurt when there
are too few cache entries (but more than 500), but this needs to be
verified. If the penalty is small enough, I think we could live with
it, since everything is fast when you have few entries anyway.

But if that's not true, we could add a threshold to activate preload.
Something like "if you have 50k files or more, then activate preload"
would do. I notice THREAD_COST in preload code, but I don't think it's
the same thing.

>
> Because the files checked are restricted to pathspec, adding individual
> files makes no measurable impact but on a Windows repo with ~200K files,
> 'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.
>
> Signed-off-by: Ben Peart 
> ---
>
> Notes:
> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/fc4830b545
> Checkout: git fetch https://github.com/benpeart/git add-preload-index-v1 
> && git checkout fc4830b545
>
>  builtin/add.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index ad49806ebf..f65c172299 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -445,11 +445,6 @@ int cmd_add(int argc, const char **argv, const char 
> *prefix)
> return 0;
> }
>
> -   if (read_cache() < 0)
> -   die(_("index file corrupt"));
> -
> -   die_in_unpopulated_submodule(&the_index, prefix);
> -
> /*
>  * Check the "pathspec '%s' did not match any files" block
>  * below before enabling new magic.
> @@ -459,6 +454,10 @@ int cmd_add(int argc, const char **argv, const char 
> *prefix)
>PATHSPEC_SYMLINK_LEADING_PATH,
>prefix, argv);
>
> +   if (read_cache_preload(&pathspec) < 0)
> +   die(_("index file corrupt"));
> +
> +   die_in_unpopulated_submodule(&the_index, prefix);
> die_path_inside_submodule(&the_index, &pathspec);
>
> if (add_new_files) {
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> --
> 2.18.0.windows.1
>


-- 
Duy


Re: [PATCH 0/3] Make add_missing_tags() linear

2018-11-02 Thread Derrick Stolee

On 11/2/2018 10:58 AM, Elijah Newren wrote:

On Thu, Nov 1, 2018 at 12:02 PM Derrick Stolee  wrote:

On 11/1/2018 2:57 PM, Elijah Newren wrote:

On Thu, Nov 1, 2018 at 5:32 AM Derrick Stolee  wrote:

No rush. I'd just like to understand how removing the commit-graph file
can make the new algorithm faster. Putting a similar count in the old
algorithm would involve giving a count for every call to
in_merge_bases_many(), which would be very noisy.

$ time git push --dry-run --follow-tags /home/newren/repo-mirror
count: 92912
To /home/newren/repo-mirror
   * [new branch]  test5 -> test5

real0m3.024s
user0m2.752s
sys0m0.320s

Is the above test with or without the commit-graph file? Can you run it
in the other mode, too? I'd like to see if the "count" value changes
when the only difference is the presence of a commit-graph file.

I apologize for providing misleading information earlier; this was an
apples to oranges comparison.  Here's what I did:


git clone coworker.bundle coworker-repo
cd coworker-repo
time git push --dry-run --follow-tags /home/newren/repo-mirror
git config core.commitgraph true
git config gc.writecommitgraph true
git gc
time git push --dry-run --follow-tags /home/newren/nucleus-mirror


I figured I had just done a fresh clone, so surely the gc wouldn't do
anything other than write the .git/objects/info/commit-graph file.
However, the original bundle contained many references outside of
refs/heads/ and refs/tags/:

$ git bundle list-heads ../coworker.bundle | grep -v -e refs/heads/ -e
refs/tags/ -e HEAD | wc -l
2396

These other refs apparently referred to objects not otherwise
referenced in refs/heads/ and refs/tags/, and caused the gc to explode
lots of loose objects:
$ git count-objects -v
count: 147604
size: 856416
in-pack: 1180692
packs: 1
size-pack: 796143
prune-packable: 0
garbage: 0
size-garbage: 0


The slowdown with commit-graph was entirely due to there being lots of
loose objects (147K of them).  If I add a git-prune before doing the
timing with commit-graph, then the timing with commit-graph is faster
than the run without a commit-graph.

Sorry for the wild goose chase.

And thanks for the fixes; get_reachable_subset() makes things much
faster even without a commit-graph, and the commit-graph just improves
it more.  :-)



Thanks for double-checking! It's good to have confidence that this is
a good algorithm to use.


Thanks,

-Stolee



Re: [PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()

2018-11-02 Thread Junio C Hamano
Ben Peart  writes:

> From: Ben Peart 
>
> During an "add", a call is made to run_diff_files() which calls
> check_remove() for each index-entry.  The preload_index() code
> distributes some of the costs across multiple threads.

Nice.  I peeked around and noticed that we already do this in
builtin_diff_index() before running run_diff_index() when !cached,
and builtin_diff_files(), of course.

> Because the files checked are restricted to pathspec, adding individual
> files makes no measurable impact but on a Windows repo with ~200K files,
> 'git add .' drops from 6.3 seconds to 3.3 seconds for a 47% savings.

;-)

> diff --git a/builtin/add.c b/builtin/add.c
> index ad49806ebf..f65c172299 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -445,11 +445,6 @@ int cmd_add(int argc, const char **argv, const char 
> *prefix)
>   return 0;
>   }
>  
> - if (read_cache() < 0)
> - die(_("index file corrupt"));
> -
> - die_in_unpopulated_submodule(&the_index, prefix);
> -
>   /*
>* Check the "pathspec '%s' did not match any files" block
>* below before enabling new magic.

It is not explained why this is not a mere s/read_cache/&_preload/
in the log message.  I can see it is because you wanted to make the
pathspec available to preload to further cut down the preloaded
paths, and I do not think it has any unintended (negative) side
effect to parse the pathspec before populating the in-core index,
but that would have been a good thing to mention in the proposed log
message.

> @@ -459,6 +454,10 @@ int cmd_add(int argc, const char **argv, const char 
> *prefix)
>  PATHSPEC_SYMLINK_LEADING_PATH,
>  prefix, argv);
>  
> + if (read_cache_preload(&pathspec) < 0)
> + die(_("index file corrupt"));
> +
> + die_in_unpopulated_submodule(&the_index, prefix);
> 	die_path_inside_submodule(&the_index, &pathspec);
>  
>   if (add_new_files) {
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2


Re: [PATCH] build: link with curl-defined linker flags

2018-11-02 Thread Junio C Hamano
James Knight  writes:

>  Makefile | 21 +++--
>  config.mak.uname |  5 ++---
>  configure.ac | 17 +++--
>  3 files changed, 16 insertions(+), 27 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index b08d5ea25..c3be87b0e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -183,10 +183,6 @@ all::
>  #
>  # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto 
> (Darwin).
>  #
> -# Define NEEDS_SSL_WITH_CURL if you need -lssl with -lcurl (Minix).
> -#
> -# Define NEEDS_IDN_WITH_CURL if you need -lidn when using -lcurl (Minix).
> -#

Hmm, because the use of autoconf -> ./configure in our build
procedure is optional, wouldn't this change cause a regression for
those of us who use these Makefile variables to configure their
build, instead of relying on autoconf?

> diff --git a/config.mak.uname b/config.mak.uname
> index 8acdeb71f..923b8fa09 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -431,8 +431,7 @@ ifeq ($(uname_S),Minix)
>   NO_NSEC = YesPlease
>   NEEDS_LIBGEN =
>   NEEDS_CRYPTO_WITH_SSL = YesPlease
> - NEEDS_IDN_WITH_CURL = YesPlease
> - NEEDS_SSL_WITH_CURL = YesPlease
> + CURL_LDFLAGS = -lssl -lcrypto -lidn

OK, as long as we describe how users should update their config.mak
to adjust to the new world order, the "regression" I noticed earlier
is not too bad, and the way the new CURL_LDFLAGS variable drives the
build is much more direct and transparent than the old way of
specifying NEEDS_*_WITH_CURL to affect the build indirectly.

I think such "describe how to configure in the new world order"
should go to where NEEDS_*_WITH_CURL used to be, e.g. "Define
CURL_LDFLAGS to specify flags that you need to link using -lcurl;
see config.mak.uname and look for the variable for examples"
or something like that, perhaps.
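For illustration, the make-only configuration could end up looking something like this in config.mak (a sketch; the explicit flag list mirrors what config.mak.uname sets for Minix, and the actual flags depend on the local curl build):

```make
# config.mak (sketch): either let curl-config supply the linker flags...
CURL_LDFLAGS = $(shell curl-config --libs)
# ...or spell them out explicitly, overriding the curl-config default:
# CURL_LDFLAGS = -lcurl -lssl -lcrypto -lidn
```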

Thanks.


Re: [PATCH 0/3] Make add_missing_tags() linear

2018-11-02 Thread Elijah Newren
On Thu, Nov 1, 2018 at 12:02 PM Derrick Stolee  wrote:
>
> On 11/1/2018 2:57 PM, Elijah Newren wrote:
> > On Thu, Nov 1, 2018 at 5:32 AM Derrick Stolee  wrote:
> >> No rush. I'd just like to understand how removing the commit-graph file
> >> can make the new algorithm faster. Putting a similar count in the old
> >> algorithm would involve giving a count for every call to
> >> in_merge_bases_many(), which would be very noisy.
> > $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> > count: 92912
> > To /home/newren/repo-mirror
> >   * [new branch]  test5 -> test5
> >
> > real0m3.024s
> > user0m2.752s
> > sys0m0.320s
>
> Is the above test with or without the commit-graph file? Can you run it
> in the other mode, too? I'd like to see if the "count" value changes
> when the only difference is the presence of a commit-graph file.

I apologize for providing misleading information earlier; this was an
apples to oranges comparison.  Here's what I did:


git clone coworker.bundle coworker-repo
cd coworker-repo
time git push --dry-run --follow-tags /home/newren/repo-mirror
git config core.commitgraph true
git config gc.writecommitgraph true
git gc
time git push --dry-run --follow-tags /home/newren/nucleus-mirror


I figured I had just done a fresh clone, so surely the gc wouldn't do
anything other than write the .git/objects/info/commit-graph file.
However, the original bundle contained many references outside of
refs/heads/ and refs/tags/:

$ git bundle list-heads ../coworker.bundle | grep -v -e refs/heads/ -e
refs/tags/ -e HEAD | wc -l
2396

These other refs apparently referred to objects not otherwise
referenced in refs/heads/ and refs/tags/, and caused the gc to explode
lots of loose objects:
$ git count-objects -v
count: 147604
size: 856416
in-pack: 1180692
packs: 1
size-pack: 796143
prune-packable: 0
garbage: 0
size-garbage: 0


The slowdown with commit-graph was entirely due to there being lots of
loose objects (147K of them).  If I add a git-prune before doing the
timing with commit-graph, then the timing with commit-graph is faster
than the run without a commit-graph.

Sorry for the wild goose chase.

And thanks for the fixes; get_reachable_subset() makes things much
faster even without a commit-graph, and the commit-graph just improves
it more.  :-)


Re: [PATCH 1/1] poll: use GetTickCount64() to avoid wrap-around issues

2018-11-02 Thread Steve Hoelzer
On Thu, Nov 1, 2018 at 5:22 AM Johannes Sixt  wrote:
>
> Am 31.10.18 um 22:11 schrieb Steve Hoelzer via GitGitGadget:
> > From: Steve Hoelzer 
> >
> >  From Visual Studio 2015 Code Analysis: Warning C28159 Consider using
> > 'GetTickCount64' instead of 'GetTickCount'.
> >
> > Reason: GetTickCount() overflows roughly every 49 days. Code that does
> > not take that into account can loop indefinitely. GetTickCount64()
> > operates on 64 bit values and does not have that problem.
> >
> > Note: this patch has been carried in Git for Windows for almost two
> > years, but with a fallback for Windows XP, as the GetTickCount64()
> > function is only available on Windows Vista and later. However, in the
> > meantime we require Vista or later, hence we can drop that fallback.
> >
> > Signed-off-by: Steve Hoelzer 
> > Signed-off-by: Johannes Schindelin 
> > ---
> >   compat/poll/poll.c | 10 +++---
> >   1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/compat/poll/poll.c b/compat/poll/poll.c
> > index ad5dcde439..4abbfcb6a4 100644
> > --- a/compat/poll/poll.c
> > +++ b/compat/poll/poll.c
> > @@ -18,6 +18,9 @@
> >  You should have received a copy of the GNU General Public License along
> >  with this program; if not, see .  */
> >
> > +/* To bump the minimum Windows version to Windows Vista */
> > +#include "git-compat-util.h"
> > +
> >   /* Tell gcc not to warn about the (nfd < 0) tests, below.  */
> >   #if (__GNUC__ == 4 && 3 <= __GNUC_MINOR__) || 4 < __GNUC__
> >   # pragma GCC diagnostic ignored "-Wtype-limits"
> > @@ -449,7 +452,8 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
> > static HANDLE hEvent;
> > WSANETWORKEVENTS ev;
> > HANDLE h, handle_array[FD_SETSIZE + 2];
> > -  DWORD ret, wait_timeout, nhandles, start = 0, elapsed, orig_timeout = 0;
> > +  DWORD ret, wait_timeout, nhandles, elapsed, orig_timeout = 0;
> > +  ULONGLONG start = 0;
> > fd_set rfds, wfds, xfds;
> > BOOL poll_again;
> > MSG msg;
> > @@ -465,7 +469,7 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
> > if (timeout != INFTIM)
> >   {
> > orig_timeout = timeout;
> > -  start = GetTickCount();
> > +  start = GetTickCount64();
> >   }
> >
> > if (!hEvent)
> > @@ -614,7 +618,7 @@ restart:
> >
> > if (!rc && orig_timeout && timeout != INFTIM)
> >   {
> > -  elapsed = GetTickCount() - start;
> > +  elapsed = (DWORD)(GetTickCount64() - start);
>
> AFAICS, this subtraction in the old code is the correct way to take
> account of wrap-arounds in the tick count. The new code truncates the 64
> bit difference to 32 bits; the result is exactly identical to a
> difference computed from truncated 32 bit values, which is what we had
> in the old code.
>
> IOW, there is no change in behavior. The statement "avoid wrap-around
> issues" in the subject line is not correct. The patch's only effect is
> that it removes Warning C28159.
>
> What is really needed is that all quantities in the calculations are
> promoted to ULONGLONG. Unless, of course, we agree that a timeout of
> more than 49 days cannot happen ;)

Yep, correct on all counts. I'm in favor of changing the commit message to
only say that this patch removes Warning C28159.

Steve
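As an aside, the truncation equivalence discussed above can be checked directly with a standalone sketch. This is a hypothetical illustration, not git code; fixed-width stdint types stand in for the Windows DWORD/ULONGLONG typedefs:

```c
#include <assert.h>
#include <stdint.h>

/* old code's arithmetic: subtraction of 32-bit tick counts */
static uint32_t elapsed_32(uint32_t now, uint32_t start)
{
	return now - start; /* modular arithmetic handles one wrap-around */
}

/* new code's arithmetic: 64-bit subtraction truncated to 32 bits */
static uint32_t elapsed_64_truncated(uint64_t now, uint64_t start)
{
	return (uint32_t)(now - start);
}
```

For any interval shorter than 2^32 ms (~49.7 days) the two are identical, e.g. with `start = 0xfffffff0` just before a 32-bit wrap and `now = 0x100000010` in 64-bit ticks, both yield 0x20.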



Re: [RFC] Generation Number v2

2018-11-02 Thread Jakub Narebski
Derrick Stolee  writes:
> On 10/29/2018 11:59 PM, Junio C Hamano wrote:
>> Derrick Stolee  writes:
[...]
>>> * **Compatible?** In our test implementation, we use a previously unused
>>>byte of data in the commit-graph format to indicate which reachability
>>>index version we are using. Existing clients ignore this value, so we
>>>will want to consider if these new indexes are _backwards compatible_.
>>>That is, will they still report correct values if they ignore this byte
>>>and use the generation number column from the commit-graph file assuming
>>>the values are minimum generation numbers?
>>
>> I personally consider that the current commit-graph with generation
>> numbers experimental, so I am not sure how much we care about this.
>>
>> Having said that.
>>
>> By the above definition, any new index that is wider than the
>> current generation number cannot be compatible (can we even tell the
>> existing clients how wide each elements in the ix array is?)
>>
>> In any case, perhaps the first thing to do is to update the clients
>> so that they stop ignoring the version number field, and instead
>> work without generation number when there is no version of reach.ix
>> available in the file?  That way, a better reachablility index can
>> be chosen freely without having to worry about the compatibility.
>
> I can work on that. It should be as simple as setting commit->generation to
> GENERATION_NUMBER_ZERO in fill_commit_in_graph when the graph
> has a different version.

That is a very good idea, but we have here a chicken-and-egg problem.
The only way to denote that the generation numbers / reachability index
have changed (for reading/using: that it changed in backwards
incompatible way; for update: that it changed at all) is to change the
format version:

  1-byte version number:
  Currently, the only valid version is 1.

But if we assume that different commit-graph format version simply means
that commit->generation is to be set to GENERATION_NUMBER_ZERO, then we
block possible changes to the structure of the commit-graph file.

[Reads further in thread]

Ah, I see that's why you want to introduce [1]:

DS>
DS> 3. Reachability Index versioning

[1]: 
https://public-inbox.org/git/6902dbff-d9f6-e897-2c20-d0cb47a50...@gmail.com/

>>> * **Immutable?** Git objects are _immutable_. If you change an object you
>>>actually create a new object with a new object ID. Are the values we 
>>> store
>>>for these reachability indexes also immutable?
>>
>> Even if we do not embed the reachability ix in commit objects,
>> having an immutable value is probably a must if we want to make them
>> incrementally computable, so this is a very good property to have.
>> Unless there is a clever idea to incrementally compute a mutable
>> reach.ix, my gut instinct says that this property is a must.

I think that the reachability index can be mutable and non-local, yet
still be incrementally updatable, though perhaps it would require not
only calculations for new commits, but also recalculations for some
limited subset of commits already in the commit-graph file.

I'm trying to modify the exact definition of maximal generation numbers
to check whether I can find a variant that exhibits nearly-incremental
update properties.

>> Another thing, perhaps related to "local" below, is if exactly the
>> same reach.ix is computed by anybody, given an identical commit
>> history graph (perhaps "reproducibility"?).  I think most of the
>> candidates you listed are reproducible without a fixed tie-breaker,
>> but I am not sure about felineY() thing.

felineY() is deterministic (or can be made deterministic) for given
felineX() (i.e. given [reverse] topological order).

>>> * **Local?** Are these values **locally computable**? That is, do we only
>>>need to look at the parents of a commit (assuming those parents have
>>>computed values) in order to determine the value at that commit?
>>
>> A subset of non-local reachability ix, for example, the ones that
>> need to know what each commit's children are, cannot be immutable,
>> as adding new objects to the graph (either with locally committing,
>> or transferring objects from other repositories) would affect the
>> ix; is this true for all non-local reachability ix, I wonder?
>
> As a thought experiment, we could define a function size(C) to be the
> number of commits reachable from C.

Let's call it reach(C), instead ;-)

>This is not locally-computable
> from the size values at C's parents due to the inclusion-exclusion
> principle. We would need to compute it by walking the reachable set
> and counting (resulting in quadratic performance overall) but is
> immutable. Since the performance cost is so expensive (unlike the
> linear costs in the other non-local versions) I didn't include it
> in my comparison.

Let's define Reach(C) as set of all commits reachable from commit C.
Then reach(C) = ||Reach(C)||, where 
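To make the inclusion-exclusion point concrete, here is a hypothetical toy sketch (not git code): with a merge M whose parents A and B share a root R, summing the parents' reach counts overcounts the shared root, so reach(M) cannot be derived from reach(A) and reach(B) alone:

```c
#include <assert.h>
#include <stddef.h>

struct commit {
	int visited;
	struct commit *parents[2]; /* toy model: at most two parents */
};

/* count commits reachable from c, including c itself */
static int reach(struct commit *c)
{
	int n, i;

	if (!c || c->visited)
		return 0;
	c->visited = 1;
	n = 1;
	for (i = 0; i < 2; i++)
		n += reach(c->parents[i]);
	return n;
}
```

With R, A -> R, B -> R, and M -> {A, B}: reach(A) = reach(B) = 2, but reach(M) = 4, not 1 + 2 + 2 = 5. Note the visited flags must be cleared between queries, mirroring how git clears commit marks after a walk.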

[PATCH v1] add: speed up cmd_add() by utilizing read_cache_preload()

2018-11-02 Thread Ben Peart
From: Ben Peart 

During an "add", a call is made to run_diff_files() which calls
check_remove() for each index-entry.  The preload_index() code distributes
some of the costs across multiple threads.

Because the files checked are restricted to the pathspec, adding
individual files makes no measurable impact, but on a Windows repo with
~200K files, 'git add .' drops from 6.3 seconds to 3.3 seconds, a 47%
savings.

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/fc4830b545
Checkout: git fetch https://github.com/benpeart/git add-preload-index-v1 && 
git checkout fc4830b545

 builtin/add.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index ad49806ebf..f65c172299 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -445,11 +445,6 @@ int cmd_add(int argc, const char **argv, const char 
*prefix)
return 0;
}
 
-   if (read_cache() < 0)
-   die(_("index file corrupt"));
-
-   die_in_unpopulated_submodule(_index, prefix);
-
/*
 * Check the "pathspec '%s' did not match any files" block
 * below before enabling new magic.
@@ -459,6 +454,10 @@ int cmd_add(int argc, const char **argv, const char 
*prefix)
   PATHSPEC_SYMLINK_LEADING_PATH,
   prefix, argv);
 
+   if (read_cache_preload() < 0)
+   die(_("index file corrupt"));
+
+   die_in_unpopulated_submodule(_index, prefix);
die_path_inside_submodule(_index, );
 
if (add_new_files) {

base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
-- 
2.18.0.windows.1



[PATCH v2 0/3] Make add_missing_tags() linear

2018-11-02 Thread Derrick Stolee via GitGitGadget
As reported earlier [1], the add_missing_tags() method in remote.c has
quadratic performance. Some of that performance is curbed due to the
generation-number cutoff in in_merge_bases_many(). However, that fix doesn't
help users without a commit-graph, and it can still be painful if that
cutoff is sufficiently low compared to the tags we are using for
reachability testing.

Add a new method in commit-reach.c called get_reachable_subset() which does
a many-to-many reachability test. Starting at the 'from' commits, walk until
the generation is below the smallest generation in the 'to' commits, or all
'to' commits have been discovered. This performs only one commit walk for
the entire add_missing_tags() method, giving linear performance in the worst
case.

Tests are added in t6600-test-reach.sh to ensure get_reachable_subset()
works independently of its application in add_missing_tags().

Thanks, -Stolee

[1] 
https://public-inbox.org/git/cabpp-becpsoxudovjbdg_3w9wus102rw+e+qpmd4g3qyd-q...@mail.gmail.com/

Derrick Stolee (3):
  commit-reach: implement get_reachable_subset
  test-reach: test get_reachable_subset
  remote: make add_missing_tags() linear

 commit-reach.c| 70 +++
 commit-reach.h| 13 
 remote.c  | 34 -
 t/helper/test-reach.c | 34 ++---
 t/t6600-test-reach.sh | 52 
 5 files changed, 198 insertions(+), 5 deletions(-)


base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-60%2Fderrickstolee%2Fadd-missing-tags-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-60/derrickstolee/add-missing-tags-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/60

Range-diff vs v1:

 1:  4c0c5c9143 ! 1:  9e570603bd commit-reach: implement get_reachable_subset
 @@ -49,7 +49,7 @@
  +
  +struct commit_list *get_reachable_subset(struct commit **from, int 
nr_from,
  +  struct commit **to, int nr_to,
 -+  int reachable_flag)
 ++  unsigned int reachable_flag)
  +{
  + struct commit **item;
  + struct commit *current;
 @@ -129,9 +129,12 @@
  + * Return a list of commits containing the commits in the 'to' array
  + * that are reachable from at least one commit in the 'from' array.
  + * Also add the given 'flag' to each of the commits in the returned list.
 ++ *
 ++ * This method uses the PARENT1 and PARENT2 flags during its operation,
 ++ * so be sure these flags are not set before calling the method.
  + */
  +struct commit_list *get_reachable_subset(struct commit **from, int 
nr_from,
  +  struct commit **to, int nr_to,
 -+  int reachable_flag);
 ++  unsigned int reachable_flag);
  +
   #endif
 2:  382f4f4a5b = 2:  52e847b928 test-reach: test get_reachable_subset
 3:  ecbed3de5c = 3:  dfaceb162f remote: make add_missing_tags() linear

-- 
gitgitgadget


[PATCH v2 1/3] commit-reach: implement get_reachable_subset

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The existing reachability algorithms in commit-reach.c focus on
finding merge-bases or determining if all commits in a set X can
reach at least one commit in a set Y. However, for two commits sets
X and Y, we may also care about which commits in Y are reachable
from at least one commit in X.

Implement get_reachable_subset() which answers this question. Given
two arrays of commits, 'from' and 'to', return a commit_list with
every commit from the 'to' array that is reachable from at least
one commit in the 'from' array.

The algorithm is a simple walk starting at the 'from' commits, using
the PARENT2 flag to indicate "this commit has already been added to
the walk queue". By marking the 'to' commits with the PARENT1 flag,
we can determine when we see a commit from the 'to' array. We remove
the PARENT1 flag as we add that commit to the result list to avoid
duplicates.

The order of the resulting list is a reverse of the order that the
commits are discovered in the walk.

There are a couple of shortcuts to avoid walking more than we need:

1. We determine the minimum generation number of commits in the
   'to' array. We do not walk commits with generation number
   below this minimum.

2. We count how many distinct commits are in the 'to' array, and
   decrement this count when we discover a 'to' commit during the
   walk. If this number reaches zero, then we can terminate the
   walk.

Tests will be added using the 'test-tool reach' helper in a
subsequent commit.

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 70 ++
 commit-reach.h | 13 ++
 2 files changed, 83 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index 9f79ce0a22..8ad5352752 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -688,3 +688,73 @@ int can_all_from_reach(struct commit_list *from, struct 
commit_list *to,
object_array_clear(_objs);
return result;
 }
+
+struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
+struct commit **to, int nr_to,
+unsigned int reachable_flag)
+{
+   struct commit **item;
+   struct commit *current;
+   struct commit_list *found_commits = NULL;
+   struct commit **to_last = to + nr_to;
+   struct commit **from_last = from + nr_from;
+   uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+   int num_to_find = 0;
+
+   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+
+   for (item = to; item < to_last; item++) {
+   struct commit *c = *item;
+   
+   parse_commit(c);
+   if (c->generation < min_generation)
+   min_generation = c->generation;
+
+   if (!(c->object.flags & PARENT1)) {
+   c->object.flags |= PARENT1;
+   num_to_find++;
+   }
+   }
+
+   for (item = from; item < from_last; item++) {
+   struct commit *c = *item;
+   if (!(c->object.flags & PARENT2)) {
+   c->object.flags |= PARENT2;
+   parse_commit(c);
+
+   prio_queue_put(, *item);
+   }
+   }
+
+   while (num_to_find && (current = prio_queue_get()) != NULL) {
+   struct commit_list *parents;
+
+   if (current->object.flags & PARENT1) {
+   current->object.flags &= ~PARENT1;
+   current->object.flags |= reachable_flag;
+   commit_list_insert(current, _commits);
+   num_to_find--;
+   }
+
+   for (parents = current->parents; parents; parents = 
parents->next) {
+   struct commit *p = parents->item;
+
+   parse_commit(p);
+
+   if (p->generation < min_generation)
+   continue;
+
+   if (p->object.flags & PARENT2)
+   continue;
+
+   p->object.flags |= PARENT2;
+   prio_queue_put(, p);
+   }
+   }
+
+   clear_commit_marks_many(nr_to, to, PARENT1);
+   clear_commit_marks_many(nr_from, from, PARENT2);
+
+   return found_commits;
+}
+
diff --git a/commit-reach.h b/commit-reach.h
index 7d313e2975..bb34af0269 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -74,4 +74,17 @@ int can_all_from_reach_with_flag(struct object_array *from,
 int can_all_from_reach(struct commit_list *from, struct commit_list *to,
   int commit_date_cutoff);
 
+
+/*
+ * Return a list of commits containing the commits in the 'to' array
+ * that are reachable from at least one commit in the 'from' array.
+ * Also add the given 'flag' to each of the commits in the returned list.
+ *
+ * This method uses the PARENT1 and PARENT2 

[PATCH v2 3/3] remote: make add_missing_tags() linear

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The add_missing_tags() method currently has quadratic behavior.
This is due to a linear number (based on number of tags T) of
calls to in_merge_bases_many, which has linear performance (based
on number of commits C in the repository).

Replace this O(T * C) algorithm with an O(T + C) algorithm by
using get_reachable_subset(). We ignore the return list and focus
instead on the reachable_flag assigned to the commits we care
about, because we need to interact with the tag ref and not just
the commit object.

Signed-off-by: Derrick Stolee 
---
 remote.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/remote.c b/remote.c
index 81f4f01b00..b850f2feb3 100644
--- a/remote.c
+++ b/remote.c
@@ -1205,9 +1205,36 @@ static void add_missing_tags(struct ref *src, struct ref 
**dst, struct ref ***ds
 * sent to the other side.
 */
if (sent_tips.nr) {
+   const int reachable_flag = 1;
+   struct commit_list *found_commits;
+   struct commit **src_commits;
+   int nr_src_commits = 0, alloc_src_commits = 16;
+   ALLOC_ARRAY(src_commits, alloc_src_commits);
+
for_each_string_list_item(item, _tag) {
struct ref *ref = item->util;
+   struct commit *commit;
+
+   if (is_null_oid(>new_oid))
+   continue;
+   commit = lookup_commit_reference_gently(the_repository,
+   >new_oid,
+   1);
+   if (!commit)
+   /* not pushing a commit, which is not an error 
*/
+   continue;
+
+   ALLOC_GROW(src_commits, nr_src_commits + 1, 
alloc_src_commits);
+   src_commits[nr_src_commits++] = commit;
+   }
+
+   found_commits = get_reachable_subset(sent_tips.tip, 
sent_tips.nr,
+src_commits, 
nr_src_commits,
+reachable_flag);
+
+   for_each_string_list_item(item, _tag) {
struct ref *dst_ref;
+   struct ref *ref = item->util;
struct commit *commit;
 
if (is_null_oid(>new_oid))
@@ -1223,7 +1250,7 @@ static void add_missing_tags(struct ref *src, struct ref 
**dst, struct ref ***ds
 * Is this tag, which they do not have, reachable from
 * any of the commits we are sending?
 */
-   if (!in_merge_bases_many(commit, sent_tips.nr, 
sent_tips.tip))
+   if (!(commit->object.flags & reachable_flag))
continue;
 
/* Add it in */
@@ -1231,7 +1258,12 @@ static void add_missing_tags(struct ref *src, struct ref 
**dst, struct ref ***ds
oidcpy(_ref->new_oid, >new_oid);
dst_ref->peer_ref = copy_ref(ref);
}
+
+   clear_commit_marks_many(nr_src_commits, src_commits, 
reachable_flag);
+   free(src_commits);
+   free_commit_list(found_commits);
}
+
string_list_clear(_tag, 0);
free(sent_tips.tip);
 }
-- 
gitgitgadget


[PATCH v2 2/3] test-reach: test get_reachable_subset

2018-11-02 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The get_reachable_subset() method returns the list of commits in
the 'to' array that are reachable from at least one commit in the
'from' array. Add tests that check this method works in a few
cases:

1. All commits in the 'to' list are reachable. This exercises the
   early-termination condition.

2. Some commits in the 'to' list are reachable. This exercises the
   loop-termination condition.

3. No commits in the 'to' list are reachable. This exercises the
   NULL return condition.

Signed-off-by: Derrick Stolee 
---
 t/helper/test-reach.c | 34 
 t/t6600-test-reach.sh | 52 +++
 2 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 08d2ea68e8..a0272178b7 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -32,8 +32,8 @@ int cmd__reach(int ac, const char **av)
struct commit *A, *B;
struct commit_list *X, *Y;
struct object_array X_obj = OBJECT_ARRAY_INIT;
-   struct commit **X_array;
-   int X_nr, X_alloc;
+   struct commit **X_array, **Y_array;
+   int X_nr, X_alloc, Y_nr, Y_alloc;
struct strbuf buf = STRBUF_INIT;
struct repository *r = the_repository;
 
@@ -44,9 +44,10 @@ int cmd__reach(int ac, const char **av)
 
A = B = NULL;
X = Y = NULL;
-   X_nr = 0;
-   X_alloc = 16;
+   X_nr = Y_nr = 0;
+   X_alloc = Y_alloc = 16;
ALLOC_ARRAY(X_array, X_alloc);
+   ALLOC_ARRAY(Y_array, Y_alloc);
 
while (strbuf_getline(, stdin) != EOF) {
struct object_id oid;
@@ -92,6 +93,8 @@ int cmd__reach(int ac, const char **av)
 
case 'Y':
commit_list_insert(c, );
+   ALLOC_GROW(Y_array, Y_nr + 1, Y_alloc);
+   Y_array[Y_nr++] = c;
break;
 
default:
@@ -136,6 +139,29 @@ int cmd__reach(int ac, const char **av)
filter.with_commit_tag_algo = 0;
 
printf("%s(_,A,X,_):%d\n", av[1], commit_contains(, A, 
X, ));
+   } else if (!strcmp(av[1], "get_reachable_subset")) {
+   const int reachable_flag = 1;
+   int i, count = 0;
+   struct commit_list *current;
+   struct commit_list *list = get_reachable_subset(X_array, X_nr,
+   Y_array, Y_nr,
+   reachable_flag);
+   printf("get_reachable_subset(X,Y)\n");
+   for (current = list; current; current = current->next) {
+   if (!(list->item->object.flags & reachable_flag))
+   die(_("commit %s is not marked reachable"),
+   oid_to_hex(>item->object.oid));
+   count++;
+   }
+   for (i = 0; i < Y_nr; i++) {
+   if (Y_array[i]->object.flags & reachable_flag)
+   count--;
+   }
+
+   if (count < 0)
+   die(_("too many commits marked reachable"));
+
+   print_sorted_commit_ids(list);
}
 
exit(0);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index ae94b27f70..a0c64e617a 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -265,4 +265,56 @@ test_expect_success 'commit_contains:miss' '
test_three_modes commit_contains --tag
 '
 
+test_expect_success 'get_reachable_subset:all' '
+   cat >input <<-\EOF &&
+   X:commit-9-1
+   X:commit-8-3
+   X:commit-7-5
+   X:commit-6-6
+   X:commit-1-7
+   Y:commit-3-3
+   Y:commit-1-7
+   Y:commit-5-6
+   EOF
+   (
+   echo "get_reachable_subset(X,Y)" &&
+   git rev-parse commit-3-3 \
+ commit-1-7 \
+ commit-5-6 | sort
+   ) >expect &&
+   test_three_modes get_reachable_subset
+'
+
+test_expect_success 'get_reachable_subset:some' '
+   cat >input <<-\EOF &&
+   X:commit-9-1
+   X:commit-8-3
+   X:commit-7-5
+   X:commit-1-7
+   Y:commit-3-3
+   Y:commit-1-7
+   Y:commit-5-6
+   EOF
+   (
+   echo "get_reachable_subset(X,Y)" &&
+   git rev-parse commit-3-3 \
+ commit-1-7 | sort
+   ) >expect &&
+   test_three_modes get_reachable_subset
+'
+
+test_expect_success 'get_reachable_subset:none' '
+   cat >input <<-\EOF &&
+   X:commit-9-1
+   X:commit-8-3
+   X:commit-7-5
+   X:commit-1-7
+   Y:commit-9-3
+   Y:commit-7-6
+   Y:commit-2-8
+   EOF
+   echo "get_reachable_subset(X,Y)" >expect &&
+   test_three_modes get_reachable_subset
+'
+
 

Re: [PATCH 19/24] submodule: use submodule repos for object lookup

2018-11-02 Thread Derrick Stolee

On 10/30/2018 6:08 PM, Stefan Beller wrote:

This converts the 'show_submodule_header' function to use
the repository API properly, such that the submodule objects
are not added to the main object store.

Signed-off-by: Stefan Beller 


A couple tests are broken in 'pu' when run with GIT_TEST_COMMIT_GRAPH=1, 
including t4041-diff-submodule-option.sh. The failure bisects to this patch.


Here is the verbose output of the first failure in that script:


expecting success:
    git diff-index -p --submodule=log HEAD >actual &&
    cat >expected <<-EOF &&
    Submodule sm1 $head2..$head3 (rewind):
  < Add foo3 ($added foo3)
  < Add foo2 ($added foo2)
    EOF
    test_cmp expected actual

+ git diff-index -p --submodule=log HEAD
+ cat
+ test_cmp expected actual
+ diff -u expected actual
--- expected    2018-11-02 12:58:43.429262380 +
+++ actual  2018-11-02 12:58:43.429262380 +
@@ -1,3 +1,5 @@
-Submodule sm1 30b9670..dafb207 (rewind):
+Submodule sm1 30b9670...dafb207:
   < Add foo3 (hinzugefügt foo3)
   < Add foo2 (hinzugefügt foo2)
+  > Add foo1 (hinzugefügt foo1)
+  < Add foo1 (hinzugefügt foo1)
error: last command exited with $?=1
not ok 9 - modified submodule(backward)

I've been looking into the patch below to see if there is an obvious 
problem, but the best I can think of is that open_submodule() creates an 
alternate 'struct repository' and somehow the commit-graph feature is 
interacting poorly with that struct.


Stefan, do you know what's going on?

Thanks,

-Stolee


---
  submodule.c | 76 ++---
  1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/submodule.c b/submodule.c
index d9d3046689..0fdda45252 100644
--- a/submodule.c
+++ b/submodule.c
@@ -443,7 +443,7 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
  }
  
-static void print_submodule_summary(struct rev_info *rev, struct diff_options *o)

+static void print_submodule_summary(struct repository *r, struct rev_info 
*rev, struct diff_options *o)
  {
static const char format[] = "  %m %s";
struct strbuf sb = STRBUF_INIT;
@@ -454,7 +454,8 @@ static void print_submodule_summary(struct rev_info *rev, 
struct diff_options *o
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(, 0);
-   format_commit_message(commit, format, , );
+   repo_format_commit_message(r, commit, format, ,
+ );
strbuf_addch(, '\n');
if (commit->object.flags & SYMMETRIC_LEFT)
diff_emit_submodule_del(o, sb.buf);
@@ -481,14 +482,47 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
  }
  
-/* Helper function to display the submodule header line prior to the full

- * summary output. If it can locate the submodule objects directory it will
- * attempt to lookup both the left and right commits and put them into the
- * left and right pointers.
+/*
+ * Initialize 'out' based on the provided submodule path.
+ *
+ * Unlike repo_submodule_init, this tolerates submodules not present
+ * in .gitmodules. This function exists only to preserve historical behavior,
+ *
+ * Returns 0 on success, -1 when the submodule is not present.
   */
-static void show_submodule_header(struct diff_options *o, const char *path,
+static struct repository *open_submodule(const char *path)
+{
+   struct strbuf sb = STRBUF_INIT;
+   struct repository *out = xmalloc(sizeof(*out));
+
+   if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) {
+   strbuf_release();
+   free(out);
+   return NULL;
+   }
+
+   out->submodule_prefix = xstrfmt("%s%s/",
+   the_repository->submodule_prefix ?
+   the_repository->submodule_prefix :
+   "", path);
+
+   strbuf_release();
+   return out;
+}
+
+/*
+ * Helper function to display the submodule header line prior to the full
+ * summary output.
+ *
+ * If it can locate the submodule git directory it will create a repository
+ * handle for the submodule and lookup both the left and right commits and
+ * put them into the left and right pointers.
+ */
+static void show_submodule_header(struct diff_options *o,
+   const char *path,
struct object_id *one, struct object_id *two,
unsigned dirty_submodule,
+   struct repository *sub,
struct commit **left, struct commit **right,
struct commit_list **merge_bases)
  {
@@ -507,7 +541,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
else if (is_null_oid(two))


Re: [PATCH] it's called read_object_file() these days

2018-11-02 Thread Junio C Hamano
Jeff King  writes:

> There's another mention in Documentation/technical/api-object-access.txt.

Yes, and we are on the same page on that one.

>
> But since the entire API is undocumented, I'm not sure it matters much.
> That file has been a placeholder since 2007. Maybe we should just delete
> it; its existence does not seem to be guilting anyone into documenting,
> and these days we'd prefer to do it in-header anyway.
>
> -Peff


Re: [PATCH 0/9] saner xdiff hunk callbacks

2018-11-02 Thread Junio C Hamano
Jeff King  writes:

> ... it seems rather silly that we have xdiff generate a string
> just so that we can parse it from within the same process. So instead I
> improved the xdiff interface to pass the actual integers out, made use
> of it as appropriate, and then in the final patch we can just drop the
> buggy function entirely.

Acked-by: me.

Thanks.  This is exactly what I wanted to do for a very long time,
ever since I did the original xdiff-interface.c




Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list

2018-11-02 Thread Phillip Wood
Hi Alban

On 01/11/2018 23:31, Alban Gruin wrote:
> Le 30/10/2018 à 17:47, Phillip Wood a écrit :
>> On 27/10/2018 22:29, Alban Gruin wrote:
>>> This refactors sequencer_add_exec_commands() to work on a todo_list to
>>> avoid redundant reads and writes to the disk.
>>>
>>> An obvious way to do this would be to insert the `exec' command between
>>> the other commands, and reparse it once this is done.  This is not what
>>> is done here.  Instead, the command is appended to the buffer once, and
>>> a new list of items is created.  Items from the old list are copied to
>>> it, and new `exec' items are appended when necessary.  This eliminates
>>> the need to reparse the todo list, but this also means its buffer cannot
>>> be directly written to the disk, hence todo_list_write_to_disk().
>>
>> I'd reword this slightly, maybe
>>
>> Instead of just inserting the `exec' command between the other commands,
>> and re-parsing the buffer at the end the exec command is appended to the
>> buffer once, and a new list of items is created.  Items from the old
>> list are copied across and new `exec' items are appended when necessary.
>>  This eliminates the need to reparse the buffer, but this also means we
>> have to use todo_list_write_to_disk() to write the file.
>>
>>> sequencer_add_exec_commands() still reads the todo list from the disk,
>>> as it is needed by rebase -p.  todo_list_add_exec_commands() works on a
>>> todo_list structure, and reparses it at the end.
>>
>> I think saying 'reparses' is confusing, as that is what we're trying
>> to avoid.
>>
>>> complete_action() still uses sequencer_add_exec_commands() for now.
>>> This will be changed in a future commit.
>>>
>>> Signed-off-by: Alban Gruin 
>>> ---
>>>   sequencer.c | 69 +
>>>   1 file changed, 49 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/sequencer.c b/sequencer.c
>>> index e12860c047..12a3efeca8 100644
>>> --- a/sequencer.c
>>> +++ b/sequencer.c
>>> @@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc,
>>> const char **argv,
>>>   return 0;
>>>   }
>>>   +static void todo_list_add_exec_commands(struct todo_list *todo_list,
>>> +    const char *commands)
>>> +{
>>> +    struct strbuf *buf = _list->buf;
>>> +    const char *old_buf = buf->buf;
>>> +    size_t commands_len = strlen(commands + strlen("exec ")) - 1;
>>> +    int i, first = 1, nr = 0, alloc = 0;
>>
>> Minor nit pick, I think it is clearer if first is initialized just
>> before the loop as it is in the deleted code below.
>>
>>> +    struct todo_item *items = NULL,
>>> +    base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0};
>>> +
>>> +    strbuf_addstr(buf, commands);
>>> +    base_item.offset_in_buf = buf->len - commands_len - 1;
>>> +    base_item.arg = buf->buf + base_item.offset_in_buf;
>>
>> I think if the user gives --exec more than once on the command line then
>> commands will contain more than one exec command so this needs to parse
>> commands and create one todo_item for each command.
>>
> 
> Ouch, you’re right.  Thanks for the heads up.

I haven't looked at how difficult it would be, but it might be best to
change the option parsing to pass an array of strings containing the
exec commands rather than one long string, so we can just loop over the
array here.

> 
>>> +
>>> +    /*
>>> + * Insert  after every pick. Here, fixup/squash chains
>>> + * are considered part of the pick, so we insert the commands
>>> *after*
>>> + * those chains if there are any.
>>> + */
>>> +    for (i = 0; i < todo_list->nr; i++) {
>>> +    enum todo_command command = todo_list->items[i].command;
>>> +    if (todo_list->items[i].arg)
>>> +    todo_list->items[i].arg = todo_list->items[i].arg -
>>> old_buf + buf->buf;
>>> +
>>> +    if (command == TODO_PICK && !first) {
>>> +    ALLOC_GROW(items, nr + 1, alloc);
>>> +    memcpy(items + nr++, _item, sizeof(struct todo_item));
>>
>> I think it would be clearer to say
>> items[nr++] = base_item;
>> rather than using memcpy. This applies below and to some of the other
>> patches as well. Also this needs to loop over all the base_items if the
>> user gave --exec more than once on the command line.
>>
> 
> I agree with you, it’s way more readable, IMO.  But for some reason, I
> thought it was not possible to assign one struct to another in C.

In general one needs to be careful, as it only does a shallow copy, but
that is exactly what we want here. Having said that, if we have an array
of exec commands to insert then it might be worth sticking with memcpy()
here so the whole array can be copied in one go.

Best Wishes

Phillip

>> Best Wishes
>>
>> Phillip
>>
> 
> Cheers,
> Alban
> 



[PATCH v2] send-email: Avoid empty transfer encoding header

2018-11-02 Thread Aaron Lindsay
Fix a small bug introduced by 7a36987ff ("send-email: add an auto option
for transfer encoding", 2018-07-14).

I saw the following message when setting --transfer-encoding for a file
with the same encoding:
$ git send-email --transfer-encoding=8bit example.patch
Use of uninitialized value $xfer_encoding in concatenation (.) or string
at /usr/lib/git-core/git-send-email line 1744.

v2 adds a test provided by brian m. carlson.

Signed-off-by: Aaron Lindsay 
Signed-off-by: brian m. carlson 
---
 git-send-email.perl   |  2 +-
 t/t9001-send-email.sh | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 2be5dac33..39c15bcc8 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1834,7 +1834,7 @@ sub apply_transfer_encoding {
my $from = shift;
my $to = shift;
 
-   return $message if ($from eq $to and $from ne '7bit');
+   return ($message, $to) if ($from eq $to and $from ne '7bit');
 
require MIME::QuotedPrint;
require MIME::Base64;
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 1ef1a1900..ee1efcc59 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -492,6 +492,21 @@ do
--validate \
$patches longline.patch
'
+
+done
+
+for enc in 7bit 8bit quoted-printable base64
+do
+   test_expect_success $PREREQ "--transfer-encoding=$enc produces correct header" '
+   clean_fake_sendmail &&
+   git send-email \
+   --from="Example " \
+   --to=nob...@example.com \
+   --smtp-server="$(pwd)/fake.sendmail" \
+   --transfer-encoding=$enc \
+   $patches &&
+   grep "Content-Transfer-Encoding: $enc" msgtxt1
+   '
 done
 
 test_expect_success $PREREQ 'Invalid In-Reply-To' '
-- 
2.19.1



Re: [RFC] Generation Number v2

2018-11-02 Thread Jakub Narebski
Jakub Narebski  writes:
> Stefan Beller  writes:
[...]
>> How would this impact creation of a commit?
>>
>> The current generation numbers can be lazily updated or not
>> updated at all. In my understanding of the maximum generation
>> numbers, a new commit would make these maximum generation
>> numbers invalid (i.e. they have to be recomputed).
[...]
>> For the V2 maximum generation numbers, would we need to
>> rewrite the numbers for all commits once we recompute them?
>> Assuming that is true, it sounds like the benchmark doesn't
>> cover the whole costs associated with V2, which is why the
>> exceptional performance can be explained.
>
> Let's check it using a simple example
>
> First, (minimum) parent-based generation numbers before and after
> extending the commit graph:
>
>   1   2 3   4 5   6   7new
>   1   2 3   4 5   -   -old
>   .---.-.---.-.---*---*
>\
> \   3   4 5   6new
>  \  3   4 5   6old
>   \-.---.-.---.
>  \
>   \   5new
>\  -old
> \-*

Let me check yet another idea, using (minimum) parent-based V0 generation
numbers (counting distance from the sink / root) as a starting number
for source / heads commits.

1   2 3   4 5   6   7new
1   2 3   4 5   -   -old
.---.-.---.-.---*---*
 \
  \   3   4 5   6new
   \  3   4 5   6old
\-.---.-.---.
   \
\   5new
 \  -old
  \-*


Well, on this example it looks like this variant of maximum generation
numbers can be done incrementally, but let's check another example

   1   2 3   4   5   6 7   8   9   new
   1   2 3   4   5   6 7   8   -   old
   .---.-.---.---.---.-.---.---*
\ /
 \   3   4   / 5   6   7   8   new
  \  5   6  /  -   -   -   -   old
   \-.---.-/---*---*---*---*

It looks like it doesn't give as good results as I thought.  Fewer values
are changed, but you would still need to recalculate them, unless it can
be proven that they do not need it.

>
> Next, maximum generation numbers.  We start with 9 commits, and we end
> up with 12 commits after the change
>
>   6   7 8   9 10  11  12   new
>   4   5 7   8 9   -   -old
>   .---.-.---.-.---*---*
>\
> \   9   1011  12   new
>  \  6   7 8   9old
>   \-.---.-.---.
>  \
>   \   12   new
>\  -old
> \-*
>
>
> As you can see all maximum generation numbers got rewritten.
>
> Though if instead using the number of commits, we use the maximum
> generation number, or in other words the length of the longest path, we
> get the following:
>
>   1   2 3   4 5   6   7new
>   1   2 4   5 6   -   -old
>   .---.-.---.-.---*---*
>\
> \   4   5 6   7new
>  \  3   4 5   6old
>   \-.---.-.---.
>  \
>   \   7new
>\  -old
> \-*
>
> A bit better, but still much change in numbers.

-- 
Jakub Narębski


Re: [RFE] Please add name and email to git credentials

2018-11-02 Thread Nicolas Mailhot
On Friday, November 2, 2018 at 09:27 +0100, Christian Couder wrote:
> On Thu, Nov 1, 2018 at 3:42 PM Nicolas Mailhot
>  wrote:
> > On Thursday, November 1, 2018 at 15:13 +0100, Christian Couder wrote:
> > > How can Git know when you commit where you will want to push the
> > > commit afterwards?
> > 
> > You have a URL in the repo config. Of course you can change it
> > between the commit and the push, but that's not the general case.
> 
> If I did a `git init`, then I have no url in the repo config. Also if
> I cloned from a repo that has a different URL than the sites I have
> credentials for, then how should git use the URL in the repo config?

Then you have no need or use for git credentials. Where’s the problem?

Will the fact that git credential users, who already have per-repo-url
settings in their .gitconfig, would also be able to use this existing
per-url section to control the mail and name associated with their
repos, keep you up at night, or something?

> You could have no user.name and user.email configured in your global
> config, and a script that configures those in the local config
> depending on remote.origin.url. 

One could use the same arguments to say git credentials is not
necessary, it's a maintenance burden, everyone should just script their
auth needs manually, etc.

Are you arguing for git credentials removal here?

Or are you arguing that having two separate mechanisms in git, to match
config directives to repo urls, is some kind of improvement?

I didn't create or write or specify the way git credential matches repo
urls. It already exists within git. If you have a problem with the
matching logic git credential uses, why are you arguing with me instead
of taking it up with the maintainers of this logic?

Regards,

-- 
Nicolas Mailhot



Re: [RFE] Please add name and email to git credentials

2018-11-02 Thread Christian Couder
On Thu, Nov 1, 2018 at 3:42 PM Nicolas Mailhot
 wrote:
>
> On Thursday, November 1, 2018 at 15:13 +0100, Christian Couder wrote:
> >
> > How can Git know when you commit where you will want to push the
> > commit afterwards?
>
> You have a URL in the repo config. Of course you can change it between
> the commit and the push, but that's not the general case.

If I did a `git init`, then I have no url in the repo config. Also if
I cloned from a repo that has a different URL than the sites I have
credentials for, then how should git use the URL in the repo config?

> Nowadays, most git projects have a preferred git hosting, and your
> name/email with the project match the credentials you use to push
> (otherwise things like GitLab/GitHub issue trackers would not work at
> all).

I think you are talking about a special need that you have, but it's
not the same for everyone. For example I always use my real name
"Christian Couder", and when my email changes, I would like it to
change everywhere, so support for .mailmap files in GitHub and GitLab
for example would be much more important for me than what you suggest.

> > What if you want to push the same commit to 2 different places that
> > need different credentials?
>
> Then you do not use git credentials and have to configure all by hand.
> Which will usually be a major error-prone PITA, so you’ll end up pushing
> to the system that matches the ID you want to see in git logs, and then
> push from this system to others.

You could have no user.name and user.email configured in your global
config, and a script that configures those in the local config
depending on remote.origin.url. So when you commit in a repo, Git will
fail if user.name and user.email are not already configured in the
repo. It will tell you to configure those, then you just have to run
your script to get those configured locally, and you are good to go.

Another more flexible possibility is what Ævar suggests with
conditional includes in the config.


[PATCH] build: link with curl-defined linker flags

2018-11-02 Thread James Knight
Adjust the autotools configuration to populate libcurl-related linker
flags from curl-config instead of manually populating flags based off
detected features.

Originally, the configuration would check for SSL-support in the target
curl library. If enabled, NEEDS_SSL_WITH_CURL would be set and used in
the Makefile to append additional libraries to link against. Since the
process already depends on the curl-config utility to provide
curl-related build information, adjust the build to track the linker
flags in CURL_LIBCURL and pass the configuration option into the
Makefile.

Signed-off-by: James Knight 
---
 Makefile | 21 +++--
 config.mak.uname |  5 ++---
 configure.ac | 17 +++--
 3 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/Makefile b/Makefile
index b08d5ea25..c3be87b0e 100644
--- a/Makefile
+++ b/Makefile
@@ -183,10 +183,6 @@ all::
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
 #
-# Define NEEDS_SSL_WITH_CURL if you need -lssl with -lcurl (Minix).
-#
-# Define NEEDS_IDN_WITH_CURL if you need -lidn when using -lcurl (Minix).
-#
 # Define NEEDS_LIBICONV if linking with libc is not enough (Darwin).
 #
 # Define NEEDS_LIBINTL_BEFORE_LIBICONV if you need libintl before libiconv.
@@ -1305,18 +1301,15 @@ else
ifdef CURLDIR
# Try "-Wl,-rpath=$(CURLDIR)/$(lib)" in such a case.
BASIC_CFLAGS += -I$(CURLDIR)/include
-   CURL_LIBCURL = -L$(CURLDIR)/$(lib) $(CC_LD_DYNPATH)$(CURLDIR)/$(lib) -lcurl
+   CURL_LIBCURL = -L$(CURLDIR)/$(lib) $(CC_LD_DYNPATH)$(CURLDIR)/$(lib)
else
-   CURL_LIBCURL = -lcurl
-   endif
-   ifdef NEEDS_SSL_WITH_CURL
-   CURL_LIBCURL += -lssl
-   ifdef NEEDS_CRYPTO_WITH_SSL
-   CURL_LIBCURL += -lcrypto
-   endif
+   CURL_LIBCURL =
endif
-   ifdef NEEDS_IDN_WITH_CURL
-   CURL_LIBCURL += -lidn
+
+   ifdef CURL_LDFLAGS
+   CURL_LIBCURL += $(CURL_LDFLAGS)
+   else
+   CURL_LIBCURL += -lcurl
endif
 
REMOTE_CURL_PRIMARY = git-remote-http$X
diff --git a/config.mak.uname b/config.mak.uname
index 8acdeb71f..923b8fa09 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -431,8 +431,7 @@ ifeq ($(uname_S),Minix)
NO_NSEC = YesPlease
NEEDS_LIBGEN =
NEEDS_CRYPTO_WITH_SSL = YesPlease
-   NEEDS_IDN_WITH_CURL = YesPlease
-   NEEDS_SSL_WITH_CURL = YesPlease
+   CURL_LDFLAGS = -lssl -lcrypto -lidn
NEEDS_RESOLV =
NO_HSTRERROR = YesPlease
NO_MMAP = YesPlease
@@ -458,7 +457,7 @@ ifeq ($(uname_S),NONSTOP_KERNEL)
# Missdetected, hence commented out, see below.
#NO_CURL = YesPlease
# Added manually, see above.
-   NEEDS_SSL_WITH_CURL = YesPlease
+   CURL_LDFLAGS = -lssl -lcrypto
HAVE_LIBCHARSET_H = YesPlease
HAVE_STRINGS_H = YesPlease
NEEDS_LIBICONV = YesPlease
diff --git a/configure.ac b/configure.ac
index e11b7976a..44e8c036b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -600,17 +600,14 @@ AC_CHECK_PROG([CURL_CONFIG], [curl-config],
 
 if test $CURL_CONFIG != no; then
 GIT_CONF_SUBST([CURL_CONFIG])
-if test -z "${NO_OPENSSL}"; then
-  AC_MSG_CHECKING([if Curl supports SSL])
-  if test $(curl-config --features|grep SSL) = SSL; then
- NEEDS_SSL_WITH_CURL=YesPlease
- AC_MSG_RESULT([yes])
-  else
- NEEDS_SSL_WITH_CURL=
- AC_MSG_RESULT([no])
-  fi
-  GIT_CONF_SUBST([NEEDS_SSL_WITH_CURL])
+
+if test -z "$CURL_CONFIG_OPTS"; then
+CURL_CONFIG_OPTS="--libs"
 fi
+
+CURL_LDFLAGS=$($CURL_CONFIG $CURL_CONFIG_OPTS)
+AC_MSG_NOTICE([Setting CURL_LDFLAGS to '$CURL_LDFLAGS'])
+GIT_CONF_SUBST([CURL_LDFLAGS], [$CURL_LDFLAGS])
 fi
 
 fi
-- 
2.19.1



Re: [PATCH v3 0/8] fixes for unqualified push

2018-11-02 Thread Jeff King
On Fri, Oct 26, 2018 at 11:07:33PM +, Ævar Arnfjörð Bjarmason wrote:

> After sending out v2 I noticed I didn't update the examples in a
> couple of the commit messages, and figured I'd finish this up by
> adding a patch to document how this works in the "git-push"
> manpage. This behavior has never been properly documented. range-diff
> with v2:

I read over these patches and the comments from Junio. Overall, I like
the direction of the first 5 patches, and tend to agree with the review
that the "refs/remotes" thing is going too far.

I didn't dig down into reviewing each line, since it looked like Junio
already did, and there's probably a re-roll coming.

-Peff


Re: [PATCH v3 5/8] push: add an advice on unqualified push

2018-11-02 Thread Jeff King
On Mon, Oct 29, 2018 at 02:14:02PM +0900, Junio C Hamano wrote:

> Any failure in the &&-chain (or the last grep) would not terminate
> the for loop, so for the purpose of determining the success of this
> test_expect_success, the last "blob" iteration is the only thing
> that matters.
> 
> Which is probably not what you want.  If testing all of these is
> important, then you can do this:
> 
>   (
>   exit_with=true &&
>   for type in ...
>   do
>   ... many ... &&
>   ... many ... &&
>   ... tests ... ||
>   exit_with=false
>   done
>   $exit_with
>   )
> 
> or just say "|| exit" and leave the loop (and the subprocess running
> it) early on the first failure.

You can skip the subshell and just "|| return 1" from the chain inside
the loop. We run the test snippets inside an extra layer of
function-call exactly to allow that.

-Peff


[PATCH 9/9] xdiff-interface: drop parse_hunk_header()

2018-11-02 Thread Jeff King
This function was used only for parsing the hunk headers generated by
xdiff. Now that we can use hunk callbacks to get that information
directly, it has outlived its usefulness.

Note to anyone who wants to resurrect it: the "len" parameter was
totally unused, meaning that the function could read past the end of the
"line" array. In practice this never happened, because we only used it
to parse xdiff's generated header lines. But it would be dangerous to
use it for other cases without fixing this defect.

Signed-off-by: Jeff King 
---
 xdiff-interface.c | 45 -
 xdiff-interface.h |  3 ---
 2 files changed, 48 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index f2a3d9a577..80f060d278 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -15,51 +15,6 @@ struct xdiff_emit_state {
struct strbuf remainder;
 };
 
-static int parse_num(char **cp_p, int *num_p)
-{
-   char *cp = *cp_p;
-   int num = 0;
-
-   while ('0' <= *cp && *cp <= '9')
-   num = num * 10 + *cp++ - '0';
-   if (!(cp - *cp_p))
-   return -1;
-   *cp_p = cp;
-   *num_p = num;
-   return 0;
-}
-
-int parse_hunk_header(char *line, int len,
- int *ob, int *on,
- int *nb, int *nn)
-{
-   char *cp;
-   cp = line + 4;
-   if (parse_num(&cp, ob)) {
-   bad_line:
-   return error("malformed diff output: %s", line);
-   }
-   if (*cp == ',') {
-   cp++;
-   if (parse_num(&cp, on))
-   goto bad_line;
-   }
-   else
-   *on = 1;
-   if (*cp++ != ' ' || *cp++ != '+')
-   goto bad_line;
-   if (parse_num(&cp, nb))
-   goto bad_line;
-   if (*cp == ',') {
-   cp++;
-   if (parse_num(&cp, nn))
-   goto bad_line;
-   }
-   else
-   *nn = 1;
-   return -!!memcmp(cp, " @@", 3);
-}
-
 static int xdiff_out_hunk(void *priv_,
  long old_begin, long old_nr,
  long new_begin, long new_nr,
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 7b0ccbdd1d..971aa84b57 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -23,9 +23,6 @@ int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
  xdiff_emit_line_fn line_fn,
  void *consume_callback_data,
  xpparam_t const *xpp, xdemitconf_t const *xecfg);
-int parse_hunk_header(char *line, int len,
- int *ob, int *on,
- int *nb, int *nn);
 int read_mmfile(mmfile_t *ptr, const char *filename);
 void read_mmblob(mmfile_t *ptr, const struct object_id *oid);
 int buffer_is_binary(const char *ptr, unsigned long size);
-- 
2.19.1.1336.g081079ac04


[PATCH 8/9] range-diff: use a hunk callback

2018-11-02 Thread Jeff King
When we count the lines in a diff, we don't actually care about the
contents of each line. By using a hunk callback, we tell xdiff that it
does not need to even bother generating a hunk header line, saving a
small amount of work.

Arguably we could even ignore the hunk headers completely, since we're
just computing a cost function between patches. But doing it this way
maintains the exact same behavior before and after.

Signed-off-by: Jeff King 
---
This one might be going overboard. It can be dropped without affecting
any of the other patches.

 range-diff.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/range-diff.c b/range-diff.c
index 245fc17fee..3958720f00 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -197,6 +197,12 @@ static void diffsize_consume(void *data, char *line, unsigned long len)
(*(int *)data)++;
 }
 
+static void diffsize_hunk(void *data, long ob, long on, long nb, long nn,
+ const char *funcline, long funclen)
+{
+   diffsize_consume(data, NULL, 0);
+}
+
 static int diffsize(const char *a, const char *b)
 {
xpparam_t pp = { 0 };
@@ -210,7 +216,9 @@ static int diffsize(const char *a, const char *b)
mf2.size = strlen(b);
 
cfg.ctxlen = 3;
-   if (!xdi_diff_outf(&mf1, &mf2, NULL, diffsize_consume, &count, &pp, &cfg))
+   if (!xdi_diff_outf(&mf1, &mf2,
+  diffsize_hunk, diffsize_consume, &count,
+  &pp, &cfg))
return count;
 
error(_("failed to generate diff"));
-- 
2.19.1.1336.g081079ac04



[PATCH 7/9] diff: convert --check to use a hunk callback

2018-11-02 Thread Jeff King
The "diff --check" code needs to know the line number on which each hunk
starts in order to generate its output. We get that now by parsing the
hunk header line generated by xdiff, but it's much simpler to just pass
it directly using a hunk callback.

Signed-off-by: Jeff King 
---
 diff.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 37035110dc..e38d1ecaf4 100644
--- a/diff.c
+++ b/diff.c
@@ -3128,6 +3128,15 @@ static int is_conflict_marker(const char *line, int marker_size, unsigned long l
return 1;
 }
 
+static void checkdiff_consume_hunk(void *priv,
+  long ob, long on, long nb, long nn,
+  const char *func, long funclen)
+
+{
+   struct checkdiff_t *data = priv;
+   data->lineno = nb - 1;
+}
+
 static void checkdiff_consume(void *priv, char *line, unsigned long len)
 {
struct checkdiff_t *data = priv;
@@ -3163,12 +3172,6 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
  data->o->file, set, reset, ws);
} else if (line[0] == ' ') {
data->lineno++;
-   } else if (line[0] == '@') {
-   char *plus = strchr(line, '+');
-   if (plus)
-   data->lineno = strtol(plus, NULL, 10) - 1;
-   else
-   die("invalid diff");
}
 }
 
@@ -3684,8 +3687,9 @@ static void builtin_checkdiff(const char *name_a, const char *name_b,
memset(, 0, sizeof(xecfg));
xecfg.ctxlen = 1; /* at least one context line */
xpp.flags = 0;
-   if (xdi_diff_outf(&mf1, &mf2, NULL, checkdiff_consume,
- &data, &xpp, &xecfg))
+   if (xdi_diff_outf(&mf1, &mf2, checkdiff_consume_hunk,
+ checkdiff_consume, &data,
+ &xpp, &xecfg))
die("unable to generate checkdiff for %s", one->path);
 
if (data.ws_rule & WS_BLANK_AT_EOF) {
-- 
2.19.1.1336.g081079ac04



[PATCH 6/9] combine-diff: use an xdiff hunk callback

2018-11-02 Thread Jeff King
A combined diff has to line up the hunks for all of the individual
pairwise diffs, and thus needs to know their line numbers and sizes. We
get that now by parsing the hunk header line that xdiff generates.
However, now that xdiff supports a hunk callback, we can just use the
values directly.

Signed-off-by: Jeff King 
---
I learned to love --color-moved-ws=allow-indentation-change for this
one.

 combine-diff.c | 67 +++---
 1 file changed, 36 insertions(+), 31 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 3c479cfc3e..ad7752ea6b 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -345,38 +345,43 @@ struct combine_diff_state {
struct sline *lost_bucket;
 };
 
-static void consume_line(void *state_, char *line, unsigned long len)
+static void consume_hunk(void *state_,
+long ob, long on,
+long nb, long nn,
+const char *funcline, long funclen)
 {
struct combine_diff_state *state = state_;
-   if (5 < len && !memcmp("@@ -", line, 4)) {
-   if (parse_hunk_header(line, len,
- &state->ob, &state->on,
- &state->nb, &state->nn))
-   return;
-   state->lno = state->nb;
-   if (state->nn == 0) {
-   /* @@ -X,Y +N,0 @@ removed Y lines
-* that would have come *after* line N
-* in the result.  Our lost buckets hang
-* to the line after the removed lines,
-*
-* Note that this is correct even when N == 0,
-* in which case the hunk removes the first
-* line in the file.
-*/
-   state->lost_bucket = &state->sline[state->nb];
-   if (!state->nb)
-   state->nb = 1;
-   } else {
-   state->lost_bucket = &state->sline[state->nb-1];
-   }
-   if (!state->sline[state->nb-1].p_lno)
-   state->sline[state->nb-1].p_lno =
-   xcalloc(state->num_parent,
-   sizeof(unsigned long));
-   state->sline[state->nb-1].p_lno[state->n] = state->ob;
-   return;
+
+   state->ob = ob;
+   state->on = on;
+   state->nb = nb;
+   state->nn = nn;
+   state->lno = state->nb;
+   if (state->nn == 0) {
+   /* @@ -X,Y +N,0 @@ removed Y lines
+* that would have come *after* line N
+* in the result.  Our lost buckets hang
+* to the line after the removed lines,
+*
+* Note that this is correct even when N == 0,
+* in which case the hunk removes the first
+* line in the file.
+*/
+   state->lost_bucket = &state->sline[state->nb];
+   if (!state->nb)
+   state->nb = 1;
+   } else {
+   state->lost_bucket = &state->sline[state->nb-1];
}
+   if (!state->sline[state->nb-1].p_lno)
+   state->sline[state->nb-1].p_lno =
+   xcalloc(state->num_parent, sizeof(unsigned long));
+   state->sline[state->nb-1].p_lno[state->n] = state->ob;
+}
+
+static void consume_line(void *state_, char *line, unsigned long len)
+{
+   struct combine_diff_state *state = state_;
if (!state->lost_bucket)
return; /* not in any hunk yet */
switch (line[0]) {
@@ -421,8 +426,8 @@ static void combine_diff(struct repository *r,
state.num_parent = num_parent;
state.n = n;
 
-   if (xdi_diff_outf(&parent_file, result_file, NULL, consume_line,
- &state, &xpp, &xecfg))
+   if (xdi_diff_outf(&parent_file, result_file, consume_hunk,
+ consume_line, &state, &xpp, &xecfg))
die("unable to generate combined diff for %s",
oid_to_hex(parent));
free(parent_file.ptr);
-- 
2.19.1.1336.g081079ac04



[PATCH 5/9] diff: use hunk callback for word-diff

2018-11-02 Thread Jeff King
Our word-diff does not look at the -/+ lines generated by xdiff at all
(because they are not real lines to show the user, but just the
tokenized words split into lines). Instead we use the line numbers from
the hunk headers to index our own data structure.

As a result, our xdi_diff_outf() callback throws away all lines except
hunk headers. We can instead use a hunk callback, which has two
benefits:

  1. We don't have to re-parse the generated hunk header line, but can
 use the passed parameters directly.

  2. By setting our line callback to NULL, we can tell xdiff-interface
 that it does not even need to bother generating the other lines,
 saving a small amount of work.

Signed-off-by: Jeff King 
---
 diff.c| 12 +---
 xdiff-interface.c |  3 +++
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 8a83ce6058..37035110dc 100644
--- a/diff.c
+++ b/diff.c
@@ -1912,19 +1912,17 @@ static int color_words_output_graph_prefix(struct diff_words_data *diff_words)
}
 }
 
-static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
+static void fn_out_diff_words_aux(void *priv,
+ long minus_first, long minus_len,
+ long plus_first, long plus_len,
+ const char *func, long funclen)
 {
struct diff_words_data *diff_words = priv;
struct diff_words_style *style = diff_words->style;
-   int minus_first, minus_len, plus_first, plus_len;
const char *minus_begin, *minus_end, *plus_begin, *plus_end;
struct diff_options *opt = diff_words->opt;
const char *line_prefix;
 
-   if (line[0] != '@' || parse_hunk_header(line, len,
-   &minus_first, &minus_len, &plus_first, &plus_len))
-   return;
-
assert(opt);
line_prefix = diff_line_prefix(opt);
 
@@ -2074,7 +2072,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
xpp.flags = 0;
/* as only the hunk header will be parsed, we need a 0-context */
xecfg.ctxlen = 0;
-   if (xdi_diff_outf(&minus, &plus, NULL, fn_out_diff_words_aux,
+   if (xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, NULL,
  diff_words, &xpp, &xecfg))
die("unable to generate word diff");
free(minus.ptr);
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 2622981d97..f2a3d9a577 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -95,6 +95,9 @@ static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
struct xdiff_emit_state *priv = priv_;
int i;
 
+   if (!priv->line_fn)
+   return 0;
+
for (i = 0; i < nbuf; i++) {
if (mb[i].ptr[mb[i].size-1] != '\n') {
/* Incomplete line */
-- 
2.19.1.1336.g081079ac04



[PATCH 4/9] diff: discard hunk headers for patch-ids earlier

2018-11-02 Thread Jeff King
We do not include hunk header lines when computing patch-ids, since
the line numbers would create false negatives. Rather than detect and
skip them in our line callback, we can simply tell xdiff to avoid
generating them.

This is similar to the previous commit, but split out because it
actually requires modifying the matching line callback.

Signed-off-by: Jeff King 
---
 diff.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index d84356e007..8a83ce6058 100644
--- a/diff.c
+++ b/diff.c
@@ -5666,10 +5666,6 @@ static void patch_id_consume(void *priv, char *line, unsigned long len)
struct patch_id_t *data = priv;
int new_len;
 
-   /* Ignore line numbers when computing the SHA1 of the patch */
-   if (starts_with(line, "@@ -"))
-   return;
-
new_len = remove_space(line, len);
 
git_SHA1_Update(data->ctx, line, new_len);
@@ -5771,8 +5767,8 @@ static int diff_get_patch_id(struct diff_options *options, struct object_id *oid
xpp.flags = 0;
xecfg.ctxlen = 3;
xecfg.flags = 0;
-   if (xdi_diff_outf(&mf1, &mf2, NULL, patch_id_consume,
- &data, &xpp, &xecfg))
+   if (xdi_diff_outf(&mf1, &mf2, discard_hunk_line,
+ patch_id_consume, &data, &xpp, &xecfg))
return error("unable to generate patch-id diff for %s",
 p->one->path);
}
-- 
2.19.1.1336.g081079ac04



[PATCH 3/9] diff: avoid generating unused hunk header lines

2018-11-02 Thread Jeff King
Some callers of xdi_diff_outf() do not look at the generated hunk header
lines at all. By plugging in a no-op hunk callback, this tells xdiff not
to even bother formatting them.

This patch introduces a stock no-op callback and uses it with a few
callers whose line callbacks explicitly ignore hunk headers (because
they look only for +/- lines).

Signed-off-by: Jeff King 
---
 diff.c | 4 ++--
 diffcore-pickaxe.c | 3 ++-
 xdiff-interface.c  | 6 ++
 xdiff-interface.h  | 6 ++
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 07be5879e5..d84356e007 100644
--- a/diff.c
+++ b/diff.c
@@ -3637,8 +3637,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
-   if (xdi_diff_outf(&mf1, &mf2, NULL, diffstat_consume,
- diffstat, &xpp, &xecfg))
+   if (xdi_diff_outf(&mf1, &mf2, discard_hunk_line,
+ diffstat_consume, diffstat, &xpp, &xecfg))
die("unable to generate diffstat for %s", one->path);
}
 
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 58df35670a..69fc55ea1e 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -62,7 +62,8 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
ecbdata.hit = 0;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
-   if (xdi_diff_outf(one, two, NULL, diffgrep_consume, &ecbdata, &xpp, &xecfg))
+   if (xdi_diff_outf(one, two, discard_hunk_line, diffgrep_consume,
+ &ecbdata, &xpp, &xecfg))
return 0;
return ecbdata.hit;
 }
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 16d37ce6be..2622981d97 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -157,6 +157,12 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
return xdl_diff(&a, &b, xpp, xecfg, xecb);
 }
 
+void discard_hunk_line(void *priv,
+  long ob, long on, long nb, long nn,
+  const char *func, long funclen)
+{
+}
+
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
  xdiff_emit_hunk_fn hunk_fn,
  xdiff_emit_line_fn line_fn,
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 2dbe2feb19..7b0ccbdd1d 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -35,6 +35,12 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg);
 extern int git_xmerge_config(const char *var, const char *value, void *cb);
 extern int git_xmerge_style;
 
+/*
+ * Can be used as a no-op hunk_fn for xdi_diff_outf(), since a NULL
+ * one just sends the hunk line to the line_fn callback.
+ */
+void discard_hunk_line(void *, long, long, long, long, const char *, long);
+
 /*
  * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
  * Returns 1 if the strings are deemed equal, 0 otherwise.
-- 
2.19.1.1336.g081079ac04



[PATCH 2/9] xdiff-interface: provide a separate consume callback for hunks

2018-11-02 Thread Jeff King
The previous commit taught xdiff to optionally provide the hunk header
data to a specialized callback. But most users of xdiff actually use our
more convenient xdi_diff_outf() helper, which ensures that our callbacks
are always fed whole lines.

Let's plumb the special hunk-callback through this interface, too. It
will follow the same rule as xdiff when the hunk callback is NULL (i.e.,
continue to pass a stringified hunk header to the line callback). Since
we add NULL to each caller, there should be no behavior change yet.

Signed-off-by: Jeff King 
---
 combine-diff.c |  4 ++--
 diff.c | 20 ++--
 diffcore-pickaxe.c |  2 +-
 range-diff.c   |  2 +-
 xdiff-interface.c  | 30 ++
 xdiff-interface.h  | 10 --
 6 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 10155e0ec8..3c479cfc3e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -421,8 +421,8 @@ static void combine_diff(struct repository *r,
state.num_parent = num_parent;
state.n = n;
 
-   if (xdi_diff_outf(&parent_file, result_file, consume_line, &state,
- &xpp, &xecfg))
+   if (xdi_diff_outf(&parent_file, result_file, NULL, consume_line,
+ &state, &xpp, &xecfg))
die("unable to generate combined diff for %s",
oid_to_hex(parent));
free(parent_file.ptr);
diff --git a/diff.c b/diff.c
index 8647db3d30..07be5879e5 100644
--- a/diff.c
+++ b/diff.c
@@ -2074,8 +2074,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
xpp.flags = 0;
/* as only the hunk header will be parsed, we need a 0-context */
xecfg.ctxlen = 0;
-   if (xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
- &xpp, &xecfg))
+   if (xdi_diff_outf(&minus, &plus, NULL, fn_out_diff_words_aux,
+ diff_words, &xpp, &xecfg))
die("unable to generate word diff");
free(minus.ptr);
free(plus.ptr);
@@ -3526,8 +3526,8 @@ static void builtin_diff(const char *name_a,
xecfg.ctxlen = strtoul(v, NULL, 10);
if (o->word_diff)
init_diff_words_data(&ecbdata, o, one, two);
-   if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
- &xpp, &xecfg))
+   if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
+ &ecbdata, &xpp, &xecfg))
die("unable to generate diff for %s", one->path);
if (o->word_diff)
free_diff_words_data(&ecbdata);
@@ -3637,8 +3637,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
xpp.anchors_nr = o->anchors_nr;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
-   if (xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
- &xpp, &xecfg))
+   if (xdi_diff_outf(&mf1, &mf2, NULL, diffstat_consume,
+ diffstat, &xpp, &xecfg))
die("unable to generate diffstat for %s", one->path);
}
 
@@ -3686,8 +3686,8 @@ static void builtin_checkdiff(const char *name_a, const char *name_b,
memset(, 0, sizeof(xecfg));
xecfg.ctxlen = 1; /* at least one context line */
xpp.flags = 0;
-   if (xdi_diff_outf(&mf1, &mf2, checkdiff_consume, &data,
- &xpp, &xecfg))
+   if (xdi_diff_outf(&mf1, &mf2, NULL, checkdiff_consume,
+ &data, &xpp, &xecfg))
die("unable to generate checkdiff for %s", one->path);
 
if (data.ws_rule & WS_BLANK_AT_EOF) {
@@ -5771,8 +5771,8 @@ static int diff_get_patch_id(struct diff_options 
*options, struct object_id *oid
xpp.flags = 0;
xecfg.ctxlen = 3;
xecfg.flags = 0;
-   if (xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data,
- &xpp, &xecfg))
+   if (xdi_diff_outf(&mf1, &mf2, NULL, patch_id_consume,
+ &data, &xpp, &xecfg))
return error("unable to generate patch-id diff for %s",
 p->one->path);
}
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index d2361e06a1..58df35670a 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -62,7 +62,7 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
ecbdata.hit = 0;
xecfg.ctxlen = o->context;
xecfg.interhunkctxlen = o->interhunkcontext;
-   if (xdi_diff_outf(one, two, diffgrep_consume, &ecbdata, &xpp, &xecfg))
+   if (xdi_diff_outf(one, two, NULL, diffgrep_consume, &ecbdata, &xpp,
+ &xecfg))
return 0;
return ecbdata.hit;
 }
diff --git a/range-diff.c b/range-diff.c
index bd8083f2d1..245fc17fee 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -210,7 +210,7 @@ static int diffsize(const char *a, const char *b)
mf2.size = strlen(b);
 

[PATCH 1/9] xdiff: provide a separate emit callback for hunks

2018-11-02 Thread Jeff King
The xdiff library always emits hunk header lines to our callbacks as
formatted strings like "@@ -a,b +c,d @@\n". This is convenient if we're
going to output a diff, but less so if we actually need to compute using
those numbers, which requires re-parsing the line.

In preparation for moving away from this, let's teach xdiff a new
callback function which gets the broken-out hunk information. To help
callers that don't want to use this new callback, if it's NULL we'll
continue to format the hunk header into a string.

Note that this patch also renames the "outf" callback to "out_line".
This isn't strictly necessary, but helps in two ways:

  1. Now that there are two callbacks, it's nice to use more descriptive
 names.

  2. Many callers did not zero the emit_callback_data struct, and needed
 to be modified to set ecb.out_hunk to NULL. By changing the name of
 the existing struct member, that guarantees that any new callers
 from in-flight topics will break the build and be examined
 manually.

Signed-off-by: Jeff King 
---
 builtin/merge-tree.c |  3 ++-
 builtin/rerere.c |  3 ++-
 xdiff-interface.c|  2 +-
 xdiff/xdiff.h|  6 +-
 xdiff/xutils.c   | 21 +
 5 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 8fc108d305..70f6fc9167 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -110,7 +110,8 @@ static void show_diff(struct merge_list *entry)
xpp.flags = 0;
memset(&xecfg, 0, sizeof(xecfg));
xecfg.ctxlen = 3;
-   ecb.outf = show_outf;
+   ecb.out_hunk = NULL;
+   ecb.out_line = show_outf;
ecb.priv = NULL;
 
src.ptr = origin(entry, &size);
diff --git a/builtin/rerere.c b/builtin/rerere.c
index e89ccbc524..d78eeaed32 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -41,7 +41,8 @@ static int diff_two(const char *file1, const char *label1,
xpp.flags = 0;
memset(&xecfg, 0, sizeof(xecfg));
xecfg.ctxlen = 3;
-   ecb.outf = outf;
+   ecb.out_hunk = NULL;
+   ecb.out_line = outf;
ret = xdi_diff(&minus, &plus, &xpp, &xecfg, &ecb);
 
free(minus.ptr);
diff --git a/xdiff-interface.c b/xdiff-interface.c
index e7af95db86..25c185aff4 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -152,7 +152,7 @@ int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
state.consume = fn;
state.consume_callback_data = consume_callback_data;
memset(&ecb, 0, sizeof(ecb));
-   ecb.outf = xdiff_outf;
+   ecb.out_line = xdiff_outf;
ecb.priv = &state;
strbuf_init(&state.remainder, 0);
ret = xdi_diff(mf1, mf2, xpp, xecfg, &ecb);
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 2356da5f78..b158369020 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -86,7 +86,11 @@ typedef struct s_xpparam {
 
 typedef struct s_xdemitcb {
void *priv;
-   int (*outf)(void *, mmbuffer_t *, int);
+   int (*out_hunk)(void *,
+   long old_begin, long old_nr,
+   long new_begin, long new_nr,
+   const char *func, long funclen);
+   int (*out_line)(void *, mmbuffer_t *, int);
 } xdemitcb_t;
 
 typedef long (*find_func_t)(const char *line, long line_len, char *buffer, 
long buffer_size, void *priv);
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 88e5995535..963e1c58b9 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -54,7 +54,7 @@ int xdl_emit_diffrec(char const *rec, long size, char const 
*pre, long psize,
mb[2].size = strlen(mb[2].ptr);
i++;
}
-   if (ecb->outf(ecb->priv, mb, i) < 0) {
+   if (ecb->out_line(ecb->priv, mb, i) < 0) {
 
return -1;
}
@@ -344,8 +344,9 @@ int xdl_num_out(char *out, long val) {
return str - out;
 }
 
-int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
- const char *func, long funclen, xdemitcb_t *ecb) {
+static int xdl_format_hunk_hdr(long s1, long c1, long s2, long c2,
+  const char *func, long funclen,
+  xdemitcb_t *ecb) {
int nb = 0;
mmbuffer_t mb;
char buf[128];
@@ -387,9 +388,21 @@ int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
 
mb.ptr = buf;
mb.size = nb;
-   if (ecb->outf(ecb->priv, &mb, 1) < 0)
+   if (ecb->out_line(ecb->priv, &mb, 1) < 0)
return -1;
+   return 0;
+}
 
+int xdl_emit_hunk_hdr(long s1, long c1, long s2, long c2,
+ const char *func, long funclen,
+ xdemitcb_t *ecb) {
+   if (!ecb->out_hunk)
+   return xdl_format_hunk_hdr(s1, c1, s2, c2, func, funclen, ecb);
+   if (ecb->out_hunk(ecb->priv,
+ c1 ? s1 : s1 - 1, c1,
+ c2 ? s2 : s2 - 1, c2,
+ func, funclen) < 0)
+   return -1;
return 0;
 }
 
-- 
2.19.1.1336.g081079ac04



[PATCH 0/9] saner xdiff hunk callbacks

2018-11-02 Thread Jeff King
As part of my -Wunused-parameter hunt, I noticed that parse_hunk_header()
can read out-of-bounds in the array it is given. But it turns out not to
be a big problem because we only ever use it to parse lines that xdiff
has just generated.

This is fixable, and I'll show my patch to do so at the end of this
email. But it seems rather silly that we have xdiff generate a string
just so that we can parse it from within the same process. So instead I
improved the xdiff interface to pass the actual integers out, made use
of it as appropriate, and then in the final patch we can just drop the
buggy function entirely.

So that's this series, which I think is the better fix:

  [1/9]: xdiff: provide a separate emit callback for hunks
  [2/9]: xdiff-interface: provide a separate consume callback for hunks
  [3/9]: diff: avoid generating unused hunk header lines
  [4/9]: diff: discard hunk headers for patch-ids earlier
  [5/9]: diff: use hunk callback for word-diff
  [6/9]: combine-diff: use an xdiff hunk callback
  [7/9]: diff: convert --check to use a hunk callback
  [8/9]: range-diff: use a hunk callback
  [9/9]: xdiff-interface: drop parse_hunk_header()

 builtin/merge-tree.c |  3 +-
 builtin/rerere.c |  3 +-
 combine-diff.c   | 67 --
 diff.c   | 48 ++--
 diffcore-pickaxe.c   |  3 +-
 range-diff.c | 10 +-
 xdiff-interface.c| 76 ++--
 xdiff-interface.h| 19 ---
 xdiff/xdiff.h|  6 +++-
 xdiff/xutils.c   | 21 +---
 10 files changed, 141 insertions(+), 115 deletions(-)

For reference/comparison, here's the minimal fix patch.

-- >8 --
Subject: [PATCH DO NOT APPLY] xdiff-interface: refactor parse_hunk_header

The parse_hunk_header() function takes its input as a ptr/len pair, but
does not respect the "len" half, and instead treats it like a
NUL-terminated string. This works out in practice because we only ever
parse strings generated by xdiff. These do omit the trailing NUL, but
since they're always valid, we never parse past the newline.

Still, it would be easy to misuse it, so let's fix it.

While we're doing so, we can improve a few other things:

  - by using helpers like skip_prefix_mem(), we can do the whole parse
within a conditional, avoiding the need to jump around (backwards,
even!) to the error block with a goto. The result is hopefully much
more readable.

  - the check for the final "@@" would trigger an error return code if
it failed, but wouldn't actually emit an error message (like the
rest of the parse errors do)

  - we used to blindly assume the caller checked that the string starts
with "@@ -"; we now check it ourselves and let the caller know what
we found

  - the input line does not need to be modified, and can be marked
"const"

Signed-off-by: Jeff King 
---
The diff is pretty horrid, but hopefully the post-image is pleasant to
read.

 combine-diff.c| 13 +
 diff.c|  5 ++--
 xdiff-interface.c | 68 +--
 xdiff-interface.h |  6 ++---
 4 files changed, 50 insertions(+), 42 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 10155e0ec8..0318f39103 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -348,11 +348,14 @@ struct combine_diff_state {
 static void consume_line(void *state_, char *line, unsigned long len)
 {
struct combine_diff_state *state = state_;
-   if (5 < len && !memcmp("@@ -", line, 4)) {
-   if (parse_hunk_header(line, len,
- &state->ob, &state->on,
- &state->nb, &state->nn))
-   return;
+   switch (maybe_parse_hunk_header(line, len,
+   &state->ob, &state->on,
+   &state->nb, &state->nn)) {
+   case 0:
+   break; /* not a hunk header */
+   case -1:
+   return; /* parse error */
+   default:
state->lno = state->nb;
if (state->nn == 0) {
/* @@ -X,Y +N,0 @@ removed Y lines
diff --git a/diff.c b/diff.c
index 8647db3d30..af6c01c4d0 100644
--- a/diff.c
+++ b/diff.c
@@ -1921,8 +1921,9 @@ static void fn_out_diff_words_aux(void *priv, char *line, 
unsigned long len)
struct diff_options *opt = diff_words->opt;
const char *line_prefix;
 
-   if (line[0] != '@' || parse_hunk_header(line, len,
-   &minus_first, &minus_len, &plus_first, &plus_len))
+   if (maybe_parse_hunk_header(line, len,
+   &minus_first, &minus_len,
+   &plus_first, &plus_len) <= 0)
return;
 
assert(opt);
diff --git a/xdiff-interface.c b/xdiff-interface.c
index e7af95db86..f574ee49cd 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -14,49 +14,53 @@ struct xdiff_emit_state {
struct strbuf remainder;
 };
 
-static int 

Re: [PATCH] it's called read_object_file() these days

2018-11-02 Thread Jeff King
On Fri, Nov 02, 2018 at 03:05:03PM +0900, Junio C Hamano wrote:

> Remnant of the old name of the function still remains in comments.
> Update them all.

Yay. What's here looks obviously correct.

> Signed-off-by: Junio C Hamano 
> ---
>  apply.c   | 2 +-
>  builtin/gc.c  | 2 +-
>  fast-import.c | 4 ++--
>  notes.c   | 2 +-
>  object.h  | 2 +-
>  sha1-file.c   | 2 +-
>  6 files changed, 7 insertions(+), 7 deletions(-)

There's another mention in Documentation/technical/api-object-access.txt.

But since the entire API is undocumented, I'm not sure it matters much.
That file has been a placeholder since 2007. Maybe we should just delete
it; its existence does not seem to be guilting anyone into documenting,
and these days we'd prefer to do it in-header anyway.

-Peff


Re: Understanding pack format

2018-11-02 Thread Junio C Hamano
Farhan Khan  writes:

> ...Where is this in the git code? That might
> serve as a good guide.

There are two major codepaths.  One is used at runtime, giving us
random access into the packfile with the help of the .idx file.  The
other is used when receiving a new packstream to create an .idx
file.

Personally I find the latter a bit too dense for those who are new
to the codebase, and the former would probably be easier to grok.

Start from sha1-file.c::read_object(), which will eventually lead
you to oid_object_info_extended() that essentially boils down to

 - a call to find_pack_entry() with the object name, and then

 - a call to packed_object_info() with the pack entry found earlier.

Following packfile.c::packed_object_info() will lead you to
cache_or_unpack_entry(); the unpack_entry() function is where all
the action takes place: reading one object's worth of data from the
packstream and reconstructing the object from its deltified
representation.


[PATCH] it's called read_object_file() these days

2018-11-02 Thread Junio C Hamano
Remnant of the old name of the function still remains in comments.
Update them all.

Signed-off-by: Junio C Hamano 
---
 apply.c   | 2 +-
 builtin/gc.c  | 2 +-
 fast-import.c | 4 ++--
 notes.c   | 2 +-
 object.h  | 2 +-
 sha1-file.c   | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/apply.c b/apply.c
index 073d5f0451..ef1a1b2c4e 100644
--- a/apply.c
+++ b/apply.c
@@ -3254,7 +3254,7 @@ static int read_blob_object(struct strbuf *buf, const 
struct object_id *oid, uns
result = read_object_file(oid, &type, &sz);
if (!result)
return -1;
-   /* XXX read_sha1_file NUL-terminates */
+   /* XXX read_object_file NUL-terminates */
strbuf_attach(buf, result, sz, sz + 1);
}
return 0;
diff --git a/builtin/gc.c b/builtin/gc.c
index 871a56f1c5..a682a0f44e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -285,7 +285,7 @@ static uint64_t estimate_repack_memory(struct packed_git 
*pack)
/* revindex is used also */
heap += sizeof(struct revindex_entry) * nr_objects;
/*
-* read_sha1_file() (either at delta calculation phase, or
+* read_object_file() (either at delta calculation phase, or
 * writing phase) also fills up the delta base cache
 */
heap += delta_base_cache_limit;
diff --git a/fast-import.c b/fast-import.c
index 95600c78e0..f73b2ae0a6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1298,13 +1298,13 @@ static void stream_blob(uintmax_t len, struct object_id 
*oidout, uintmax_t mark)
  * oe must not be NULL.  Such an oe usually comes from giving
  * an unknown SHA-1 to find_object() or an undefined mark to
  * find_mark().  Callers must test for this condition and use
- * the standard read_sha1_file() when it happens.
+ * the standard read_object_file() when it happens.
  *
  * oe->pack_id must not be MAX_PACK_ID.  Such an oe is usually from
  * find_mark(), where the mark was reloaded from an existing marks
  * file and is referencing an object that this fast-import process
  * instance did not write out to a packfile.  Callers must test for
- * this condition and use read_sha1_file() instead.
+ * this condition and use read_object_file() instead.
  */
 static void *gfi_unpack_entry(
struct object_entry *oe,
diff --git a/notes.c b/notes.c
index 25cdce28b7..6a430931a3 100644
--- a/notes.c
+++ b/notes.c
@@ -858,7 +858,7 @@ static int string_list_add_note_lines(struct string_list 
*list,
if (is_null_oid(oid))
return 0;
 
-   /* read_sha1_file NUL-terminates */
+   /* read_object_file NUL-terminates */
data = read_object_file(oid, , );
if (t != OBJ_BLOB || !data || !len) {
free(data);
diff --git a/object.h b/object.h
index 0feb90ae61..4cabc9f278 100644
--- a/object.h
+++ b/object.h
@@ -136,7 +136,7 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid);
  */
 struct object *parse_object_or_die(const struct object_id *oid, const char 
*name);
 
-/* Given the result of read_sha1_file(), returns the object after
+/* Given the result of read_object_file(), returns the object after
  * parsing it.  eaten_p indicates if the object has a borrowed copy
  * of buffer and the caller should not free() it.
  */
diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..31c2b926fe 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -124,7 +124,7 @@ const char *empty_blob_oid_hex(void)
 
 /*
  * This is meant to hold a *small* number of objects that you would
- * want read_sha1_file() to be able to return, but yet you do not want
+ * want read_object_file() to be able to return, but yet you do not want
  * to write them into the object store (e.g. a browse-only
  * application).
  */