Re: [PATCH] git: add -N as a short option for --no-pager

2018-04-24 Thread Beat Bolli <bbo...@ewanet.ch>
On Wed, Apr 25, 2018 at 09:05:56AM +0900, Junio C Hamano wrote:
> Johannes Sixt  writes:
> 
> > In modern setups, less, the pager, uses alternate screen to show
> > the content. When it is closed, it switches back to the original
> > screen, and all content is gone.
> >
> > It is not uncommon to request that the output remains visible in
> > the terminal. For this, the option --no-pager can be used. But
> > it is a bit cumbersome to type, even when command completion is
> > available. Provide a short option, -N, to make the option easier
> > accessible.
> >
> > Signed-off-by: Johannes Sixt 
> > ---
> 
> Heh, I used to append "|cat", which is four keystrokes that is a bit
> shorter than " --no-pager", but that is only acceptable when you do
> not care about colored output ;-)
> 
> I am not absolutely certain about the choice of a single letter. I
> already checked we do not use "git -N cmd" for anything else right
> now, so I am certain about the availability, but I am not sure if
> capital 'N' is the best choice, when the other side is lowercase 'p'
> (and more importantly, the other side 'p' has mneomonic value for
> 'pagination', but 'N' merely stands for 'no' and could be negating
> anything, not related to pagination). But I agree that a short-hand
> would be welcome.
> 

I'm quite fond of the notation "-p-", but that would set a precedent for
all other "--no-" options.

Maybe the option parser could be enhanced to allow for both?

Thanks,
Beat

> > diff --git a/Documentation/git.txt b/Documentation/git.txt
> > index 4767860e72..17b50b0dc6 100644
> > --- a/Documentation/git.txt
> > +++ b/Documentation/git.txt
> > @@ -11,7 +11,7 @@ SYNOPSIS
> >  [verse]
> >  'git' [--version] [--help] [-C ] [-c =]
> >  [--exec-path[=]] [--html-path] [--man-path] [--info-path]
> > -[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
> > +[-p|--paginate|-N|--no-pager] [--no-replace-objects] [--bare]
> >  [--git-dir=] [--work-tree=] [--namespace=]
> >  [--super-prefix=]
> >   []
> > @@ -103,6 +103,7 @@ foo.bar= ...`) sets `foo.bar` to the empty string which 
> > `git config
> > configuration options (see the "Configuration Mechanism" section
> > below).
> >  
> > +-N::
> >  --no-pager::
> > Do not pipe Git output into a pager.


signature.asc
Description: PGP signature


Re: [PATCH 25/41] builtin/receive-pack: avoid hard-coded constants for push certs

2018-04-24 Thread Martin Ågren
On 25 April 2018 at 04:00, brian m. carlson
 wrote:
> On Tue, Apr 24, 2018 at 11:58:17AM +0200, Martin Ågren wrote:
>> On 24 April 2018 at 01:39, brian m. carlson
>> > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
>> > index c4272fbc96..5f35596c14 100644
>> > --- a/builtin/receive-pack.c
>> > +++ b/builtin/receive-pack.c
>> > @@ -454,21 +454,21 @@ static void hmac_sha1(unsigned char *out,
>> > /* RFC 2104 2. (6) & (7) */
>> > git_SHA1_Init();
>> > git_SHA1_Update(, k_opad, sizeof(k_opad));
>> > -   git_SHA1_Update(, out, 20);
>> > +   git_SHA1_Update(, out, GIT_SHA1_RAWSZ);
>> > git_SHA1_Final(out, );
>> >  }
>>
>> Since we do HMAC with SHA-1, we use the functions `git_SHA1_foo()`. Ok.
>> But then why not just use "20"? Isn't GIT_SHA1_RAWSZ coupled to the
>> whole hash transition thing? This use of "20" is not, IMHO, the "length
>> in bytes [...] of an object name" (quoting cache.h).
>
> Originally, GIT_SHA1_RAWSZ was a good stand-in for the hard-coded uses
> of 20 (and GIT_SHA1_HEXSZ for 40) for object IDs.  Recently, we've
> started moving toward using the_hash_algo for the object ID-specific
> hash values, so I've started using those constants only to identify
> SHA-1 specific items.
>
> In this case, using the constant makes it more obvious that what we're
> passing is indeed an SHA-1 hash.  It also makes it easier to find all
> the remaining instances of "20" in the codebase and analyze them
> accordingly.
>
> I agree that this isn't an object name strictly, but it's essentially
> equivalent.  If you feel strongly, I can leave this the way it is.

I see. So one could say that in the ideal end-game, GIT_SHA1_RAWSZ would
be gone when the oid-hash-transition is over. Except since we also use
SHA-1 for other stuff than object IDs, the real-world ideal end-game is
that we only have a few users lingering in places that have nothing to
do with oid, but only with SHA-1 (and maybe in the gluing for
calculating SHA-1 oids..).

I do not feel strongly about this. I was just surprised to see it.
Thank you for explaining this.

Martin


Re: [PATCH 25/41] builtin/receive-pack: avoid hard-coded constants for push certs

2018-04-24 Thread brian m. carlson
On Tue, Apr 24, 2018 at 11:58:17AM +0200, Martin Ågren wrote:
> On 24 April 2018 at 01:39, brian m. carlson
> > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> > index c4272fbc96..5f35596c14 100644
> > --- a/builtin/receive-pack.c
> > +++ b/builtin/receive-pack.c
> > @@ -454,21 +454,21 @@ static void hmac_sha1(unsigned char *out,
> > /* RFC 2104 2. (6) & (7) */
> > git_SHA1_Init();
> > git_SHA1_Update(, k_opad, sizeof(k_opad));
> > -   git_SHA1_Update(, out, 20);
> > +   git_SHA1_Update(, out, GIT_SHA1_RAWSZ);
> > git_SHA1_Final(out, );
> >  }
> 
> Since we do HMAC with SHA-1, we use the functions `git_SHA1_foo()`. Ok.
> But then why not just use "20"? Isn't GIT_SHA1_RAWSZ coupled to the
> whole hash transition thing? This use of "20" is not, IMHO, the "length
> in bytes [...] of an object name" (quoting cache.h).

Originally, GIT_SHA1_RAWSZ was a good stand-in for the hard-coded uses
of 20 (and GIT_SHA1_HEXSZ for 40) for object IDs.  Recently, we've
started moving toward using the_hash_algo for the object ID-specific
hash values, so I've started using those constants only to identify
SHA-1 specific items.

In this case, using the constant makes it more obvious that what we're
passing is indeed an SHA-1 hash.  It also makes it easier to find all
the remaining instances of "20" in the codebase and analyze them
accordingly.

I agree that this isn't an object name strictly, but it's essentially
equivalent.  If you feel strongly, I can leave this the way it is.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Fetching tags overwrites existing tags

2018-04-24 Thread Wink Saville
On Tue, Apr 24, 2018 at 5:52 PM, Junio C Hamano  wrote:
> Wink Saville  writes:
>
>> Ideally I would have liked the tags fetched from gbenchmark to have a prefix
>> of gbenchmark/, like the branches have, maybe something like:
>>
>> $ git fetch --tags gbenchmark
>> ...
>>  * [new branch]  v2  -> gbenchmark/v2
>>  * [new tag] v0.0.9  -> gbenchmark/v0.0.9
>>  * [new tag] v0.1.0  -> gbenchmark/v0.1.0
>>  * [new tag] v1.0.0  -> gbenchmark/v1.0.0
>>  * [new tag] v1.1.0  -> gbenchmark/v1.1.0
>>  * [new tag] v1.2.0  -> gbenchmark/v1.2.0
>>  * [new tag] v1.3.0  -> gbenchmark/v1.3.0
>>  * [new tag] v1.4.0  -> gbenchmark/v1.4.0
>
> The tag namespace (refs/tags/) is considered a shared resource (I am
> not saying that that is the only valid world model---I am merely
> explaining why things are like they are), hence the auto-following
> tags will bring them to refs/tags/ (and I do not think there is no
> way to configure auto-following to place them elsewhere).
>
> But you could configure things yourself.
>
> $ git init victim && cd victim
> $ git remote add origin ../git.git
> $ git config --add remote.origin.fetch \
>   "+refs/tags/*:refs/remote-tags/origin/*"
> $ tail -n 4 .git/config
> [remote "origin"]
> url = ../git.git/
> fetch = +refs/heads/*:refs/remotes/origin/*
> fetch = +refs/tags/*:refs/remote-tags/origin/*
> $ git fetch --no-tags
>
> The "--no-tags" option serves to decline the auto-following tags to
> refs/tags/ hierarchy; once your repository is configured this way,
> your initial and subsequent "git fetch" will copy refs it finds in
> refs/tags/ hierarchy over there to your refs/remote-tags/origin/
> hierarchy locally.

Interesting that kinda works, what about teaching git-remote to
understand a "--prefix-tags" option which would create the
"fetch = refs/tags/*:refs/remote-tags/origin" entry. And teach
git-fetch to use that entry if it exists and not require the "--no-tags"?

Of course I'm sure there are "lots" of other things to change, doing
a search for "remotes/" gives the following:

$ find . -type f -name '*.c' | xargs grep '\bremotes/' | sort -uk1,1
./builtin/branch.c: fmt = "refs/remotes/%s";
./builtin/checkout.c: skip_prefix(argv0, "remotes/", );
./builtin/clone.c: strbuf_addf(_top, "refs/remotes/%s/", option_origin);
./builtin/describe.c: !skip_prefix(path, "refs/remotes/", _to_match)) {
./builtin/fast-export.c: "refs/remotes/",
./builtin/fetch.c: else if (starts_with(rm->name, "refs/remotes/")) {
./builtin/merge.c: if (starts_with(found_ref, "refs/remotes/")) {
./builtin/pull.c: * refs/heads/ to
refs/remotes//.
./builtin/remote.c: strbuf_addf(tmp, "refs/heads/%s:refs/remotes/%s/%s",
./builtin/rev-parse.c: handle_ref_opt(arg, "refs/remotes/");
./builtin/show-branch.c: if (!starts_with(refname, "refs/remotes/"))
./contrib/examples/builtin-fetch--tool.c: else if
(!strncmp(remote_name, "refs/remotes/", 13)) {
./help.c: if (skip_prefix(refname, "refs/remotes/", ) &&
./log-tree.c: else if (starts_with(refname, "refs/remotes/"))
./ref-filter.c:skip_prefix(refname, "refs/remotes/", ) ||
./refs.c: return refs_for_each_ref_in(refs, "refs/remotes/", fn, cb_data);
./remote.c: FILE *f = fopen_or_warn(git_path("remotes/%s", remote->name), "r");
./revision.c: for_each_glob_ref_in(handle_one_ref, optarg,
"refs/remotes/", );
./sha1_name.c: starts_with(refname, "refs/remotes/"))
./wt-status.c: skip_prefix(from, "refs/remotes/", );

-- wink


Re: Fetching tags overwrites existing tags

2018-04-24 Thread Jacob Keller
On Tue, Apr 24, 2018 at 5:52 PM, Junio C Hamano  wrote:
> Wink Saville  writes:
>
>> Ideally I would have liked the tags fetched from gbenchmark to have a prefix
>> of gbenchmark/, like the branches have, maybe something like:
>>
>> $ git fetch --tags gbenchmark
>> ...
>>  * [new branch]  v2  -> gbenchmark/v2
>>  * [new tag] v0.0.9  -> gbenchmark/v0.0.9
>>  * [new tag] v0.1.0  -> gbenchmark/v0.1.0
>>  * [new tag] v1.0.0  -> gbenchmark/v1.0.0
>>  * [new tag] v1.1.0  -> gbenchmark/v1.1.0
>>  * [new tag] v1.2.0  -> gbenchmark/v1.2.0
>>  * [new tag] v1.3.0  -> gbenchmark/v1.3.0
>>  * [new tag] v1.4.0  -> gbenchmark/v1.4.0
>
> The tag namespace (refs/tags/) is considered a shared resource (I am
> not saying that that is the only valid world model---I am merely
> explaining why things are like they are), hence the auto-following
> tags will bring them to refs/tags/ (and I do not think there is no
> way to configure auto-following to place them elsewhere).
>
> But you could configure things yourself.
>
> $ git init victim && cd victim
> $ git remote add origin ../git.git
> $ git config --add remote.origin.fetch \
>   "+refs/tags/*:refs/remote-tags/origin/*"
> $ tail -n 4 .git/config
> [remote "origin"]
> url = ../git.git/
> fetch = +refs/heads/*:refs/remotes/origin/*
> fetch = +refs/tags/*:refs/remote-tags/origin/*
> $ git fetch --no-tags
>
> The "--no-tags" option serves to decline the auto-following tags to
> refs/tags/ hierarchy; once your repository is configured this way,
> your initial and subsequent "git fetch" will copy refs it finds in
> refs/tags/ hierarchy over there to your refs/remote-tags/origin/
> hierarchy locally.

It should be noted, that remote-tags would not be integrated into "git
tag" or many other places in git commands, so it may be significantly
less visible.

Thanks,
Jake


Re: [PATCH 21/41] http: eliminate hard-coded constants

2018-04-24 Thread brian m. carlson
On Wed, Apr 25, 2018 at 08:44:19AM +0900, Junio C Hamano wrote:
> Martin Ågren  writes:
> 
> >> switch (data[i]) {
> >> case 'P':
> >> i++;
> >> -   if (i + 52 <= buf.len &&
> >> +   if (i + hexsz + 12 <= buf.len &&
> >> starts_with(data + i, " pack-") &&
> >> -   starts_with(data + i + 46, ".pack\n")) {
> >> -   get_sha1_hex(data + i + 6, sha1);
> >> -   fetch_and_setup_pack_index(packs_head, 
> >> sha1,
> >> +   starts_with(data + i + hexsz + 6, ".pack\n")) {
> >> +   get_sha1_hex(data + i + 6, hash);
> >> +   fetch_and_setup_pack_index(packs_head, 
> >> hash,
> >>   base_url);
> >> i += 51;
> >
> > s/51/hexsz + 11/ ?
> 
> Quite right.

Good point.  Will fix.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v2 0/3] Some add-on patches on top of dj/runtime-prefix

2018-04-24 Thread Junio C Hamano
Johannes Schindelin  writes:

> Thank you, and sorry for the trouble. I am just too used to a Continuous
> Integration setting with exactly one integration branch.

Fixing problems close to the source (i.e. picking an appropriately
aged base) and making sure everthing works near the tip ala CI style
are not opposing goals.  It just takes an extra step (i.e. trial
merge and testing the result) and discipline.  Until one gets used
to do it so much that one can do in one's sleep, that is ;-)

> I will make an effort in the future to figure out the best base branch for
> patches that do not apply cleanly on `master` but require more stuff from
> `next`/`pu`.

The easiest is to leave that to the maintainer most of the time, as
that is what maintainers do.

Thanks.

I really want to see that the runtime prefix stuff mature enough
during this cycle, so these follow-up patches are all very much
appreciated.


Re: Fetching tags overwrites existing tags

2018-04-24 Thread Junio C Hamano
Wink Saville  writes:

> Ideally I would have liked the tags fetched from gbenchmark to have a prefix
> of gbenchmark/, like the branches have, maybe something like:
>
> $ git fetch --tags gbenchmark
> ...
>  * [new branch]  v2  -> gbenchmark/v2
>  * [new tag] v0.0.9  -> gbenchmark/v0.0.9
>  * [new tag] v0.1.0  -> gbenchmark/v0.1.0
>  * [new tag] v1.0.0  -> gbenchmark/v1.0.0
>  * [new tag] v1.1.0  -> gbenchmark/v1.1.0
>  * [new tag] v1.2.0  -> gbenchmark/v1.2.0
>  * [new tag] v1.3.0  -> gbenchmark/v1.3.0
>  * [new tag] v1.4.0  -> gbenchmark/v1.4.0

The tag namespace (refs/tags/) is considered a shared resource (I am
not saying that that is the only valid world model---I am merely
explaining why things are like they are), hence the auto-following
tags will bring them to refs/tags/ (and I do not think there is no
way to configure auto-following to place them elsewhere).

But you could configure things yourself.

$ git init victim && cd victim
$ git remote add origin ../git.git
$ git config --add remote.origin.fetch \
  "+refs/tags/*:refs/remote-tags/origin/*"
$ tail -n 4 .git/config
[remote "origin"]
url = ../git.git/
fetch = +refs/heads/*:refs/remotes/origin/*
fetch = +refs/tags/*:refs/remote-tags/origin/*
$ git fetch --no-tags

The "--no-tags" option serves to decline the auto-following tags to
refs/tags/ hierarchy; once your repository is configured this way,
your initial and subsequent "git fetch" will copy refs it finds in
refs/tags/ hierarchy over there to your refs/remote-tags/origin/
hierarchy locally.


Re: [PATCH v3 09/11] technical/shallow: describe the relationship with replace refs

2018-04-24 Thread Junio C Hamano
"Philip Oakley"  writes:

> Perhaps something like:
> +$GIT_DIR/shallow, and handle its contents similar to replace
> +refs (with the difference that shallow does not actually
> +create those replace refs) with the difference that shallow commits will
> +always have their parents not present.

I am not sure if there is enough similarity to replace mechanism to
mention that.  It has lines of text, each of which records a commit
object for which Git is told to pretend that it has no parent.

To those who are familiar with "graft" format, it is possible to
explain the format as "it is similar to graft", as a line with a
single commit object name tells Git to pretend that it has no parent
in the "graft" format, but because we are getting rid of graft, it
probably makes sense to just explain it without reference to replace
(which may serve a similar purpose, but is certainly very far from
"similar" as a mechanism when you explain how the contents of shallow
is handled).

$GIT_DIR/shallow lists commit object names and tells Git to
pretend as if they are root commits (e.g. "git log" traversal
stops after showing them; "git fsck" does not complain saying
the commits listed on their "parent" lines do not exist).





git https and github

2018-04-24 Thread Lev
Hi list,


I'm struggling with git connecting to Github.

The problem might be SSL/TLS related.

https://githubengineering.com/crypto-removal-notice/

I suspect that my setup still uses tlsv1 or tlsv1.1.

I've tried to explicitly set git to use tlsv1.2 in my .gitconfig file
like this:

[http]
sslVersion = tlsv1.2

I've tried to re-compile git with OpenSSL and GnuTLS. All give the
same error.

git clone https://github.com/OnionIoT/source.git
Cloning into 'source'...
* Couldn't find host github.com in the .netrc file; using defaults
*   Trying 192.30.253.112...
* TCP_NODELAY set
* Connected to github.com (192.30.253.112) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection:
ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol
version
* Curl_http_done: called premature == 1
* stopped the pause stream!
* Closing connection 0
fatal: unable to access 'https://github.com/OnionIoT/source.git/':
error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol
version lev@jive:~/git$ unset GIT_SSL_VERSION lev@jive:~/git$ git clone
https://github.com/OnionIoT/source.git Cloning into 'source'...
* Couldn't find host github.com in the .netrc file; using defaults
*   Trying 192.30.253.112...
* TCP_NODELAY set
* Connected to github.com (192.30.253.112) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection:
ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol
version
* Curl_http_done: called premature == 1
* stopped the pause stream!
* Closing connection 0
fatal: unable to access 'https://github.com/OnionIoT/source.git/':
error:1409442E:SSL routines:ssl3_read_bytes:tlsv1 alert protocol version


I can connect to other git servers without any error. This is a debian
stable system with the following components:

git version 2.11.0

libcurl 7.52.1

OpenSSL 1.0.2l


Is there any way to know what is the exact protocol used? Are there any
workaround, fix for this issue?

Any help welcome. Thank you,
Levente

-- 
Levente Kovacs
Senior Electronic Engineer

W: http://levente.logonex.eu


Re: [PATCH 5/9] packfile: add repository argument to packed_object_info

2018-04-24 Thread Junio C Hamano
Jonathan Tan  writes:

> On Mon, 23 Apr 2018 16:43:23 -0700
> Stefan Beller  wrote:
>
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 93f25c6c6a..b292e04fd3 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1307,7 +1307,8 @@ int oid_object_info_extended_the_repository(const 
>> struct object_id *oid, struct
>>   * information below, so return early.
>>   */
>>  return 0;
>> -rtype = packed_object_info(e.p, e.offset, oi);
>> +
>> +rtype = packed_object_info(the_repository, e.p, e.offset, oi);
>
> Extra blank line introduced.

Yes, but from the way this function is formatted, I think the
new paragraph break there makes sort of sense.


Re: [PATCH v2 0/2] add additional config settings for merge

2018-04-24 Thread Junio C Hamano
Ben Peart  writes:

>  diff.renameLimit::
>   The number of files to consider when performing the copy/rename
> - detection; equivalent to the 'git diff' option `-l`.
> + detection; equivalent to the 'git diff' option `-l`. This setting
> + has no effect if rename detection is turned off.

You mean "turned off via diff.renames"?

This is not meant as a suggestion to rewrite this paragraph
further---but if the answer is "no", then that might be an
indication that the sentence is inviting a misunderstanding.

>  diff.renames::
>   Whether and how Git detects renames.  If set to "false",
> diff --git a/Documentation/merge-config.txt b/Documentation/merge-config.txt
> index 5a9ab969db..38492bcb98 100644
> --- a/Documentation/merge-config.txt
> +++ b/Documentation/merge-config.txt
> @@ -39,7 +39,8 @@ include::fmt-merge-msg-config.txt[]
>  merge.renameLimit::
>   The number of files to consider when performing rename detection
>   during a merge; if not specified, defaults to the value of
> - diff.renameLimit.
> + diff.renameLimit. This setting has no effect if rename detection
> + is turned off.

Ditto.  If your design is to make the merge machinery completely
ignore diff.renames and only pay attention to merge.renames [*1*],
then it probably is a good idea to be more specific here, by saying
"... is turned off via ...", though.

>  merge.renames::
>   Whether and how Git detects renames.  If set to "false",

[Footnote]

*1* ...which I do not think is such a good idea, by the way.  I'd
personally expect merge.renames to allow overriding and falling back
to diff.renames, just like the {merge,diff}.renameLimit pair does.


Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 16:23:28 -0700
Stefan Beller  wrote:

> >> s/missmatch/mismatch/
> >> Also, what is this used for?
> >
> > The mismatch should be used for (thanks for catching!)
> > checking if the remainder of a line is the same, although a boolean
> > may be not the correct choice. We know that the two strings
> > passed into compute_ws_delta come from a complete white space
> > agnostic comparison, so consider:
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more TAB more
> > - SP text TAB text
> >
> > which would mark this as a moved block. This is the feature
> > working as intended, but what about
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more SP more
> > - SP text TAB text
> >
> > Note how the length of the strings is the same, hence the current
> > code of
> >
> > compute_ws_delta(...) {
> > int d = longer->len - shorter->len;
> > out->string = xmemdupz(longer->line, d);
> > }
> >
> > would work fine and not notice the "change in the remainder".
> > That ought to be caught by the mismatch variable, that
> > is assigned, but not used.
> >
> > The compare_ws_delta(a, b) needs to be extended to
> >
> >   !a->mismatch && !b->mismatch && existing_condition
> >
> > Ideally the example from above would be different depending
> > on whether the other white space flags are given or not.

Thanks - this gives me food for thought.

I'm starting to think that it is impossible to avoid creating our own
string comparison function that:
 - seeks to the first non-whitespace character in both strings
 - checks that both strings, from that first non-whitespace characters,
   are equal for some definition of equal (whether through strcmp or
   xdiff_compare_lines)
 - walks backwards from that first non-whitespace characters to look for
   the first non-matching whitespace character between the 2 strings

The existing diff whitespace modes (to be passed to xdiff_compare_lines)
do not give the exact result we want. For example, if
XDF_IGNORE_WHITESPACE is used (as is in this patch), lines like "a b"
and "ab " would match even though they shouldn't.

This comparison function can be used both in moved_entry_cmp() and when
constructing the ws_delta (in which case it should be made to output
whatever information is needed as out parameters or similar).

> >>> + if (diffopt->color_moved_ws_handling & 
> >>> COLOR_MOVED_DELTA_WHITESPACES)
> >>> + /*
> >>> +  * As there is not specific white space config given,
> >>> +  * we'd need to check for a new block, so ignore all
> >>> +  * white space. The setup of the white space
> >>> +  * configuration for the next block is done else where
> >>> +  */
> >>> + flags |= XDF_IGNORE_WHITESPACE;
> >>> +
> >>>   return !xdiff_compare_lines(a->es->line, a->es->len,
> >>>   b->es->line, b->es->len,
> >>>   flags);

We can't add XDF_IGNORE_WHITESPACE here - as far as I can tell, this
means that more lines will be treated as moved than the user wants (if
the user did not set --color-moved-ignore-all-space).

> >> It seems like pmb and wsd are parallel arrays - could each wsd be
> >> embedded into the corresponding entry of pmb instead?
> >
> > I'll explore that. It sounds like a good idea for code hygiene.
> > Although if you do not intend to use this feature, then keeping it separate
> > would allow for a smaller footprint in memory.

If you're worried about memory, wsd can be embedded as a pointer.

> > The command is missing a --color-moved, as the 
> > --color-moved-whitespace-settings
> > do not imply --color-moved, yet(?)

Maybe one should imply the other (or at least a warning).


Re: [PATCH] git: add -N as a short option for --no-pager

2018-04-24 Thread Junio C Hamano
Johannes Sixt  writes:

> In modern setups, less, the pager, uses alternate screen to show
> the content. When it is closed, it switches back to the original
> screen, and all content is gone.
>
> It is not uncommon to request that the output remains visible in
> the terminal. For this, the option --no-pager can be used. But
> it is a bit cumbersome to type, even when command completion is
> available. Provide a short option, -N, to make the option easier
> accessible.
>
> Signed-off-by: Johannes Sixt 
> ---

Heh, I used to append "|cat", which is four keystrokes that is a bit
shorter than " --no-pager", but that is only acceptable when you do
not care about colored output ;-)

I am not absolutely certain about the choice of a single letter. I
already checked we do not use "git -N cmd" for anything else right
now, so I am certain about the availability, but I am not sure if
capital 'N' is the best choice, when the other side is lowercase 'p'
(and more importantly, the other side 'p' has mneomonic value for
'pagination', but 'N' merely stands for 'no' and could be negating
anything, not related to pagination). But I agree that a short-hand
would be welcome.

> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 4767860e72..17b50b0dc6 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -11,7 +11,7 @@ SYNOPSIS
>  [verse]
>  'git' [--version] [--help] [-C ] [-c =]
>  [--exec-path[=]] [--html-path] [--man-path] [--info-path]
> -[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
> +[-p|--paginate|-N|--no-pager] [--no-replace-objects] [--bare]
>  [--git-dir=] [--work-tree=] [--namespace=]
>  [--super-prefix=]
>   []
> @@ -103,6 +103,7 @@ foo.bar= ...`) sets `foo.bar` to the empty string which 
> `git config
>   configuration options (see the "Configuration Mechanism" section
>   below).
>  
> +-N::
>  --no-pager::
>   Do not pipe Git output into a pager.


Re: [PATCH v1 2/2] merge: Add merge.aggressive config setting

2018-04-24 Thread Junio C Hamano
Ben Peart  writes:

> That said, it makes sense to me to do
>> this when rename detection is turned off.  In fact, I think you'd
>> automatically want to set aggressive to true whenever rename detection
>> is turned off (whether by your merge.renames option or the
>> -Xno-renames flag).
>> ...
>
> While combining them would work for our specific use scenario (since
> we turn both on already along with turning off merge.stat), I really
> hesitate to tie these two different flags and code paths together with
> a single config setting.

The cases that non-agressive variant leaves unmerged are not
auto-resolved only because marking them as merged will rob the
chance from the rename detection logic to notice which ones are
"new" paths that could be matched with "deleted" ones to turn into
renames.  If rename deteciton is not done, there is no reason to
leave it non aggressive, as "#1 = missing, #2 = something and #3 =
missing" entry (just one example that is not auto-resolved by
non-agressive, but the principle is the same) left unmerged in the
index will get resolved to keep the current entry by the post
processing logic anyway.

In fact, checking git-merge-resolve would tell us that we already
use "aggresive" variant there unconditionally.

So, I think Elijah is correct---there is no reason not to enable
this setting when the other one to refuse rename detection is in
effect.


Re: [PATCH 18/41] index-pack: abstract away hash function constant

2018-04-24 Thread brian m. carlson
On Tue, Apr 24, 2018 at 11:50:16AM +0200, Martin Ågren wrote:
> On 24 April 2018 at 01:39, brian m. carlson
>  wrote:
> > The code for reading certain pack v2 offsets had a hard-coded 5
> > representing the number of uint32_t words that we needed to skip over.
> > Specify this value in terms of a value from the_hash_algo.
> >
> > Signed-off-by: brian m. carlson 
> > ---
> >  builtin/index-pack.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> > index d81473e722..c1f94a7da6 100644
> > --- a/builtin/index-pack.c
> > +++ b/builtin/index-pack.c
> > @@ -1543,12 +1543,13 @@ static void read_v2_anomalous_offsets(struct 
> > packed_git *p,
> >  {
> > const uint32_t *idx1, *idx2;
> > uint32_t i;
> > +   const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
> 
> Should we round up? Or just what should we do if a length is not
> divisible by 4? (I am not aware of any such hash functions, but one
> could exist for all I know.) Another question is whether such an
> index-pack v2 will ever contain non-SHA-1 oids to begin with. I can't
> find anything suggesting that it could, but this is unfamiliar code to
> me.

I opted not to simply because I know that our current hash is 20 bytes
and the new one will be 32, and I know those are both divisible by 4.  I
feel confident that any future hash we choose will also be divisible by
4, and the code is going to be complicated if it isn't.

I agree that pack v2 is not going to have anything but SHA-1.  However,
writing all the code such that it's algorithm agnostic means that we can
do testing of new algorithms by wholesale replacing the algorithm with a
new one, which simplifies things considerably.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Fetching tags overwrites existing tags

2018-04-24 Thread Jacob Keller
On Tue, Apr 24, 2018 at 12:57 PM, Wink Saville  wrote:
> If have a repository with a tag "v1.0.0" and I add a remote repository
> which also has a tag "v1.0.0" tag is overwritten.
>
> Google found [1] from 2011 and option 3 is what I'd like to see. Has it been
> implemented and I just don't see it?
>
> [1]: https://groups.google.com/forum/#!topic/git-version-control/0l_rJFyTE60
>
>
> Here is an example demonstrating what I see:
>
> $ echo abc > abc.txt
> $ git init .
> Initialized empty Git repository in
> /home/wink/prgs/git/investigate-fetch-tags/.git/
> $ git add *
> $ git commit -m "Initial commit"
> [master (root-commit) 1116fdc] Initial commit
>  1 file changed, 1 insertion(+)
>  create mode 100644 abc.txt
> $ git tag v1.0.0
> $ git remote add gbenchmark g...@github.com:google/benchmark
> $ git log --graph --format="%h %s %d"
> * 1116fdc Initial commit  (HEAD -> master, tag: v1.0.0)
> $ git fetch --tags gbenchmark
> warning: no common commits
> remote: Counting objects: 4400, done.
> remote: Compressing objects: 100% (15/15), done.
> remote: Total 4400 (delta 5), reused 5 (delta 3), pack-reused 4382
> Receiving objects: 100% (4400/4400), 1.33 MiB | 2.81 MiB/s, done.
> Resolving deltas: 100% (2863/2863), done.
> From github.com:google/benchmark
>  * [new branch]  clangtidy   -> gbenchmark/clangtidy
>  * [new branch]  iter_report -> gbenchmark/iter_report
>  * [new branch]  master  -> gbenchmark/master
>  * [new branch]  releasing   -> gbenchmark/releasing
>  * [new branch]  reportercleanup -> gbenchmark/reportercleanup
>  * [new branch]  rmheaders   -> gbenchmark/rmheaders
>  * [new branch]  v2  -> gbenchmark/v2
>  * [new tag] v0.0.9  -> v0.0.9
>  * [new tag] v0.1.0  -> v0.1.0
>  t [tag update]  v1.0.0  -> v1.0.0
>  * [new tag] v1.1.0  -> v1.1.0
>  * [new tag] v1.2.0  -> v1.2.0
>  * [new tag] v1.3.0  -> v1.3.0
>  * [new tag] v1.4.0  -> v1.4.0
> $ git log --graph --format="%h %s %d"
> * 1116fdc Initial commit  (HEAD -> master)
>
> As you can see the tag on 1116fdc is gone, v1.0.0 tag has been updated
> and now its pointing to the tag in gbenchmark:
>
> $ git log -5 --graph --format="%h %s %d" v1.0.0
> *   cd525ae Merge pull request #171 from eliben/update-doc-userealtime
>  (tag: v1.0.0)
> |\
> | * c7ab1b9 Update README to mention UseRealTime for wallclock time
> measurements.
> |/
> * f662e8b Rename OS_MACOSX macro to new name BENCHMARK_OS_MACOSX. Fix #169
> *   0a1f484 Merge pull request #166 from disconnect3d/master
> |\
> | * d2917bc Fixes #165: CustomArguments ret type in README
> |/
>
> Ideally I would have liked the tags fetched from gbenchmark to have a prefix
> of gbenchmark/, like the branches have, maybe something like:
>

That would require a complete redesign of how we handle remotes. I've
proposed ideas in the past but never had time and they didn't gain
much traction.

It's a known limitation that the tags namespace can only hold a single
tag name (even if remotes differ in what that tag is). I *thought*
that the tags should not be updated after you fetch it once, but it
seems this is not the behavior we get now?

My basic idea was to fetch *all* remote refs into a
refs///* such that *every* ref in a
remote can be determined by something like
"refs/tracking/origin/tags/name" instead of
"refs/remotes/origin/name", and then tags would have to be updated to
check for tags in each remote as well as locally. Additionally, you
could update the tool to warn when two remotes have the same tag at
different refs, and allow disambiguation.

Ideally, "origin/branch" should still DWIM, same for "tag" should work
unless there are conflicts.

Unfortunately, it's a pretty big change in how remotes are handled,
and I never had time to actually work towards a POC or implementation.
Mostly, we ended up on bikeshedding what the name should be now that
we can't use "refs/remotes" due to backwards compatibility. I don't
really like "tracking" as a name, but it was the best I could come up
with.

(Note, the impetus for this proposal was actually to allow easy
sharing of notes and other specialized refs).

Thanks,
Jake

> $ git fetch --tags gbenchmark
> ...
>  * [new branch]  v2  -> gbenchmark/v2
>  * [new tag] v0.0.9  -> gbenchmark/v0.0.9
>  * [new tag] v0.1.0  -> gbenchmark/v0.1.0
>  * [new tag] v1.0.0  -> gbenchmark/v1.0.0
>  * [new tag] v1.1.0  -> gbenchmark/v1.1.0
>  * [new tag] v1.2.0  -> gbenchmark/v1.2.0
>  * [new tag] v1.3.0  -> gbenchmark/v1.3.0
>  * [new tag] v1.4.0  -> gbenchmark/v1.4.0
>
>
> -- Wink


[PATCH v3] Make running git under other debugger-like programs easy

2018-04-24 Thread Elijah Newren
This allows us to run git, when using the script from bin-wrappers, under
other programs.  A few examples for usage within testsuite scripts:

   debug git checkout master
   debug --debugger=nemiver git $ARGS
   debug -d "valgrind --tool-memcheck --track-origins=yes" git $ARGS

Or, if someone has bin-wrappers/ in their $PATH and is executing git
outside the testsuite:

   GIT_DEBUGGER="gdb --args" git $ARGS
   GIT_DEBUGGER=nemiver git $ARGS
   GIT_DEBUGGER="valgrind --tool=memcheck --track-origins=yes" git $ARGS

There is also a handy shortcut of GIT_DEBUGGER=1 meaning the same as
GIT_DEBUGGER="gdb --args"

Original-patch-by: Johannes Schindelin 
Signed-off-by: Elijah Newren 
---

Finally getting back to this now that I have tied up all loose ends
with the directory rename detection series (or at least I hope they
are all tied up).

There is only one change since v2:
  s/DBG_FLAGS/GIT_DEBUGGER/, as suggested by Dscho


 t/test-lib-functions.sh | 24 
 wrap-for-bin.sh | 19 +--
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index b895366fee..a407b09b48 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -145,12 +145,28 @@ test_pause () {
"$SHELL_PATH" <&6 >&5 2>&7
 }
 
-# Wrap git in gdb. Adding this to a command can make it easier to
-# understand what is going on in a failing test.
+# Wrap git with a debugger. Adding this to a command can make it easier
+# to understand what is going on in a failing test.
 #
-# Example: "debug git checkout master".
+# Examples:
+# debug git checkout master
+# debug --debugger=nemiver git $ARGS
+# debug -d "valgrind --tool=memcheck --track-origins=yes" git $ARGS
 debug () {
-GIT_TEST_GDB=1 "$@" <&6 >&5 2>&7
+   case "$1" in
+   -d)
+   GIT_DEBUGGER="$2" &&
+   shift 2
+   ;;
+   --debugger=*)
+   GIT_DEBUGGER="${1#*=}" &&
+   shift 1
+   ;;
+   *)
+   GIT_DEBUGGER=1
+   ;;
+   esac &&
+   GIT_DEBUGGER="${GIT_DEBUGGER}" "$@" <&6 >&5 2>&7
 }
 
 # Call test_commit with the arguments
diff --git a/wrap-for-bin.sh b/wrap-for-bin.sh
index 5842408817..95851b85b6 100644
--- a/wrap-for-bin.sh
+++ b/wrap-for-bin.sh
@@ -20,10 +20,17 @@ PATH='@@BUILD_DIR@@/bin-wrappers:'"$PATH"
 
 export GIT_EXEC_PATH GITPERLLIB PATH GIT_TEXTDOMAINDIR
 
-if test -n "$GIT_TEST_GDB"
-then
-   unset GIT_TEST_GDB
-   exec gdb --args "${GIT_EXEC_PATH}/@@PROG@@" "$@"
-else
+case "$GIT_DEBUGGER" in
+'')
exec "${GIT_EXEC_PATH}/@@PROG@@" "$@"
-fi
+   ;;
+1)
+   unset GIT_DEBUGGER
+   exec gdb --args "${GIT_EXEC_PATH}/@@PROG@@" "$@"
+   ;;
+*)
+   GIT_DEBUGGER_ARGS="$GIT_DEBUGGER"
+   unset GIT_DEBUGGER
+   exec ${GIT_DEBUGGER_ARGS} "${GIT_EXEC_PATH}/@@PROG@@" "$@"
+   ;;
+esac
-- 
2.17.0.2.g31fed8301b



Re: [PATCH 21/41] http: eliminate hard-coded constants

2018-04-24 Thread Junio C Hamano
Martin Ågren  writes:

>> switch (data[i]) {
>> case 'P':
>> i++;
>> -   if (i + 52 <= buf.len &&
>> +   if (i + hexsz + 12 <= buf.len &&
>> starts_with(data + i, " pack-") &&
>> -   starts_with(data + i + 46, ".pack\n")) {
>> -   get_sha1_hex(data + i + 6, sha1);
>> -   fetch_and_setup_pack_index(packs_head, sha1,
>> +   starts_with(data + i + hexsz + 6, ".pack\n")) {
>> +   get_sha1_hex(data + i + 6, hash);
>> +   fetch_and_setup_pack_index(packs_head, hash,
>>   base_url);
>> i += 51;
>
> s/51/hexsz + 11/ ?

Quite right.


Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option

2018-04-24 Thread Stefan Beller
Replied to Jonathan only instead of all. My reply is below:

On Tue, Apr 24, 2018 at 3:55 PM, Stefan Beller  wrote:
> On Tue, Apr 24, 2018 at 3:35 PM, Jonathan Tan  
> wrote:
>> On Tue, 24 Apr 2018 14:03:30 -0700
>> Stefan Beller  wrote:
>>
>>> +--color-moved-[no-]ignore-space-prefix-delta::
>>> + Ignores whitespace when comparing lines when performing the move
>>> + detection for --color-moved. This ignores uniform differences
>>> + of white space at the beginning lines in moved blocks.
>>
>> Setting this option means that moved lines may be indented or dedented,
>> and if they have been indented or dedented by the same amount, they are
>> still considered to be the same block. Maybe call this
>> --color-moved-allow-indentation-change.
>
> ok, sounds good as well. I tried coming up with a name that refers to
> the block check as that is the important part.
>
>>> +struct ws_delta {
>>> + char *string; /* The prefix delta, which is the same in the block */
>>
>> Probably better described as "the difference between the '-' line and
>> the '+' line". This may be the empty string if there is no difference.
>
> Makes sense.
>
>>
>>> + int direction; /* adding or removing the line? */
>>
>> What is the value when "added" and what when "removed"? Also, it is not
>> truly "added" or "removed", so a better way might be: 1 if the '-' line
>> is longer than the '+' line, and 0 otherwise. (And make sure that the
>> documentation is correct with respect to equal lines.)
>>
>>> + int missmatch; /* in the remainder */
>>
>> s/missmatch/mismatch/
>> Also, what is this used for?
>
> The mismatch should be used for (thanks for catching!)
> checking if the remainder of a line is the same, although a boolean
> may be not the correct choice. We know that the two strings
> passed into compute_ws_delta come from a complete white space
> agnostic comparison, so consider:
>
> + SP SP more TAB more
> + SP SP text TAB text
>
> - SP more TAB more
> - SP text TAB text
>
> which would mark this as a moved block. This is the feature
> working as intended, but what about
>
> + SP SP more TAB more
> + SP SP text TAB text
>
> - SP more SP more
> - SP text TAB text
>
> Note how the length of the strings is the same, hence the current
> code of
>
> compute_ws_delta(...) {
> int d = longer->len - shorter->len;
> out->string = xmemdupz(longer->line, d);
> }
>
> would work fine and not notice the "change in the remainder".
> That ought to be caught by the mismatch variable, that
> is assigned, but not used.
>
> The compare_ws_delta(a, b) needs to be extended to
>
>   !a->mismatch && !b->mismatch && existing_condition
>
> Ideally the example from above would be different depending
> on whether the other white space flags are given or not.
>
>>> + if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
>>> + /*
>>> +  * As there is not specific white space config given,
>>> +  * we'd need to check for a new block, so ignore all
>>> +  * white space. The setup of the white space
>>> +  * configuration for the next block is done else where
>>> +  */
>>> + flags |= XDF_IGNORE_WHITESPACE;
>>> +
>>>   return !xdiff_compare_lines(a->es->line, a->es->len,
>>>   b->es->line, b->es->len,
>>>   flags);
>>
>> I wrote in [1]:
>>
>>   I think we should just prohibit combining this with any of the
>>   whitespace ignoring flags except for the space-at-eol one. They seem
>>   to contradict anyway.
>
> As outlined above, I think there are corner cases in which they do not
> contradict. So I think the COLOR_MOVED_DELTA_WHITESPACES
> will go into its own variable, and then we can solve the corner cases
> correctly.
>
>> To elaborate, adding a specific flag that may interfere with other
>> user-provided flags sounds like we're unnecessarily adding combinations
>> that we must keep track of, and that it's both logical (from a user's
>> point of view) and clearer (as for the code) to just forbid such
>> combinations.
>
> Yes, I think you mentioned this before. Thanks for reminding!
>
>>> + struct ws_delta *wsd = NULL; /* white space deltas between pmb */
>>> + int wsd_alloc = 0;
>>> +
>>> + int n, flipped_block = 1, block_length = 0;
>>
>> It seems like pmb and wsd are parallel arrays - could each wsd be
>> embedded into the corresponding entry of pmb instead?
>
> I'll explore that. It sounds like a good idea for code hygiene.
> Although if you do not intend to use this feature, then keeping it separate
> would allow for a smaller footprint in memory.
>
>>
>>> --- a/diff.h
>>> +++ b/diff.h
>>> @@ -214,6 +214,8 @@ struct diff_options {
>>>   } color_moved;
>>>   #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>>>   #define COLOR_MOVED_MIN_ALNUM_COUNT 20
>>> + /* 

Re: [PATCH 5/7] diff.c: add a blocks mode for moved code detection

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 15:37:51 -0700
Stefan Beller  wrote:

> These can be combined independently, so would
> you expect the user to expect two options for them?
> For example "--color-moved=zebra" could be split
> into  "--skip-small --alternate-blocks"

Yes, this is a good explanation. Reusing your terms below, --skip-small
controls the algorithm, and --alternate-blocks controls the presentation
layer.

> So instead of building blocks we rather want to split into algorithms
> and presentation layer?
> 
> The presentation layer would be things like:
> * use a different color for moved things
> * alternate colors for adjacent blocks
> * paint border of a block (dimmed zebra)
> 
> The algorithm side would be
> * detect moves
> * detect moves as blocks
> * skip small heuristic

Yes.

This was just brainstorming, though - this might not be the direction we
want to take in this patch. (The right solution might just be to always
use blocks - thereby simplifying the algorithm aspect.)


Re: [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation

2018-04-24 Thread Stefan Beller
On Tue, Apr 24, 2018 at 3:37 PM, Jonathan Tan  wrote:
> On Tue, 24 Apr 2018 14:03:23 -0700
> Stefan Beller  wrote:
>
>> v2:
>> I think I have addressed Jonathans feedback
>> * by using a string instead of counting the first character only.
>> * refined tests slightly (easier to read)
>> * moved white space handling for moved blocks into its own flag field,
>>   keeping the enum for the actual mode of move detection.
>
> For reference, v1 is here:
> https://public-inbox.org/git/20180402224854.86922-1-sbel...@google.com/
>
>> Stefan Beller (7):
>>   xdiff/xdiff.h: remove unused flags
>>   xdiff/xdiffi.c: remove unneeded function declarations
>>   diff.c: do not pass diff options as keydata to hashmap
>>   diff.c: adjust hash function signature to match hashmap expectation
>>   diff.c: add a blocks mode for moved code detection
>>   diff.c: decouple white space treatment from move detection algorithm
>>   diff.c: add --color-moved-ignore-space-delta option
>
> I'm not sure if we should add a new "blocks" mode, or if we should
> modify the existing plain mode to have the minimum block length instead.
> I reviewed the code as if we want the new "blocks" mode.

Thanks for the review!

I think keeping plain is useful, see 176841f0c9 (diff.c: color
moved lines differently, plain mode, 2017-06-30)

diff.c: color moved lines differently, plain mode

Add the 'plain' mode for move detection of code. This omits the checking
for adjacent blocks, so it is not as useful. If you have a lot of the
same blocks moved in the same patch, the 'Zebra' would end up slow as it
is O(n^2) (n is number of same blocks). So this may be useful there and
is generally easy to add. Instead be very literal at the move detection,
do not skip over short blocks here.


Although if we do not care about that use case we can just add heuristics to
plain.

As eluded to in Ævars email, we might want to break it up into multiple
options as well?


Re: [PATCHv2 9/9] cache.h: allow oid_object_info to handle arbitrary repositories

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 14:59:09 -0700
Stefan Beller  wrote:

> This involves also adapting oid_object_info_extended and a some
> internal functions that are used to implement these. It all has to
> happen in one patch, because of a single recursive chain of calls visits
> all these functions.

I wrote about delta_base_cache in a reply [1] to an earlier version,
which is indeed safe (as discussed), but I think that other reviewers
might have questions about that too so I think it's worth noting that in
the commit message. Maybe write something like:

  Among the functions modified to handle arbitrary repositories,
  unpack_entry() is one of them. Note that it still references the
  globals "delta_base_cache" and "delta_base_cached", but those are safe
  to be referenced (the former is indexed partly by "struct packed_git
  *", which is repo-specific, and the latter is only used to limit the
  size of the former as an optimization).

[1] 
https://public-inbox.org/git/20180424112332.38c0d04d96689f030e968...@google.com/

> sha1_object_info_extended is also used in partial clones, which allow
> fetching missing objects. As this series will not add the repository
> struct to the transport code and fetch_object(), add a TODO note and
> bug out if a user tries to use a partial clone in a repository other than
> the_repository.

s/sha1_object/oid_object/ (in the 2nd paragraph)

Also, you sent 2 versions of PATCHv2 9/9.
  
> @@ -1290,9 +1291,12 @@ int oid_object_info_extended_the_repository(const 
> struct object_id *oid, struct
>   if (fetch_if_missing && repository_format_partial_clone &&
>   !already_retried) {
>   /*
> -  * TODO Investigate haveing fetch_object() return
> +  * TODO Investigate having fetch_object() return
>* TODO error/success and stopping the music here.
> +  * TODO Pass a repository struct through fetch_object.
>*/
> + if (r != the_repository)
> + die(_("partial clones only supported in 
> the_repository"));
>   fetch_object(repository_format_partial_clone, 
> real->hash);
>   already_retried = 1;
>   continue;

This most likely means that a partial clone with a submodule would
wrongly error out here. Instead, the "r == the_repository" check should
be done in the same "if" statement as repository_format_partial_clone
(and no "die"-ing occurs if it fails - just that there will be no
fetching of objects).


Re: [PATCH 5/7] diff.c: add a blocks mode for moved code detection

2018-04-24 Thread Stefan Beller
On Tue, Apr 24, 2018 at 2:50 PM, Jonathan Tan  wrote:
> On Tue, 24 Apr 2018 14:03:28 -0700
> Stefan Beller  wrote:
>
>> Suggested-by: Ævar Arnfjörð Bjarmason 
>>  (https://public-inbox.org/git/87o9j0uljo@evledraar.gmail.com/)
>> Signed-off-by: Stefan Beller 
>
> Firstly, I don't know if this is the right solution- as written
> in the linked e-mail [1], the issue might be more that the config
> conflates 2 unrelated things, not that a certain intersection is
> missing.

The "plain zebra" or as I call them "blocks", has the "heuristic
for a minimum of 20 characters" and "few colors" as its defining
features, which solves that use case.

Stepping back a bit, we have different "building blocks"
at our disposal:
* move detection by line or block
* alternating blocks
* a heuristic to skip over small chunks (20 alnum chars)

These can be combined independently, so would
you expect the user to expect two options for them?
For example "--color-moved=zebra" could be split
into  "--skip-small --alternate-blocks"

Eventually we'll use various colors to inform the user
what these building blocks made of the diff.

Ævar wrote:

> Which is what I mean by the current config conflating two (to me)
> unrelated things. One is how we, via any method, detect what's moved or
> not, and the other is what color/format we use to present this to the
> user.

So instead of building blocks we rather want to split into algorithms
and presentation layer?

The presentation layer would be things like:
* use a different color for moved things
* alternate colors for adjacent blocks
* paint border of a block (dimmed zebra)

The algorithm side would be
* detect moves
* detect moves as blocks
* skip small heuristic

Am I still missing the big picture?

> [1] https://public-inbox.org/git/87muykuij7@evledraar.gmail.com/
>
> Optional: Probably better to put the link inline, instead of in the
> trailer.

ok.

>
>> -test_expect_success 'detect permutations inside moved code -- dimmed_zebra' 
>> '
>> +test_expect_success 'detect blocks of moved code' '
>>   git reset --hard &&
>>   cat <<-\EOF >lines.txt &&
>>   long line 1
>> @@ -1271,6 +1271,52 @@ test_expect_success 'detect permutations inside moved 
>> code -- dimmed_zebra' '
>>   test_config color.diff.newMovedDimmed "normal cyan" &&
>>   test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
>>   test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
>
> Add a comment here explaining that these colors do not appear in the
> output, but merely set to recognizable values to ensure that they do not
> appear in the output.

ok.


Re: [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 14:03:23 -0700
Stefan Beller  wrote:

> v2:
> I think I have addressed Jonathans feedback
> * by using a string instead of counting the first character only.
> * refined tests slightly (easier to read)
> * moved white space handling for moved blocks into its own flag field,
>   keeping the enum for the actual mode of move detection.

For reference, v1 is here:
https://public-inbox.org/git/20180402224854.86922-1-sbel...@google.com/

> Stefan Beller (7):
>   xdiff/xdiff.h: remove unused flags
>   xdiff/xdiffi.c: remove unneeded function declarations
>   diff.c: do not pass diff options as keydata to hashmap
>   diff.c: adjust hash function signature to match hashmap expectation
>   diff.c: add a blocks mode for moved code detection
>   diff.c: decouple white space treatment from move detection algorithm
>   diff.c: add --color-moved-ignore-space-delta option

I'm not sure if we should add a new "blocks" mode, or if we should
modify the existing plain mode to have the minimum block length instead.
I reviewed the code as if we want the new "blocks" mode.


Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 14:03:30 -0700
Stefan Beller  wrote:

> +--color-moved-[no-]ignore-space-prefix-delta::
> + Ignores whitespace when comparing lines when performing the move
> + detection for --color-moved. This ignores uniform differences
> + of white space at the beginning lines in moved blocks.

Setting this option means that moved lines may be indented or dedented,
and if they have been indented or dedented by the same amount, they are
still considered to be the same block. Maybe call this
--color-moved-allow-indentation-change.

> +struct ws_delta {
> + char *string; /* The prefix delta, which is the same in the block */

Probably better described as "the difference between the '-' line and
the '+' line". This may be the empty string if there is no difference.

> + int direction; /* adding or removing the line? */

What is the value when "added" and what when "removed"? Also, it is not
truly "added" or "removed", so a better way might be: 1 if the '-' line
is longer than the '+' line, and 0 otherwise. (And make sure that the
documentation is correct with respect to equal lines.)

> + int missmatch; /* in the remainder */

s/missmatch/mismatch/
Also, what is this used for?

> + if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
> + /*
> +  * As there is not specific white space config given,
> +  * we'd need to check for a new block, so ignore all
> +  * white space. The setup of the white space
> +  * configuration for the next block is done else where
> +  */
> + flags |= XDF_IGNORE_WHITESPACE;
> +
>   return !xdiff_compare_lines(a->es->line, a->es->len,
>   b->es->line, b->es->len,
>   flags);

I wrote in [1]:

  I think we should just prohibit combining this with any of the
  whitespace ignoring flags except for the space-at-eol one. They seem
  to contradict anyway.

To elaborate, adding a specific flag that may interfere with other
user-provided flags sounds like we're unnecessarily adding combinations
that we must keep track of, and that it's both logical (from a user's
point of view) and clearer (as for the code) to just forbid such
combinations.

[1] 
https://public-inbox.org/git/20180402174118.d204ec0d4b9d2fa7ebd77...@google.com/

>   struct moved_entry **pmb = NULL; /* potentially moved blocks */
>   int pmb_nr = 0, pmb_alloc = 0;
> - int n, flipped_block = 1, block_length = 0;
>  
> + struct ws_delta *wsd = NULL; /* white space deltas between pmb */
> + int wsd_alloc = 0;
> +
> + int n, flipped_block = 1, block_length = 0;

It seems like pmb and wsd are parallel arrays - could each wsd be
embedded into the corresponding entry of pmb instead?

> --- a/diff.h
> +++ b/diff.h
> @@ -214,6 +214,8 @@ struct diff_options {
>   } color_moved;
>   #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>   #define COLOR_MOVED_MIN_ALNUM_COUNT 20
> + /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
> + #define COLOR_MOVED_DELTA_WHITESPACES   (1 << 22)
>   int color_moved_ws_handling;
>  };

Setting of DELTA_WHITESPACES should be a separate field, not as part of
ws_handling. It's fine for the ws_handling to be a bitset, since that's
how it's passed to xdiff_compare_lines(), but we don't need to do the
same for DELTA_WHITESPACES.

> +test_expect_success 'compare whitespace delta across moved blocks' '
> +
> + git reset --hard &&
> + q_to_tab <<-\EOF >text.txt &&
> + QIndented
> + QText across
> + Qthree lines
> + QBut! <- this stands out
> + Qthis one
> + QQline did
> + Qnot adjust
> + EOF
> +
> + git add text.txt &&
> + git commit -m "add text.txt" &&
> +
> + q_to_tab <<-\EOF >text.txt &&
> + QQIndented
> + QQText across
> + QQthree lines
> + QQQBut! <- this stands out
> + this one
> + Qline did
> + not adjust
> + EOF
> +
> + git diff --color --color-moved-ignore-space-prefix-delta |
> + grep -v "index" |
> + test_decode_color >actual &&
> +
> + q_to_tab <<-\EOF >expected &&
> + diff --git a/text.txt b/text.txt
> + --- a/text.txt
> + +++ b/text.txt
> + @@ -1,7 +1,7 @@
> + -QIndented
> + -QText across
> + -Qthree lines
> + -QBut! <- this stands out
> + -Qthis one
> + -QQline did
> + -Qnot adjust
> + +QQIndented
> + +QQText across
> + +QQthree lines
> + +QQQBut! <- this stands out
> + +this one
> + +Qline did
> + +not adjust
> + EOF
> +
> + test_cmp expected actual
> +'

I would have expected every line except the "this stands out" line to be
colored differently than the usual RED and GREEN. 

Re: [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm

2018-04-24 Thread Stefan Beller
Hi Jonathan,

On Tue, Apr 24, 2018 at 3:00 PM, Jonathan Tan  wrote:
> On Tue, 24 Apr 2018 14:03:29 -0700
> Stefan Beller  wrote:
>
>> As we change the default, we'll adjust the tests.
>
> This statement is probably better written as:
>
>   In some existing tests, options like --ignore-space-at-eol were used
>   to control the color of the output. They have been updated to use
>   options like --color-moved-ignore-space-at-eol instead.

I'll adjust that statement; thanks for helping me out with good commit
messages (even the "As we change the defaults, .." was proposed by
you in a previous round)

>
>> + unsigned flags = diffopt->color_moved_ws_handling
>> +  & XDF_WHITESPACE_FLAGS;
>
> No need for "& XDF_WHITESPACE_FLAGS".

This is in anticipation of the next commit, when
color_moved_ws_handling takes more flags.
I can move that over to the last commit.

>
>> + unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
>
> Same here.

Maybe I'll just state in the commit message why we keep the masking
here.

>
>> @@ -214,6 +214,7 @@ struct diff_options {
>>   } color_moved;
>>   #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>>   #define COLOR_MOVED_MIN_ALNUM_COUNT 20
>> + int color_moved_ws_handling;
>>  };
>
> Should the "int" be "unsigned"?

yes.

> I noticed that the flag-like xdl_opts is
> signed, but I think it's better for flags to be unsigned.

I can change those, too.

> Also, document
> what this stores.

ok, will document.

> (And also, I would limit the bits.)

Not sure I follow. you want to make it e.g.

  unsigned color_moved_ws_handling : 6;

?

Oh, that would actually work, as XDF_WHITESPACE_FLAGS
are in second to fifth bits.

But then we need to document why the XDF_WHITESPACE
need to be at these low positions.

>> + q_to_tab <<-\EOF >text.txt &&
>> + Qa long line to exceed per-line minimum
>> + Qanother long line to exceed per-line minimum
>> + new file

>
> I know I suggested "per-line minimum", but I don't think there is one -
> I think we only have a per-block minimum. Maybe delete "per-line" in
> each of the lines.

yeah, I guess this heuristic could also make for another setting, though
as of now I did not desire any other heuristic than you originally came up
with. Will reword the text. Thanks!

Thanks,
Stefan


Re: [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 14:03:29 -0700
Stefan Beller  wrote:

> As we change the default, we'll adjust the tests.

This statement is probably better written as:

  In some existing tests, options like --ignore-space-at-eol were used
  to control the color of the output. They have been updated to use
  options like --color-moved-ignore-space-at-eol instead.

> + unsigned flags = diffopt->color_moved_ws_handling
> +  & XDF_WHITESPACE_FLAGS;

No need for "& XDF_WHITESPACE_FLAGS".

> + unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;

Same here.

> @@ -214,6 +214,7 @@ struct diff_options {
>   } color_moved;
>   #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>   #define COLOR_MOVED_MIN_ALNUM_COUNT 20
> + int color_moved_ws_handling;
>  };

Should the "int" be "unsigned"? I noticed that the flag-like xdl_opts is
signed, but I think it's better for flags to be unsigned. Also, document
what this stores. (And also, I would limit the bits.)

> +test_expect_success 'only move detection ignores white spaces' '
> + git reset --hard &&
> + q_to_tab <<-\EOF >text.txt &&
> + a long line to exceed per-line minimum
> + another long line to exceed per-line minimum
> + original file
> + EOF
> + git add text.txt &&
> + git commit -m "add text" &&
> + q_to_tab <<-\EOF >text.txt &&
> + Qa long line to exceed per-line minimum
> + Qanother long line to exceed per-line minimum
> + new file
> + EOF
> +
> + # Make sure we get a different diff using -w
> + git diff --color --color-moved -w \
> + --color-moved-no-ignore-all-space \
> + --color-moved-no-ignore-space-change \
> + --color-moved-no-ignore-space-at-eol |
> + grep -v "index" |
> + test_decode_color >actual &&
> + q_to_tab <<-\EOF >expected &&
> + diff --git a/text.txt b/text.txt
> + --- a/text.txt
> + +++ b/text.txt
> + @@ -1,3 +1,3 @@
> +  Qa long line to exceed per-line minimum
> +  Qanother long line to exceed per-line minimum
> + -original file
> + +new file
> + EOF
> + test_cmp expected actual &&
> +
> + # And now ignoring white space only in the move detection
> + git diff --color --color-moved \
> + --color-moved-ignore-all-space \
> + --color-moved-ignore-space-change \
> + --color-moved-ignore-space-at-eol |
> + grep -v "index" |
> + test_decode_color >actual &&
> + q_to_tab <<-\EOF >expected &&
> + diff --git a/text.txt b/text.txt
> + --- a/text.txt
> + +++ b/text.txt
> + @@ -1,3 +1,3 @@
> + -a long line to exceed per-line minimum
> + -another long line to exceed per-line minimum
> + -original file
> + +Qa long line to exceed per-line 
> minimum
> + +Qanother long line to exceed per-line 
> minimum
> + +new file
> + EOF
> + test_cmp expected actual
>  '

I know I suggested "per-line minimum", but I don't think there is one -
I think we only have a per-block minimum. Maybe delete "per-line" in
each of the lines.


[PATCHv2 9/9] cache.h: allow sha1_object_info to handle arbitrary repositories

2018-04-24 Thread Stefan Beller
This involves also adapting sha1_object_info_extended and a some
internal functions that are used to implement these. It all has to
happen in one patch, because of a single recursive chain of calls visits
all these functions.

sha1_object_info_extended is also used in partial clones, which allow
fetching missing objects. As this series will not add the repository
struct to the transport code and fetch_object(), add a TODO note and
bug out if a user tries to use a partial clone in a repository other than
the_repository.

Helped-by: Brandon Williams 
Helped-by: Jonathan Tan 
Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 cache.h |  9 -
 packfile.c  | 58 ++---
 packfile.h  |  8 
 sha1_file.c | 30 ---
 4 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/cache.h b/cache.h
index 6340b2c572..3a4d80e92b 100644
--- a/cache.h
+++ b/cache.h
@@ -1192,8 +1192,7 @@ static inline void *read_object_file(const struct 
object_id *oid, enum object_ty
 }
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
-#define oid_object_info(r, o, f) oid_object_info_##r(o, f)
-int oid_object_info_the_repository(const struct object_id *, unsigned long *);
+int oid_object_info(struct repository *r, const struct object_id *, unsigned 
long *);
 
 extern int hash_object_file(const void *buf, unsigned long len,
const char *type, struct object_id *oid);
@@ -1675,9 +1674,9 @@ struct object_info {
 /* Do not check loose object */
 #define OBJECT_INFO_IGNORE_LOOSE 16
 
-#define oid_object_info_extended(r, oid, oi, flags) \
-   oid_object_info_extended_##r(oid, oi, flags)
-int oid_object_info_extended_the_repository(const struct object_id *, struct 
object_info *, unsigned flags);
+int oid_object_info_extended(struct repository *r,
+const struct object_id *,
+struct object_info *, unsigned flags);
 
 /*
  * Set this to 0 to prevent sha1_object_info_extended() from fetching missing
diff --git a/packfile.c b/packfile.c
index 8de87f904b..55d383ed0a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1104,9 +1104,9 @@ static const unsigned char *get_delta_base_sha1(struct 
packed_git *p,
return NULL;
 }
 
-#define retry_bad_packed_offset(r, p, o) \
-   retry_bad_packed_offset_##r(p, o)
-static int retry_bad_packed_offset_the_repository(struct packed_git *p, off_t 
obj_offset)
+static int retry_bad_packed_offset(struct repository *r,
+  struct packed_git *p,
+  off_t obj_offset)
 {
int type;
struct revindex_entry *revidx;
@@ -1116,7 +1116,7 @@ static int retry_bad_packed_offset_the_repository(struct 
packed_git *p, off_t ob
return OBJ_BAD;
nth_packed_object_oid(, p, revidx->nr);
mark_bad_packed_object(p, oid.hash);
-   type = oid_object_info(the_repository, , NULL);
+   type = oid_object_info(r, , NULL);
if (type <= OBJ_NONE)
return OBJ_BAD;
return type;
@@ -1124,13 +1124,12 @@ static int 
retry_bad_packed_offset_the_repository(struct packed_git *p, off_t ob
 
 #define POI_STACK_PREALLOC 64
 
-#define packed_to_object_type(r, p, o, t, w, c) \
-   packed_to_object_type_##r(p, o, t, w, c)
-static enum object_type packed_to_object_type_the_repository(struct packed_git 
*p,
-off_t obj_offset,
-enum object_type 
type,
-struct pack_window 
**w_curs,
-off_t curpos)
+static enum object_type packed_to_object_type(struct repository *r,
+ struct packed_git *p,
+ off_t obj_offset,
+ enum object_type type,
+ struct pack_window **w_curs,
+ off_t curpos)
 {
off_t small_poi_stack[POI_STACK_PREALLOC];
off_t *poi_stack = small_poi_stack;
@@ -1157,7 +1156,7 @@ static enum object_type 
packed_to_object_type_the_repository(struct packed_git *
if (type <= OBJ_NONE) {
/* If getting the base itself fails, we first
 * retry the base, otherwise unwind */
-   type = retry_bad_packed_offset(the_repository, p, 
base_offset);
+   type = retry_bad_packed_offset(r, p, base_offset);
if (type > OBJ_NONE)
goto out;
goto unwind;
@@ -1185,7 +1184,7 

[PATCHv2 7/9] packfile: add repository argument to unpack_entry

2018-04-24 Thread Stefan Beller
Add a repository argument to allow the callers of unpack_entry
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 fast-import.c | 2 +-
 pack-check.c  | 3 ++-
 packfile.c| 7 ---
 packfile.h| 3 ++-
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index afe06bd7c1..b009353e93 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1376,7 +1376,7 @@ static void *gfi_unpack_entry(
 */
p->pack_size = pack_size + the_hash_algo->rawsz;
}
-   return unpack_entry(p, oe->idx.offset, , sizep);
+   return unpack_entry(the_repository, p, oe->idx.offset, , sizep);
 }
 
 static const char *get_mode(const char *str, uint16_t *modep)
diff --git a/pack-check.c b/pack-check.c
index 385d964bdd..d3a57df34f 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "repository.h"
 #include "pack.h"
 #include "pack-revindex.h"
 #include "progress.h"
@@ -134,7 +135,7 @@ static int verify_packfile(struct packed_git *p,
data = NULL;
data_valid = 0;
} else {
-   data = unpack_entry(p, entries[i].offset, , );
+   data = unpack_entry(the_repository, p, 
entries[i].offset, , );
data_valid = 1;
}
 
diff --git a/packfile.c b/packfile.c
index 2876e04bb1..d5ac48ef18 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1279,7 +1279,7 @@ static void *cache_or_unpack_entry(struct packed_git *p, 
off_t base_offset,
 
ent = get_delta_base_cache_entry(p, base_offset);
if (!ent)
-   return unpack_entry(p, base_offset, type, base_size);
+   return unpack_entry(the_repository, p, base_offset, type, 
base_size);
 
if (type)
*type = ent->type;
@@ -1485,8 +1485,9 @@ static void *read_object_the_repository(const struct 
object_id *oid,
return content;
 }
 
-void *unpack_entry(struct packed_git *p, off_t obj_offset,
-  enum object_type *final_type, unsigned long *final_size)
+void *unpack_entry_the_repository(struct packed_git *p, off_t obj_offset,
+ enum object_type *final_type,
+ unsigned long *final_size)
 {
struct pack_window *w_curs = NULL;
off_t curpos = obj_offset;
diff --git a/packfile.h b/packfile.h
index bc8d840b1b..1efa57a90e 100644
--- a/packfile.h
+++ b/packfile.h
@@ -115,7 +115,8 @@ extern off_t nth_packed_object_offset(const struct 
packed_git *, uint32_t n);
 extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
 
 extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
+#define unpack_entry(r, p, of, ot, s) unpack_entry_##r(p, of, ot, s)
+extern void *unpack_entry_the_repository(struct packed_git *, off_t, enum 
object_type *, unsigned long *);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 6/9] packfile: add repository argument to read_object

2018-04-24 Thread Stefan Beller
Add a repository argument to allow the callers of read_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller 
---
 packfile.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/packfile.c b/packfile.c
index 5fa7d27d3b..2876e04bb1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1469,8 +1469,10 @@ struct unpack_entry_stack_ent {
unsigned long size;
 };
 
-static void *read_object(const struct object_id *oid, enum object_type *type,
-unsigned long *size)
+#define read_object(r, o, t, s) read_object_##r(o, t, s)
+static void *read_object_the_repository(const struct object_id *oid,
+   enum object_type *type,
+   unsigned long *size)
 {
struct object_info oi = OBJECT_INFO_INIT;
void *content;
@@ -1614,7 +1616,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
  oid_to_hex(_oid), 
(uintmax_t)obj_offset,
  p->pack_name);
mark_bad_packed_object(p, base_oid.hash);
-   base = read_object(_oid, , 
_size);
+   base = read_object(the_repository, _oid, 
, _size);
external_base = base;
}
}
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 9/9] cache.h: allow oid_object_info to handle arbitrary repositories

2018-04-24 Thread Stefan Beller
This involves also adapting oid_object_info_extended and a some
internal functions that are used to implement these. It all has to
happen in one patch, because of a single recursive chain of calls visits
all these functions.

sha1_object_info_extended is also used in partial clones, which allow
fetching missing objects. As this series will not add the repository
struct to the transport code and fetch_object(), add a TODO note and
bug out if a user tries to use a partial clone in a repository other than
the_repository.

Helped-by: Brandon Williams 
Helped-by: Jonathan Tan 
Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 cache.h |  9 -
 packfile.c  | 58 ++---
 packfile.h  |  8 
 sha1_file.c | 30 ---
 4 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/cache.h b/cache.h
index 6340b2c572..3a4d80e92b 100644
--- a/cache.h
+++ b/cache.h
@@ -1192,8 +1192,7 @@ static inline void *read_object_file(const struct 
object_id *oid, enum object_ty
 }
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
-#define oid_object_info(r, o, f) oid_object_info_##r(o, f)
-int oid_object_info_the_repository(const struct object_id *, unsigned long *);
+int oid_object_info(struct repository *r, const struct object_id *, unsigned 
long *);
 
 extern int hash_object_file(const void *buf, unsigned long len,
const char *type, struct object_id *oid);
@@ -1675,9 +1674,9 @@ struct object_info {
 /* Do not check loose object */
 #define OBJECT_INFO_IGNORE_LOOSE 16
 
-#define oid_object_info_extended(r, oid, oi, flags) \
-   oid_object_info_extended_##r(oid, oi, flags)
-int oid_object_info_extended_the_repository(const struct object_id *, struct 
object_info *, unsigned flags);
+int oid_object_info_extended(struct repository *r,
+const struct object_id *,
+struct object_info *, unsigned flags);
 
 /*
  * Set this to 0 to prevent sha1_object_info_extended() from fetching missing
diff --git a/packfile.c b/packfile.c
index 8de87f904b..55d383ed0a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1104,9 +1104,9 @@ static const unsigned char *get_delta_base_sha1(struct 
packed_git *p,
return NULL;
 }
 
-#define retry_bad_packed_offset(r, p, o) \
-   retry_bad_packed_offset_##r(p, o)
-static int retry_bad_packed_offset_the_repository(struct packed_git *p, off_t 
obj_offset)
+static int retry_bad_packed_offset(struct repository *r,
+  struct packed_git *p,
+  off_t obj_offset)
 {
int type;
struct revindex_entry *revidx;
@@ -1116,7 +1116,7 @@ static int retry_bad_packed_offset_the_repository(struct 
packed_git *p, off_t ob
return OBJ_BAD;
nth_packed_object_oid(, p, revidx->nr);
mark_bad_packed_object(p, oid.hash);
-   type = oid_object_info(the_repository, , NULL);
+   type = oid_object_info(r, , NULL);
if (type <= OBJ_NONE)
return OBJ_BAD;
return type;
@@ -1124,13 +1124,12 @@ static int 
retry_bad_packed_offset_the_repository(struct packed_git *p, off_t ob
 
 #define POI_STACK_PREALLOC 64
 
-#define packed_to_object_type(r, p, o, t, w, c) \
-   packed_to_object_type_##r(p, o, t, w, c)
-static enum object_type packed_to_object_type_the_repository(struct packed_git 
*p,
-off_t obj_offset,
-enum object_type 
type,
-struct pack_window 
**w_curs,
-off_t curpos)
+static enum object_type packed_to_object_type(struct repository *r,
+ struct packed_git *p,
+ off_t obj_offset,
+ enum object_type type,
+ struct pack_window **w_curs,
+ off_t curpos)
 {
off_t small_poi_stack[POI_STACK_PREALLOC];
off_t *poi_stack = small_poi_stack;
@@ -1157,7 +1156,7 @@ static enum object_type 
packed_to_object_type_the_repository(struct packed_git *
if (type <= OBJ_NONE) {
/* If getting the base itself fails, we first
 * retry the base, otherwise unwind */
-   type = retry_bad_packed_offset(the_repository, p, 
base_offset);
+   type = retry_bad_packed_offset(r, p, base_offset);
if (type > OBJ_NONE)
goto out;
goto unwind;
@@ -1185,7 +1184,7 @@ 

[PATCHv2 8/9] packfile: add repository argument to cache_or_unpack_entry

2018-04-24 Thread Stefan Beller
Add a repository argument to allow the callers of cache_or_unpack_entry
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 packfile.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/packfile.c b/packfile.c
index d5ac48ef18..8de87f904b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1272,7 +1272,8 @@ static void detach_delta_base_cache_entry(struct 
delta_base_cache_entry *ent)
free(ent);
 }
 
-static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
+#define cache_or_unpack_entry(r, p, bo, bs, t) cache_or_unpack_entry_##r(p, 
bo, bs, t)
+static void *cache_or_unpack_entry_the_repository(struct packed_git *p, off_t 
base_offset,
unsigned long *base_size, enum object_type *type)
 {
struct delta_base_cache_entry *ent;
@@ -1346,7 +1347,7 @@ int packed_object_info_the_repository(struct packed_git 
*p, off_t obj_offset,
 * a "real" type later if the caller is interested.
 */
if (oi->contentp) {
-   *oi->contentp = cache_or_unpack_entry(p, obj_offset, oi->sizep,
+   *oi->contentp = cache_or_unpack_entry(the_repository, p, 
obj_offset, oi->sizep,
  );
if (!*oi->contentp)
type = OBJ_BAD;
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 5/9] packfile: add repository argument to packed_object_info

2018-04-24 Thread Stefan Beller
From: Jonathan Nieder 

Add a repository argument to allow callers of packed_object_info to be
more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 builtin/pack-objects.c | 3 ++-
 packfile.c | 4 ++--
 packfile.h | 3 ++-
 sha1_file.c| 2 +-
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8d4111f748..d65eb4a947 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1572,7 +1572,8 @@ static void drop_reused_delta(struct object_entry *entry)
 
oi.sizep = >size;
oi.typep = >type;
-   if (packed_object_info(entry->in_pack, entry->in_pack_offset, ) < 0) 
{
+   if (packed_object_info(the_repository, entry->in_pack,
+  entry->in_pack_offset, ) < 0) {
/*
 * We failed to get the info from this pack for some reason;
 * fall back to sha1_object_info, which may find another copy.
diff --git a/packfile.c b/packfile.c
index 3ecfba66af..5fa7d27d3b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1333,8 +1333,8 @@ static void add_delta_base_cache(struct packed_git *p, 
off_t base_offset,
hashmap_add(_base_cache, ent);
 }
 
-int packed_object_info(struct packed_git *p, off_t obj_offset,
-  struct object_info *oi)
+int packed_object_info_the_repository(struct packed_git *p, off_t obj_offset,
+ struct object_info *oi)
 {
struct pack_window *w_curs = NULL;
unsigned long size;
diff --git a/packfile.h b/packfile.h
index a92c0b241c..bc8d840b1b 100644
--- a/packfile.h
+++ b/packfile.h
@@ -125,7 +125,8 @@ extern void release_pack_memory(size_t);
 /* global flag to enable extra checks when accessing packed objects */
 extern int do_check_packed_object_crc;
 
-extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
+#define packed_object_info(r, p, o, oi) packed_object_info_##r(p, o, oi)
+extern int packed_object_info_the_repository(struct packed_git *pack, off_t 
offset, struct object_info *);
 
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
 extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
diff --git a/sha1_file.c b/sha1_file.c
index 93f25c6c6a..746ff8297a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1307,7 +1307,7 @@ int oid_object_info_extended_the_repository(const struct 
object_id *oid, struct
 * information below, so return early.
 */
return 0;
-   rtype = packed_object_info(e.p, e.offset, oi);
+   rtype = packed_object_info(the_repository, e.p, e.offset, oi);
if (rtype < 0) {
mark_bad_packed_object(e.p, real->hash);
return oid_object_info_extended(the_repository, real, oi, 0);
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 2/9] cache.h: add repository argument to oid_object_info

2018-04-24 Thread Stefan Beller
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller 
---
 archive-tar.c|  2 +-
 archive-zip.c|  3 ++-
 blame.c  |  4 ++--
 builtin/blame.c  |  2 +-
 builtin/cat-file.c   |  6 +++---
 builtin/describe.c   |  2 +-
 builtin/fast-export.c|  2 +-
 builtin/fetch.c  |  2 +-
 builtin/fsck.c   |  3 ++-
 builtin/index-pack.c |  4 ++--
 builtin/ls-tree.c|  2 +-
 builtin/mktree.c |  2 +-
 builtin/pack-objects.c   |  8 +---
 builtin/prune.c  |  3 ++-
 builtin/replace.c| 11 ++-
 builtin/tag.c|  4 ++--
 builtin/unpack-objects.c |  2 +-
 cache.h  |  3 ++-
 diff.c   |  3 ++-
 fast-import.c| 14 +-
 list-objects-filter.c|  2 +-
 object.c |  2 +-
 pack-bitmap-write.c  |  3 ++-
 packfile.c   |  2 +-
 reachable.c  |  2 +-
 refs.c   |  2 +-
 remote.c |  2 +-
 sequencer.c  |  3 ++-
 sha1_file.c  |  4 ++--
 sha1_name.c  | 12 ++--
 submodule.c  |  2 +-
 tag.c|  2 +-
 32 files changed, 67 insertions(+), 53 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 3563bcb9f2..f93409324f 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -276,7 +276,7 @@ static int write_tar_entry(struct archiver_args *args,
memcpy(header.name, path, pathlen);
 
if (S_ISREG(mode) && !args->convert &&
-   oid_object_info(oid, ) == OBJ_BLOB &&
+   oid_object_info(the_repository, oid, ) == OBJ_BLOB &&
size > big_file_threshold)
buffer = NULL;
else if (S_ISLNK(mode) || S_ISREG(mode)) {
diff --git a/archive-zip.c b/archive-zip.c
index 6b20bce4d1..74f3fe9103 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -325,7 +325,8 @@ static int write_zip_entry(struct archiver_args *args,
compressed_size = 0;
buffer = NULL;
} else if (S_ISREG(mode) || S_ISLNK(mode)) {
-   enum object_type type = oid_object_info(oid, );
+   enum object_type type = oid_object_info(the_repository, oid,
+   );
 
method = 0;
attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
diff --git a/blame.c b/blame.c
index 78c9808bd1..dfa24473dc 100644
--- a/blame.c
+++ b/blame.c
@@ -81,7 +81,7 @@ static void verify_working_tree_path(struct commit 
*work_tree, const char *path)
unsigned mode;
 
if (!get_tree_entry(commit_oid, path, _oid, ) &&
-   oid_object_info(_oid, NULL) == OBJ_BLOB)
+   oid_object_info(the_repository, _oid, NULL) == 
OBJ_BLOB)
return;
}
 
@@ -504,7 +504,7 @@ static int fill_blob_sha1_and_mode(struct blame_origin 
*origin)
return 0;
if (get_tree_entry(>commit->object.oid, origin->path, 
>blob_oid, >mode))
goto error_out;
-   if (oid_object_info(>blob_oid, NULL) != OBJ_BLOB)
+   if (oid_object_info(the_repository, >blob_oid, NULL) != 
OBJ_BLOB)
goto error_out;
return 0;
  error_out:
diff --git a/builtin/blame.c b/builtin/blame.c
index db38c0b307..bfdf7cc132 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -655,7 +655,7 @@ static int is_a_rev(const char *name)
 
if (get_oid(name, ))
return 0;
-   return OBJ_NONE < oid_object_info(, NULL);
+   return OBJ_NONE < oid_object_info(the_repository, , NULL);
 }
 
 int cmd_blame(int argc, const char **argv, const char *prefix)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4ecdb9ff54..b8ecbea98e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -116,7 +116,7 @@ static int cat_one_file(int opt, const char *exp_type, 
const char *obj_name,
/* else fallthrough */
 
case 'p':
-   type = oid_object_info(, NULL);
+   type = oid_object_info(the_repository, , NULL);
if (type < 0)
die("Not a valid object name %s", obj_name);
 
@@ -140,7 +140,7 @@ static int cat_one_file(int opt, const char *exp_type, 
const char *obj_name,
case 0:
if (type_from_string(exp_type) == OBJ_BLOB) {
struct object_id blob_oid;
-   if (oid_object_info(, NULL) == OBJ_TAG) {
+   if (oid_object_info(the_repository, , NULL) == 
OBJ_TAG) {

[PATCHv2 4/9] packfile: add repository argument to packed_to_object_type

2018-04-24 Thread Stefan Beller
Add a repository argument to allow the callers of packed_to_object_type
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 packfile.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/packfile.c b/packfile.c
index d2b3f3503b..3ecfba66af 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1124,11 +1124,13 @@ static int 
retry_bad_packed_offset_the_repository(struct packed_git *p, off_t ob
 
 #define POI_STACK_PREALLOC 64
 
-static enum object_type packed_to_object_type(struct packed_git *p,
- off_t obj_offset,
- enum object_type type,
- struct pack_window **w_curs,
- off_t curpos)
+#define packed_to_object_type(r, p, o, t, w, c) \
+   packed_to_object_type_##r(p, o, t, w, c)
+static enum object_type packed_to_object_type_the_repository(struct packed_git 
*p,
+off_t obj_offset,
+enum object_type 
type,
+struct pack_window 
**w_curs,
+off_t curpos)
 {
off_t small_poi_stack[POI_STACK_PREALLOC];
off_t *poi_stack = small_poi_stack;
@@ -1378,8 +1380,8 @@ int packed_object_info(struct packed_git *p, off_t 
obj_offset,
 
if (oi->typep || oi->type_name) {
enum object_type ptot;
-   ptot = packed_to_object_type(p, obj_offset, type, _curs,
-curpos);
+   ptot = packed_to_object_type(the_repository, p, obj_offset,
+type, _curs, curpos);
if (oi->typep)
*oi->typep = ptot;
if (oi->type_name) {
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 0/9] object store: oid_object_info is the next contender

2018-04-24 Thread Stefan Beller
v2:

* fixed the sha1/oid typo
* removed spurious new line
* Brandon and Jonthan discovered another dependency that I missed due
  to cherrypicking that commit from a tree before partial clone was a thing.
  We error out when attempting to use fetch_object for repos that are not
  the_repository.
  
Thanks,
Stefan

v1:
This applies on top of origin/sb/object-store-replace and is available as
https://github.com/stefanbeller/git/tree/oid_object_info

This continues the work of sb/packfiles-in-repository,
extending the layer at which we have to pass in an explicit
repository object to oid_object_info.

A test merge to next shows only a minor merge conflicit (adding
different #include lines in one c file), so this might be a good next
step for the object store series.

Notes on further object store series:
I plan on converting the "parsed object store" next,
which would be {alloc, object, tree, commit, tag}.c as that is a prerequisite
for migrating shallow (which is intermingled with grafts) information to the
object store.

There is currently work going on in allocation (mempool - Jameson Miller)
and grafts (deprecate grafts - DScho), which is why I am sending this
series first. I think it can go in parallel to the "parsed object store"
that is coming next.

Thanks,
Stefan

Jonathan Nieder (1):
  packfile: add repository argument to packed_object_info

Stefan Beller (8):
  cache.h: add repository argument to oid_object_info_extended
  cache.h: add repository argument to oid_object_info
  packfile: add repository argument to retry_bad_packed_offset
  packfile: add repository argument to packed_to_object_type
  packfile: add repository argument to read_object
  packfile: add repository argument to unpack_entry
  packfile: add repository argument to cache_or_unpack_entry
  cache.h: allow oid_object_info to handle arbitrary repositories

 archive-tar.c|  2 +-
 archive-zip.c|  3 ++-
 blame.c  |  4 ++--
 builtin/blame.c  |  2 +-
 builtin/cat-file.c   | 12 ++--
 builtin/describe.c   |  2 +-
 builtin/fast-export.c|  2 +-
 builtin/fetch.c  |  2 +-
 builtin/fsck.c   |  3 ++-
 builtin/index-pack.c |  4 ++--
 builtin/ls-tree.c|  2 +-
 builtin/mktree.c |  2 +-
 builtin/pack-objects.c   | 11 +++
 builtin/prune.c  |  3 ++-
 builtin/replace.c| 11 ++-
 builtin/tag.c|  4 ++--
 builtin/unpack-objects.c |  2 +-
 cache.h  |  7 +--
 diff.c   |  3 ++-
 fast-import.c| 16 ++--
 list-objects-filter.c|  2 +-
 object.c |  2 +-
 pack-bitmap-write.c  |  3 ++-
 pack-check.c |  3 ++-
 packfile.c   | 40 +++-
 packfile.h   |  6 --
 reachable.c  |  2 +-
 refs.c   |  2 +-
 remote.c |  2 +-
 sequencer.c  |  3 ++-
 sha1_file.c  | 36 +---
 sha1_name.c  | 12 ++--
 streaming.c  |  2 +-
 submodule.c  |  2 +-
 tag.c|  2 +-
 35 files changed, 124 insertions(+), 92 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 3/9] packfile: add repository argument to retry_bad_packed_offset

2018-04-24 Thread Stefan Beller
Add a repository argument to allow the callers of retry_bad_packed_offset
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---
 packfile.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/packfile.c b/packfile.c
index 80c7fa734f..d2b3f3503b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1104,7 +1104,9 @@ static const unsigned char *get_delta_base_sha1(struct 
packed_git *p,
return NULL;
 }
 
-static int retry_bad_packed_offset(struct packed_git *p, off_t obj_offset)
+#define retry_bad_packed_offset(r, p, o) \
+   retry_bad_packed_offset_##r(p, o)
+static int retry_bad_packed_offset_the_repository(struct packed_git *p, off_t 
obj_offset)
 {
int type;
struct revindex_entry *revidx;
@@ -1153,7 +1155,7 @@ static enum object_type packed_to_object_type(struct 
packed_git *p,
if (type <= OBJ_NONE) {
/* If getting the base itself fails, we first
 * retry the base, otherwise unwind */
-   type = retry_bad_packed_offset(p, base_offset);
+   type = retry_bad_packed_offset(the_repository, p, 
base_offset);
if (type > OBJ_NONE)
goto out;
goto unwind;
@@ -1181,7 +1183,7 @@ static enum object_type packed_to_object_type(struct 
packed_git *p,
 unwind:
while (poi_stack_nr) {
obj_offset = poi_stack[--poi_stack_nr];
-   type = retry_bad_packed_offset(p, obj_offset);
+   type = retry_bad_packed_offset(the_repository, p, obj_offset);
if (type > OBJ_NONE)
goto out;
}
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 1/9] cache.h: add repository argument to oid_object_info_extended

2018-04-24 Thread Stefan Beller
Add a repository argument to allow oid_object_info_extended callers
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Stefan Beller 
---
 builtin/cat-file.c |  6 +++---
 cache.h|  5 -
 packfile.c |  2 +-
 sha1_file.c| 10 +-
 streaming.c|  2 +-
 5 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 2c46d257cd..4ecdb9ff54 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -77,7 +77,7 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name,
switch (opt) {
case 't':
oi.type_name = 
-   if (oid_object_info_extended(, , flags) < 0)
+   if (oid_object_info_extended(the_repository, , , flags) 
< 0)
die("git cat-file: could not get object info");
if (sb.len) {
printf("%s\n", sb.buf);
@@ -88,7 +88,7 @@ static int cat_one_file(int opt, const char *exp_type, const 
char *obj_name,
 
case 's':
oi.sizep = 
-   if (oid_object_info_extended(, , flags) < 0)
+   if (oid_object_info_extended(the_repository, , , flags) 
< 0)
die("git cat-file: could not get object info");
printf("%lu\n", size);
return 0;
@@ -342,7 +342,7 @@ static void batch_object_write(const char *obj_name, struct 
batch_options *opt,
struct strbuf buf = STRBUF_INIT;
 
if (!data->skip_object_info &&
-   oid_object_info_extended(>oid, >info,
+   oid_object_info_extended(the_repository, >oid, >info,
 OBJECT_INFO_LOOKUP_REPLACE) < 0) {
printf("%s missing\n",
   obj_name ? obj_name : oid_to_hex(>oid));
diff --git a/cache.h b/cache.h
index 027bd7ffc8..588c4fff9a 100644
--- a/cache.h
+++ b/cache.h
@@ -1673,7 +1673,10 @@ struct object_info {
 #define OBJECT_INFO_QUICK 8
 /* Do not check loose object */
 #define OBJECT_INFO_IGNORE_LOOSE 16
-extern int oid_object_info_extended(const struct object_id *, struct 
object_info *, unsigned flags);
+
+#define oid_object_info_extended(r, oid, oi, flags) \
+   oid_object_info_extended_##r(oid, oi, flags)
+int oid_object_info_extended_the_repository(const struct object_id *, struct 
object_info *, unsigned flags);
 
 /*
  * Set this to 0 to prevent sha1_object_info_extended() from fetching missing
diff --git a/packfile.c b/packfile.c
index 0bc67d0e00..d9914ba723 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1474,7 +1474,7 @@ static void *read_object(const struct object_id *oid, 
enum object_type *type,
oi.sizep = size;
oi.contentp = 
 
-   if (oid_object_info_extended(oid, , 0) < 0)
+   if (oid_object_info_extended(the_repository, oid, , 0) < 0)
return NULL;
return content;
 }
diff --git a/sha1_file.c b/sha1_file.c
index 64a5bd7d87..50a2dc5f0a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1231,7 +1231,7 @@ static int sha1_loose_object_info(struct repository *r,
 
 int fetch_if_missing = 1;
 
-int oid_object_info_extended(const struct object_id *oid, struct object_info 
*oi, unsigned flags)
+int oid_object_info_extended_the_repository(const struct object_id *oid, 
struct object_info *oi, unsigned flags)
 {
static struct object_info blank_oi = OBJECT_INFO_INIT;
struct pack_entry e;
@@ -1310,7 +1310,7 @@ int oid_object_info_extended(const struct object_id *oid, 
struct object_info *oi
rtype = packed_object_info(e.p, e.offset, oi);
if (rtype < 0) {
mark_bad_packed_object(e.p, real->hash);
-   return oid_object_info_extended(real, oi, 0);
+   return oid_object_info_extended(the_repository, real, oi, 0);
} else if (oi->whence == OI_PACKED) {
oi->u.packed.offset = e.offset;
oi->u.packed.pack = e.p;
@@ -1329,7 +1329,7 @@ int oid_object_info(const struct object_id *oid, unsigned 
long *sizep)
 
oi.typep = 
oi.sizep = sizep;
-   if (oid_object_info_extended(oid, ,
+   if (oid_object_info_extended(the_repository, oid, ,
 OBJECT_INFO_LOOKUP_REPLACE) < 0)
return -1;
return type;
@@ -1347,7 +1347,7 @@ static void *read_object(const unsigned char *sha1, enum 
object_type *type,
 
hashcpy(oid.hash, sha1);
 
-   if (oid_object_info_extended(, , 0) < 0)
+   if (oid_object_info_extended(the_repository, , , 0) < 0)
return NULL;
return content;
 }
@@ -1745,7 +1745,7 @@ int has_sha1_file_with_flags(const unsigned char *sha1, 
int flags)
if (!startup_info->have_repository)
return 0;
hashcpy(oid.hash, sha1);
-  

[PATCH v9 3/4] worktree: factor out dwim_branch function

2018-04-24 Thread Thomas Gummerer
Factor out a dwim_branch function, which takes care of the dwim'ery in
'git worktree add '.  It's not too much code currently, but we're
adding a new kind of dwim in a subsequent patch, at which point it makes
more sense to have it as a separate function.

Factor it out now to reduce the patch noise in the next patch.

Signed-off-by: Thomas Gummerer 
---
 builtin/worktree.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 39bf1ea865..6bd32b6090 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -387,6 +387,21 @@ static void print_preparing_worktree_line(int detach,
}
 }
 
+static const char *dwim_branch(const char *path, const char **new_branch)
+{
+   int n;
+   const char *s = worktree_basename(path, );
+   *new_branch = xstrndup(s, n);
+   UNLEAK(*new_branch);
+   if (guess_remote) {
+   struct object_id oid;
+   const char *remote =
+   unique_tracking_name(*new_branch, );
+   return remote;
+   }
+   return NULL;
+}
+
 static int add(int ac, const char **av, const char *prefix)
 {
struct add_opts opts;
@@ -439,17 +454,9 @@ static int add(int ac, const char **av, const char *prefix)
}
 
if (ac < 2 && !new_branch && !opts.detach) {
-   int n;
-   const char *s = worktree_basename(path, );
-   new_branch = xstrndup(s, n);
-   UNLEAK(new_branch);
-   if (guess_remote) {
-   struct object_id oid;
-   const char *remote =
-   unique_tracking_name(new_branch, );
-   if (remote)
-   branch = remote;
-   }
+   const char *s = dwim_branch(path, _branch);
+   if (s)
+   branch = s;
}
 
if (ac == 2 && !new_branch && !opts.detach) {
-- 
2.16.1.74.g7afd1c25cc.dirty



[PATCH v9 4/4] worktree: teach "add" to check out existing branches

2018-04-24 Thread Thomas Gummerer
Currently 'git worktree add ' creates a new branch named after the
basename of the path by default.  If a branch with that name already
exists, the command refuses to do anything, unless the '--force' option
is given.

However we can do a little better than that, and check the branch out if
it is not checked out anywhere else.  This will help users who just want
to check an existing branch out into a new worktree, and save a few
keystrokes.

As the current behaviour is to simply 'die()' when a branch with the name
of the basename of the path already exists, there are no backwards
compatibility worries here.

We will still 'die()' if the branch is checked out in another worktree,
unless the --force flag is passed.

Helped-by: Eric Sunshine 
Signed-off-by: Thomas Gummerer 
---
 Documentation/git-worktree.txt |  9 +++--
 builtin/worktree.c | 13 +++--
 t/t2025-worktree-add.sh| 26 +++---
 3 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index 41585f535d..5d54f36a71 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -61,8 +61,13 @@ $ git worktree add --track -b   
/
 
 +
 If `` is omitted and neither `-b` nor `-B` nor `--detach` used,
-then, as a convenience, a new branch based at HEAD is created automatically,
-as if `-b $(basename )` was specified.
+then, as a convenience, the new worktree is associated with a branch
+(call it ``) named after `$(basename )`.  If ``
+doesn't exist, a new branch based on HEAD is automatically created as
+if `-b ` was given.  If `` does exist, it will be
+checked out in the new worktree, if it's not checked out anywhere
+else, otherwise the command will refuse to create the worktree (unless
+`--force` is used).
 
 list::
 
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 6bd32b6090..d3aeb4877d 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -391,8 +391,17 @@ static const char *dwim_branch(const char *path, const 
char **new_branch)
 {
int n;
const char *s = worktree_basename(path, );
-   *new_branch = xstrndup(s, n);
-   UNLEAK(*new_branch);
+   const char *branchname = xstrndup(s, n);
+   struct strbuf ref = STRBUF_INIT;
+
+   UNLEAK(branchname);
+   if (!strbuf_check_branch_ref(, branchname) &&
+   ref_exists(ref.buf)) {
+   strbuf_release();
+   return branchname;
+   }
+
+   *new_branch = branchname;
if (guess_remote) {
struct object_id oid;
const char *remote =
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 2b95944973..ad38507d00 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -198,13 +198,25 @@ test_expect_success '"add" with  omitted' '
test_cmp_rev HEAD bat
 '
 
-test_expect_success '"add" auto-vivify does not clobber existing branch' '
-   test_commit c1 &&
-   test_commit c2 &&
-   git branch precious HEAD~1 &&
-   test_must_fail git worktree add precious &&
-   test_cmp_rev HEAD~1 precious &&
-   test_path_is_missing precious
+test_expect_success '"add" checks out existing branch of dwimd name' '
+   git branch dwim HEAD~1 &&
+   git worktree add dwim &&
+   test_cmp_rev HEAD~1 dwim &&
+   (
+   cd dwim &&
+   test_cmp_rev HEAD dwim
+   )
+'
+
+test_expect_success '"add " dwim fails with checked out branch' '
+   git checkout -b test-branch &&
+   test_must_fail git worktree add test-branch &&
+   test_path_is_missing test-branch
+'
+
+test_expect_success '"add --force" with existing dwimd name doesnt die' '
+   git checkout test-branch &&
+   git worktree add --force test-branch
 '
 
 test_expect_success '"add" no auto-vivify with --detach and  omitted' '
-- 
2.16.1.74.g7afd1c25cc.dirty



[PATCH v9 1/4] worktree: remove extra members from struct add_opts

2018-04-24 Thread Thomas Gummerer
There are two members of 'struct add_opts', which are only used inside
the 'add()' function, but being part of 'struct add_opts' they are
needlessly also passed to the 'add_worktree' function.

Make them local to the 'add()' function to make it clearer where they
are used.

Signed-off-by: Thomas Gummerer 
---
 builtin/worktree.c | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 7cef5b120b..bf305c8b7b 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -27,8 +27,6 @@ struct add_opts {
int detach;
int checkout;
int keep_locked;
-   const char *new_branch;
-   int force_new_branch;
 };
 
 static int show_only;
@@ -363,10 +361,11 @@ static int add(int ac, const char **av, const char 
*prefix)
const char *new_branch_force = NULL;
char *path;
const char *branch;
+   const char *new_branch = NULL;
const char *opt_track = NULL;
struct option options[] = {
OPT__FORCE(, N_("checkout  even if already 
checked out in other worktree")),
-   OPT_STRING('b', NULL, _branch, N_("branch"),
+   OPT_STRING('b', NULL, _branch, N_("branch"),
   N_("create a new branch")),
OPT_STRING('B', NULL, _branch_force, N_("branch"),
   N_("create or reset a branch")),
@@ -384,7 +383,7 @@ static int add(int ac, const char **av, const char *prefix)
memset(, 0, sizeof(opts));
opts.checkout = 1;
ac = parse_options(ac, av, prefix, options, worktree_usage, 0);
-   if (!!opts.detach + !!opts.new_branch + !!new_branch_force > 1)
+   if (!!opts.detach + !!new_branch + !!new_branch_force > 1)
die(_("-b, -B, and --detach are mutually exclusive"));
if (ac < 1 || ac > 2)
usage_with_options(worktree_usage, options);
@@ -395,33 +394,33 @@ static int add(int ac, const char **av, const char 
*prefix)
if (!strcmp(branch, "-"))
branch = "@{-1}";
 
-   opts.force_new_branch = !!new_branch_force;
-   if (opts.force_new_branch) {
+   if (new_branch_force) {
struct strbuf symref = STRBUF_INIT;
 
-   opts.new_branch = new_branch_force;
+   new_branch = new_branch_force;
 
if (!opts.force &&
-   !strbuf_check_branch_ref(, opts.new_branch) &&
+   !strbuf_check_branch_ref(, new_branch) &&
ref_exists(symref.buf))
die_if_checked_out(symref.buf, 0);
strbuf_release();
}
 
-   if (ac < 2 && !opts.new_branch && !opts.detach) {
+   if (ac < 2 && !new_branch && !opts.detach) {
int n;
const char *s = worktree_basename(path, );
-   opts.new_branch = xstrndup(s, n);
+   new_branch = xstrndup(s, n);
+   UNLEAK(new_branch);
if (guess_remote) {
struct object_id oid;
const char *remote =
-   unique_tracking_name(opts.new_branch, );
+   unique_tracking_name(new_branch, );
if (remote)
branch = remote;
}
}
 
-   if (ac == 2 && !opts.new_branch && !opts.detach) {
+   if (ac == 2 && !new_branch && !opts.detach) {
struct object_id oid;
struct commit *commit;
const char *remote;
@@ -430,25 +429,25 @@ static int add(int ac, const char **av, const char 
*prefix)
if (!commit) {
remote = unique_tracking_name(branch, );
if (remote) {
-   opts.new_branch = branch;
+   new_branch = branch;
branch = remote;
}
}
}
 
-   if (opts.new_branch) {
+   if (new_branch) {
struct child_process cp = CHILD_PROCESS_INIT;
cp.git_cmd = 1;
argv_array_push(, "branch");
-   if (opts.force_new_branch)
+   if (new_branch_force)
argv_array_push(, "--force");
-   argv_array_push(, opts.new_branch);
+   argv_array_push(, new_branch);
argv_array_push(, branch);
if (opt_track)
argv_array_push(, opt_track);
if (run_command())
return -1;
-   branch = opts.new_branch;
+   branch = new_branch;
} else if (opt_track) {
die(_("--[no-]track can only be used if a new branch is 
created"));
}
-- 
2.16.1.74.g7afd1c25cc.dirty



[PATCH v9 0/4] worktree: teach "add" to check out existing branches

2018-04-24 Thread Thomas Gummerer
Previous rounds are at <20180121120208.12760-1-t.gumme...@gmail.com>,
<20180204221305.28300-1-t.gumme...@gmail.com>,
<20180317220830.30963-1-t.gumme...@gmail.com>,
<2018031719.4940-1-t.gumme...@gmail.com>,
<20180325134947.25828-1-t.gumme...@gmail.com>,
<20180331151804.30380-1-t.gumme...@gmail.com>,
<20180415202917.4360-1-t.gumme...@gmail.com> and
<20180423193848.5159-1-t.gumme...@gmail.com>.

Thanks Eric for the review and the suggestions on the previous round.

Changes since the previous round:

- UNLEAK new_branch after it was xstrndup'd
- update the commit message of 2/4 according to Eric's suggestions
- make force_new_branch a boolean flag in
  print_preparing_worktree_line, instead of using it as the branch
  name.  Instead use new_branch as the new branch name everywhere in
  that function.
- get rid of the ckeckout_existing_branch flag

Interdiff below:

diff --git a/builtin/worktree.c b/builtin/worktree.c
index d52495f312..d3aeb4877d 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -356,18 +356,15 @@ static int add_worktree(const char *path, const char 
*refname,
 static void print_preparing_worktree_line(int detach,
  const char *branch,
  const char *new_branch,
- const char *new_branch_force,
- int checkout_existing_branch)
+ int force_new_branch)
 {
-   if (checkout_existing_branch) {
-   printf_ln(_("Preparing worktree (checking out '%s')"), branch);
-   } else if (new_branch_force) {
-   struct commit *commit = 
lookup_commit_reference_by_name(new_branch_force);
+   if (force_new_branch) {
+   struct commit *commit = 
lookup_commit_reference_by_name(new_branch);
if (!commit)
-   printf_ln(_("Preparing worktree (new branch '%s')"), 
new_branch_force);
+   printf_ln(_("Preparing worktree (new branch '%s')"), 
new_branch);
else
printf_ln(_("Preparing worktree (resetting branch '%s'; 
was at %s)"),
- new_branch_force,
+ new_branch,
  find_unique_abbrev(commit->object.oid.hash,
 DEFAULT_ABBREV));
} else if (new_branch) {
@@ -390,19 +387,17 @@ static void print_preparing_worktree_line(int detach,
}
 }
 
-static const char *dwim_branch(const char *path, const char **new_branch,
-  int *checkout_existing_branch)
+static const char *dwim_branch(const char *path, const char **new_branch)
 {
int n;
const char *s = worktree_basename(path, );
const char *branchname = xstrndup(s, n);
struct strbuf ref = STRBUF_INIT;
 
+   UNLEAK(branchname);
if (!strbuf_check_branch_ref(, branchname) &&
ref_exists(ref.buf)) {
-   *checkout_existing_branch = 1;
strbuf_release();
-   UNLEAK(branchname);
return branchname;
}
 
@@ -421,7 +416,6 @@ static int add(int ac, const char **av, const char *prefix)
struct add_opts opts;
const char *new_branch_force = NULL;
char *path;
-   int checkout_existing_branch = 0;
const char *branch;
const char *new_branch = NULL;
const char *opt_track = NULL;
@@ -469,8 +463,7 @@ static int add(int ac, const char **av, const char *prefix)
}
 
if (ac < 2 && !new_branch && !opts.detach) {
-   const char *s = dwim_branch(path, _branch,
-   _existing_branch);
+   const char *s = dwim_branch(path, _branch);
if (s)
branch = s;
}
@@ -490,8 +483,7 @@ static int add(int ac, const char **av, const char *prefix)
}
}
 
-   print_preparing_worktree_line(opts.detach, branch, new_branch, 
new_branch_force,
- checkout_existing_branch);
+   print_preparing_worktree_line(opts.detach, branch, new_branch, 
!!new_branch_force);
 
if (new_branch) {
struct child_process cp = CHILD_PROCESS_INIT;

Thomas Gummerer (4):
  worktree: remove extra members from struct add_opts
  worktree: improve message when creating a new worktree
  worktree: factor out dwim_branch function
  worktree: teach "add" to check out existing branches

 Documentation/git-worktree.txt |   9 +++-
 builtin/worktree.c | 103 ++---
 t/t2025-worktree-add.sh|  26 ---
 3 files changed, 102 insertions(+), 36 deletions(-)

-- 
2.16.1.74.g7afd1c25cc.dirty



[PATCH v9 2/4] worktree: improve message when creating a new worktree

2018-04-24 Thread Thomas Gummerer
Currently 'git worktree add' produces output like the following:

Preparing ../foo (identifier foo)
HEAD is now at 26da330922 

The '../foo' is the path where the worktree is created, which the user
has just given on the command line.  The identifier is an internal
implementation detail, which is not particularly relevant for the user
and indeed isn't mentioned explicitly anywhere in the man page.

Instead of this message, print a message that gives the user a bit more
detail of what exactly 'git worktree' is doing.  There are various dwim
modes which perform some magic under the hood, which should be
helpful to users.  Just from the output of the command it is not always
visible to users what exactly has happened.

Help the users a bit more by modifying the "Preparing ..." message and
adding some additional information of what 'git worktree add' did under
the hood, while not displaying the identifier anymore.

Currently there are several different cases:

  - 'git worktree add -b ...' or 'git worktree add ', both of
which create a new branch, either through the user explicitly
requesting it, or through 'git worktree add' implicitly creating
it.  This will end up with the following output:

  Preparing worktree (new branch '')
  HEAD is now at 26da330922 

  - 'git worktree add -B ...', which may either create a new branch if
the branch with the given name does not exist yet, or resets an
existing branch to the current HEAD, or the commit-ish given.
Depending on which action is taken, we'll end up with the following
output:

  Preparing worktree (resetting branch ''; was at caa68db14)
  HEAD is now at 26da330922 

or:

  Preparing worktree (new branch '')
  HEAD is now at 26da330922 

  - 'git worktree add --detach' or 'git worktree add 
', both of which create a new worktree with a detached
HEAD, for which we will print the following output:

  Preparing worktree (detached HEAD 26da330922)
  HEAD is now at 26da330922 

  - 'git worktree add  ', which checks out the
branch and prints the following output:

  Preparing worktree (checking out '')
  HEAD is now at 47007d5 

Additionally currently the "Preparing ..." line is printed to stderr,
while the "HEAD is now at ..." line is printed to stdout by 'git reset
--hard', which is used internally by 'git worktree add'.  Fix this
inconsistency by printing the "Preparing ..." message to stdout as
well.  As "Preparing ..." is not an error, stdout also seems like the
more appropriate output stream.

Helped-by: Eric Sunshine 
Signed-off-by: Thomas Gummerer 
---
 builtin/worktree.c | 38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index bf305c8b7b..39bf1ea865 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -301,8 +301,6 @@ static int add_worktree(const char *path, const char 
*refname,
strbuf_addf(, "%s/commondir", sb_repo.buf);
write_file(sb.buf, "../..");
 
-   fprintf_ln(stderr, _("Preparing %s (identifier %s)"), path, name);
-
argv_array_pushf(_env, "%s=%s", GIT_DIR_ENVIRONMENT, sb_git.buf);
argv_array_pushf(_env, "%s=%s", GIT_WORK_TREE_ENVIRONMENT, path);
cp.git_cmd = 1;
@@ -355,6 +353,40 @@ static int add_worktree(const char *path, const char 
*refname,
return ret;
 }
 
+static void print_preparing_worktree_line(int detach,
+ const char *branch,
+ const char *new_branch,
+ int force_new_branch)
+{
+   if (force_new_branch) {
+   struct commit *commit = 
lookup_commit_reference_by_name(new_branch);
+   if (!commit)
+   printf_ln(_("Preparing worktree (new branch '%s')"), 
new_branch);
+   else
+   printf_ln(_("Preparing worktree (resetting branch '%s'; 
was at %s)"),
+ new_branch,
+ find_unique_abbrev(commit->object.oid.hash,
+DEFAULT_ABBREV));
+   } else if (new_branch) {
+   printf_ln(_("Preparing worktree (new branch '%s')"), 
new_branch);
+   } else {
+   struct strbuf s = STRBUF_INIT;
+   if (!detach && !strbuf_check_branch_ref(, branch) &&
+   ref_exists(s.buf))
+   printf_ln(_("Preparing worktree (checking out '%s')"),
+ branch);
+   else {
+   struct commit *commit = 
lookup_commit_reference_by_name(branch);
+   if (!commit)
+   die(_("invalid reference: %s"), branch);
+   printf_ln(_("Preparing worktree (detached HEAD %s)"),
+ 

Re: [PATCH 5/7] diff.c: add a blocks mode for moved code detection

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 14:03:28 -0700
Stefan Beller  wrote:

> Suggested-by: Ævar Arnfjörð Bjarmason 
>  (https://public-inbox.org/git/87o9j0uljo@evledraar.gmail.com/)
> Signed-off-by: Stefan Beller 

Firstly, I don't know if this is the right solution- as written
in the linked e-mail [1], the issue might be more that the config
conflates 2 unrelated things, not that a certain intersection is
missing.

[1] https://public-inbox.org/git/87muykuij7@evledraar.gmail.com/

Optional: Probably better to put the link inline, instead of in the
trailer.

> -test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
> +test_expect_success 'detect blocks of moved code' '
>   git reset --hard &&
>   cat <<-\EOF >lines.txt &&
>   long line 1
> @@ -1271,6 +1271,52 @@ test_expect_success 'detect permutations inside moved 
> code -- dimmed_zebra' '
>   test_config color.diff.newMovedDimmed "normal cyan" &&
>   test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
>   test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&

Add a comment here explaining that these colors do not appear in the
output, but merely set to recognizable values to ensure that they do not
appear in the output.

> +
> + git diff HEAD --no-renames --color-moved=blocks --color |
> + grep -v "index" |
> + test_decode_color >actual &&
> + cat <<-\EOF >expected &&
> + diff --git a/lines.txt b/lines.txt
> + --- a/lines.txt
> + +++ b/lines.txt
> + @@ -1,16 +1,16 @@
> + -long line 1
> + -long line 2
> + -long line 3
> +  line 4
> +  line 5
> +  line 6
> +  line 7
> +  line 8
> +  line 9
> + +long line 1
> + +long line 2
> + +long line 3
> + +long line 14
> + +long line 15
> + +long line 16
> +  line 10
> +  line 11
> +  line 12
> +  line 13
> + -long line 14
> + -long line 15
> + -long line 16
> + EOF
> + test_cmp expected actual
> +
> +'

[snip]


Re: [PATCH v3 09/11] technical/shallow: describe the relationship with replace refs

2018-04-24 Thread Philip Oakley

Hi dscho

From: "Johannes Schindelin"  : Tuesday, April 
24, 2018 8:10 PM

On Sun, 22 Apr 2018, Philip Oakley wrote:


From: "Johannes Schindelin" 
> Now that grafts are deprecated, we should start to assume that readers
> have no idea what grafts are. So it makes more sense to describe the
> "shallow" feature in terms of replace refs.


Here we say we should drop the term "grafts"

>
> Suggested-by: Eric Sunshine 
> Signed-off-by: Johannes Schindelin 
> ---
> Documentation/technical/shallow.txt | 19 +++
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/technical/shallow.txt
> b/Documentation/technical/shallow.txt
> index 5183b154229..b3ff23c25f6 100644
> --- a/Documentation/technical/shallow.txt
> +++ b/Documentation/technical/shallow.txt
> @@ -9,14 +9,17 @@ these commits have no parents.
> *
>
> The basic idea is to write the SHA-1s of shallow commits into
> -$GIT_DIR/shallow, and handle its contents like the contents
> -of $GIT_DIR/info/grafts (with the difference that shallow
> -cannot contain parent information).
> -
> -This information is stored in a new file instead of grafts, or
> -even the config, since the user should not touch that file
> -at all (even throughout development of the shallow clone, it
> -was never manually edited!).
> +$GIT_DIR/shallow, and handle its contents similar to replace
> +refs (with the difference that shallow does not actually
> +create those replace refs) and

If grafts are deprecated, why not alse get rid of this mention and simply
leave the 'what it does' part.


Internally, shallow commits are implemented using the graft code path, and


however the change here is just to the documentation, independent of th code 
path's name.



they always will be: we will always need a list of the shallow commits,
and we will always need to be able to lift the "shallow" attribute
quickly, when deepening a shallow clone.

So it makes sense to mention that here, because we are deep in technical
details in Documentation/technical/.

>   very much like the 
> deprecated

> +graft file (with


I was looking to snip this 'graft' reference, as per the commit message..



>   the difference that shallow commits will
> +always have their parents grafted away, not replaced by
s/their parents grafted away/no parents/ (rather than being replaced..)


Then I botched this substitution



But the commits will typically have parents. So they really will have
their parents grafted away as long as they are marked "shallow"...


OK, maybe I mis-used the figurative 'no parents', when it means the literal 
'parents not present'.


Perhaps something like:
+$GIT_DIR/shallow, and handle its contents similar to replace
+refs (with the difference that shallow does not actually
+create those replace refs) with the difference that shallow commits will
+always have their parents not present.

--
Philip 



Confidential Request!!

2018-04-24 Thread Mr. James W. Abbas
-- 
Hello Dear Friend,


I am Williams Abbas, I need your services in a confidential matter
regarding money transfer.

This requires a private arrangement though the details of the
transaction will be furnished to you if you indicate your interest in
this proposal.We have all the legal documents to back up the
transaction, besides we have worked out the modalities to ensure
smooth and risky free transfer.We are willing to offer you 35% of the
money, the fund in question is quite large. All correspondences will
be via email for now. I am expecting to hear from you, if you are
willing to do the business with us to send your response to my private
email address with your private phone number is needed and please note
that this is not scam, but legitimate business offer.


Thanks,
Yours Faithfully,
Williams Abbas


Re: GSoC students and mentors in 2018

2018-04-24 Thread Paul-Sebastian Ungureanu



On 24.04.2018 00:01, Stefan Beller wrote:

Hi Git community,

This year we'll participate once again in Google Summer or Code!
We'll have 3 students and 3 mentors, which is more than in recent years.

Paul-Sebastian Ungureanu mentored by DScho, wants to convert git-stash
into a builtin.

Alban Gruin and Pratik Karki want to convert parts of git-rebase into
a builtin. Both are mentored by Christian and myself.

The slots were just announced today, please join me in welcoming them
to the Git mailing list! (Although you may remember them from the
micro projects[1,2,3])

[1] 
https://public-inbox.org/git/20180319155929.7000-1-ungureanupaulsebast...@gmail.com/
[2] https://public-inbox.org/git/2018030907.17607-1-alban.gr...@gmail.com/
[3] https://public-inbox.org/git/20180327173137.5970-1-predatoram...@gmail.com/

Thanks,
Stefan



Hello,

It is a pleasure and an honor for me to take part in Google Summer of 
Code. I am sure it will be a exciting summer and I will definitely give 
100% to successfully fulfill 'git stash' project!


Best regards,
Paul Ungureanu


[PATCH 5/7] diff.c: add a blocks mode for moved code detection

2018-04-24 Thread Stefan Beller
The new "blocks" mode provides a middle ground between plain and zebra.
It is as intuitive (few colors) as plain, but still has the requirement
for a minimum of lines/characters to count a block as moved.

Suggested-by: Ævar Arnfjörð Bjarmason 
 (https://public-inbox.org/git/87o9j0uljo@evledraar.gmail.com/)
Signed-off-by: Stefan Beller 
---
 Documentation/diff-options.txt |  8 --
 diff.c |  6 +++--
 diff.h |  5 ++--
 t/t4015-diff-whitespace.sh | 48 +-
 4 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e3a44f03cd..bb9f1b7cd8 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -276,10 +276,14 @@ plain::
that are added somewhere else in the diff. This mode picks up any
moved line, but it is not very useful in a review to determine
if a block of code was moved without permutation.
-zebra::
+blocks:
Blocks of moved text of at least 20 alphanumeric characters
are detected greedily. The detected blocks are
-   painted using either the 'color.diff.{old,new}Moved' color or
+   painted using either the 'color.diff.{old,new}Moved' color.
+   Adjacent blocks cannot be told apart.
+zebra::
+   Blocks of moved text are detected as in 'blocks' mode. The blocks
+   are painted using either the 'color.diff.{old,new}Moved' color or
'color.diff.{old,new}MovedAlternative'. The change between
the two colors indicates that a new block was detected.
 dimmed_zebra::
diff --git a/diff.c b/diff.c
index d1bae900cd..95c51c0b7d 100644
--- a/diff.c
+++ b/diff.c
@@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
return COLOR_MOVED_NO;
else if (!strcmp(arg, "plain"))
return COLOR_MOVED_PLAIN;
+   else if (!strcmp(arg, "blocks"))
+   return COLOR_MOVED_BLOCKS;
else if (!strcmp(arg, "zebra"))
return COLOR_MOVED_ZEBRA;
else if (!strcmp(arg, "default"))
@@ -278,7 +280,7 @@ static int parse_color_moved(const char *arg)
else if (!strcmp(arg, "dimmed_zebra"))
return COLOR_MOVED_ZEBRA_DIM;
else
-   return error(_("color moved setting must be one of 'no', 
'default', 'zebra', 'dimmed_zebra', 'plain'"));
+   return error(_("color moved setting must be one of 'no', 
'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 
 int git_diff_ui_config(const char *var, const char *value, void *cb)
@@ -903,7 +905,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
block_length++;
 
-   if (flipped_block)
+   if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
}
adjust_last_block(o, n, block_length);
diff --git a/diff.h b/diff.h
index d29560f822..7bd4f182c3 100644
--- a/diff.h
+++ b/diff.h
@@ -208,8 +208,9 @@ struct diff_options {
enum {
COLOR_MOVED_NO = 0,
COLOR_MOVED_PLAIN = 1,
-   COLOR_MOVED_ZEBRA = 2,
-   COLOR_MOVED_ZEBRA_DIM = 3,
+   COLOR_MOVED_BLOCKS = 2,
+   COLOR_MOVED_ZEBRA = 3,
+   COLOR_MOVED_ZEBRA_DIM = 4,
} color_moved;
#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
#define COLOR_MOVED_MIN_ALNUM_COUNT 20
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 17df491a3a..45091abb19 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1223,7 +1223,7 @@ test_expect_success 'plain moved code, inside file' '
test_cmp expected actual
 '
 
-test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+test_expect_success 'detect blocks of moved code' '
git reset --hard &&
cat <<-\EOF >lines.txt &&
long line 1
@@ -1271,6 +1271,52 @@ test_expect_success 'detect permutations inside moved 
code -- dimmed_zebra' '
test_config color.diff.newMovedDimmed "normal cyan" &&
test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
+
+   git diff HEAD --no-renames --color-moved=blocks --color |
+   grep -v "index" |
+   test_decode_color >actual &&
+   cat <<-\EOF >expected &&
+   diff --git a/lines.txt b/lines.txt
+   --- a/lines.txt
+   +++ b/lines.txt
+   @@ -1,16 +1,16 @@
+   -long line 1
+   -long line 2
+   -long line 3
+line 4
+line 5
+line 6
+line 7
+line 8
+line 9
+   +long line 1
+   +long line 2
+   +long line 3
+   +long line 14
+   +long line 15
+   +long line 16
+line 10
+  

[PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm

2018-04-24 Thread Stefan Beller
In the original implementation of the move detection logic the choice for
ignoring white space changes is the same for the move detection as it is
for the regular diff.  Some cases came up where different treatment would
have been nice.

Allow the user to specify that whitespace should be ignored differently
during detection of moved lines than during generation of added and removed
lines. This is done by providing analogs to the --ignore-space-at-eol,
-b, and -w options (namely,
  --color-moved-[no-]ignore-space-at-eol
  --color-moved-[no-]ignore-space-change
  --color-moved-[no-]ignore-all-space) that affect only the color of the
output, and making the existing --ignore-space-at-eol, -b, and -w options
no longer affect the color of the output.

As we change the default, we'll adjust the tests.

For now we do not infer any options to treat whitespaces in the move
detection from the generic white space options given to diff.
This can be tuned later to reasonable default.

Signed-off-by: Stefan Beller 
---
 Documentation/diff-options.txt | 13 +
 diff.c | 19 ++-
 diff.h |  1 +
 t/t4015-diff-whitespace.sh | 90 +++---
 4 files changed, 114 insertions(+), 9 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index bb9f1b7cd8..7b2527b9a1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -292,6 +292,19 @@ dimmed_zebra::
blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-[no-]ignore-space-at-eol::
+   Ignore changes in whitespace at EOL when performing the move
+   detection for --color-moved.
+--color-moved-[no-]ignore-space-change::
+   Ignore changes in amount of whitespace when performing the move
+   detection for --color-moved.  This ignores whitespace
+   at line end, and considers all other sequences of one or
+   more whitespace characters to be equivalent.
+--color-moved-[no-]ignore-all-space::
+   Ignore whitespace when comparing lines when performing the move
+   detection for --color-moved.  This ignores differences even if
+   one line has whitespace where the other line has none.
+
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 95c51c0b7d..b5819dd538 100644
--- a/diff.c
+++ b/diff.c
@@ -717,10 +717,12 @@ static int moved_entry_cmp(const void 
*hashmap_cmp_fn_data,
const struct diff_options *diffopt = hashmap_cmp_fn_data;
const struct moved_entry *a = entry;
const struct moved_entry *b = entry_or_key;
+   unsigned flags = diffopt->color_moved_ws_handling
+& XDF_WHITESPACE_FLAGS;
 
return !xdiff_compare_lines(a->es->line, a->es->len,
b->es->line, b->es->len,
-   diffopt->xdl_opts);
+   flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -728,8 +730,9 @@ static struct moved_entry *prepare_entry(struct 
diff_options *o,
 {
struct moved_entry *ret = xmalloc(sizeof(*ret));
struct emitted_diff_symbol *l = >emitted_symbols->buf[line_no];
+   unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-   ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
+   ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
ret->es = l;
ret->next_line = NULL;
 
@@ -4638,6 +4641,18 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+   else if (!strcmp(arg, "--color-moved-no-ignore-all-space"))
+   options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE;
+   else if (!strcmp(arg, "--color-moved-no-ignore-space-change"))
+   options->color_moved_ws_handling &= 
~XDF_IGNORE_WHITESPACE_CHANGE;
+   else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
+   options->color_moved_ws_handling &= 
~XDF_IGNORE_WHITESPACE_AT_EOL;
+   else if (!strcmp(arg, "--color-moved-ignore-all-space"))
+   options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
+   else if (!strcmp(arg, "--color-moved-ignore-space-change"))
+   options->color_moved_ws_handling |= 
XDF_IGNORE_WHITESPACE_CHANGE;
+   else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
+   options->color_moved_ws_handling |= 
XDF_IGNORE_WHITESPACE_AT_EOL;
else if (!strcmp(arg, "--indent-heuristic"))
DIFF_XDL_SET(options, INDENT_HEURISTIC);
else if (!strcmp(arg, "--no-indent-heuristic"))
diff --git a/diff.h b/diff.h
index 

[PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation

2018-04-24 Thread Stefan Beller
This makes the follow up patch easier.

Signed-off-by: Stefan Beller 
---
 diff.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index ce7bedc1b9..d1bae900cd 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-  const struct moved_entry *a,
-  const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+  const void *entry,
+  const void *entry_or_key,
   const void *keydata)
 {
+   const struct diff_options *diffopt = hashmap_cmp_fn_data;
+   const struct moved_entry *a = entry;
+   const struct moved_entry *b = entry_or_key;
+
return !xdiff_compare_lines(a->es->line, a->es->len,
b->es->line, b->es->len,
diffopt->xdl_opts);
@@ -5534,10 +5538,8 @@ static void diff_flush_patch_all_file_pairs(struct 
diff_options *o)
if (o->color_moved) {
struct hashmap add_lines, del_lines;
 
-   hashmap_init(_lines,
-(hashmap_cmp_fn)moved_entry_cmp, o, 0);
-   hashmap_init(_lines,
-(hashmap_cmp_fn)moved_entry_cmp, o, 0);
+   hashmap_init(_lines, moved_entry_cmp, o, 0);
+   hashmap_init(_lines, moved_entry_cmp, o, 0);
 
add_lines_to_move_detection(o, _lines, _lines);
mark_color_as_moved(o, _lines, _lines);
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH 1/7] xdiff/xdiff.h: remove unused flags

2018-04-24 Thread Stefan Beller
These flags were there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24), but were never used. Remove them.

Signed-off-by: Stefan Beller 
---
 xdiff/xdiff.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c1937a2911..2356da5f78 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -52,14 +52,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCHv2 0/7] Moved code detection: ignore space on uniform indentation

2018-04-24 Thread Stefan Beller
v2:
I think I have addressed Jonathans feedback
* by using a string instead of counting the first character only.
* refined tests slightly (easier to read)
* moved white space handling for moved blocks into its own flag field,
  keeping the enum for the actual mode of move detection.

v1:
This is a re-attempt of [1], which allows the moved code detection to
ignore blanks in various modes.

patches 1-5 are refactoring, patch 6 adds all existing white space options
of regular diff to the move detection. (I am unsure about this patch,
as I presume we want to keep the option space at a minimum if possible).

The fun is in the last patch, which allows white space sensitive
languages to trust the move detection, too. Each block that is marked as
moved will have the same delta in {in-, de-}dentation.
I would think this mode might be a reasonable default eventually.

Thanks,
Stefan

[1] https://public-inbox.org/git/20171025224620.27657-1-sbel...@google.com/

Stefan Beller (7):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations
  diff.c: do not pass diff options as keydata to hashmap
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: add a blocks mode for moved code detection
  diff.c: decouple white space treatment from move detection algorithm
  diff.c: add --color-moved-ignore-space-delta option

 Documentation/diff-options.txt |  25 -
 diff.c | 128 ++
 diff.h |   8 +-
 t/t4015-diff-whitespace.sh | 192 +++--
 xdiff/xdiff.h  |   8 --
 xdiff/xdiffi.c |  17 ---
 6 files changed, 322 insertions(+), 56 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations

2018-04-24 Thread Stefan Beller
There is no need to forward-declare these functions, as they are used
after their implementation only.

Signed-off-by: Stefan Beller 
---
 xdiff/xdiffi.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463b..3e8aff92bc 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
long i1, i2;
int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
- unsigned long const *ha2, long off2, long lim2,
- long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
- xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long 
chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap

2018-04-24 Thread Stefan Beller
When we initialize the hashmap, we give it a pointer to the
diff_options, which it then passes along to each call of the
hashmap_cmp_fn function. There's no need to pass it a second time as
the "keydata" parameter, and our comparison functions never look at
keydata.

This was a mistake left over from an earlier round of 2e2d5ac184
(diff.c: color moved lines differently, 2017-06-30), before hashmap
learned to pass the data pointer for us.

Explanation-by: Jeff King 
Signed-off-by: Stefan Beller 
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1289df4b1f..ce7bedc1b9 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
case DIFF_SYMBOL_PLUS:
hm = del_lines;
key = prepare_entry(o, n);
-   match = hashmap_get(hm, key, o);
+   match = hashmap_get(hm, key, NULL);
free(key);
break;
case DIFF_SYMBOL_MINUS:
hm = add_lines;
key = prepare_entry(o, n);
-   match = hashmap_get(hm, key, o);
+   match = hashmap_get(hm, key, NULL);
free(key);
break;
default:
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option

2018-04-24 Thread Stefan Beller
This marks moved code still as blocks when their indentation level
changes uniformly.

Signed-off-by: Stefan Beller 
---
 Documentation/diff-options.txt |  4 ++
 diff.c | 83 +++---
 diff.h |  2 +
 t/t4015-diff-whitespace.sh | 54 ++
 4 files changed, 137 insertions(+), 6 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7b2527b9a1..facdbc8f95 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -304,6 +304,10 @@ dimmed_zebra::
Ignore whitespace when comparing lines when performing the move
detection for --color-moved.  This ignores differences even if
one line has whitespace where the other line has none.
+--color-moved-[no-]ignore-space-prefix-delta::
+   Ignores whitespace when comparing lines when performing the move
+   detection for --color-moved. This ignores uniform differences
+   of white space at the beginning lines in moved blocks.
 
 --word-diff[=]::
Show a word diff, using the  to delimit changed words.
diff --git a/diff.c b/diff.c
index b5819dd538..1227a4d2a8 100644
--- a/diff.c
+++ b/diff.c
@@ -709,6 +709,31 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
+struct ws_delta {
+   char *string; /* The prefix delta, which is the same in the block */
+   int direction; /* adding or removing the line? */
+   int missmatch; /* in the remainder */
+};
+#define WS_DELTA_INIT { NULL, 0, 0 }
+
+static void compute_ws_delta(const struct emitted_diff_symbol *a,
+const struct emitted_diff_symbol *b,
+struct ws_delta *out)
+{
+   const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
+   const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
+   int d = longer->len - shorter->len;
+
+   out->missmatch = !memcmp(longer->line + d, shorter->line, shorter->len);
+   out->string = xmemdupz(longer->line, d);
+   out->direction = (a == longer);
+}
+
+static int compare_ws_delta(const struct ws_delta *a, const struct ws_delta *b)
+{
+   return a->direction == b->direction && !strcmp(a->string, b->string);
+}
+
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
   const void *entry,
   const void *entry_or_key,
@@ -720,6 +745,15 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
unsigned flags = diffopt->color_moved_ws_handling
 & XDF_WHITESPACE_FLAGS;
 
+   if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
+   /*
+* As there is not specific white space config given,
+* we'd need to check for a new block, so ignore all
+* white space. The setup of the white space
+* configuration for the next block is done else where
+*/
+   flags |= XDF_IGNORE_WHITESPACE;
+
return !xdiff_compare_lines(a->es->line, a->es->len,
b->es->line, b->es->len,
flags);
@@ -772,7 +806,8 @@ static void add_lines_to_move_detection(struct diff_options 
*o,
 }
 
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
-int pmb_nr)
+int pmb_nr,
+struct ws_delta **wsd)
 {
int lp, rp;
 
@@ -788,6 +823,10 @@ static int shrink_potential_moved_blocks(struct 
moved_entry **pmb,
 
if (lp < pmb_nr && rp > -1 && lp < rp) {
pmb[lp] = pmb[rp];
+   if (*wsd) {
+   free((*wsd)[lp].string);
+   (*wsd)[lp] = (*wsd)[rp];
+   }
pmb[rp] = NULL;
rp--;
lp++;
@@ -837,8 +876,11 @@ static void mark_color_as_moved(struct diff_options *o,
 {
struct moved_entry **pmb = NULL; /* potentially moved blocks */
int pmb_nr = 0, pmb_alloc = 0;
-   int n, flipped_block = 1, block_length = 0;
 
+   struct ws_delta *wsd = NULL; /* white space deltas between pmb */
+   int wsd_alloc = 0;
+
+   int n, flipped_block = 1, block_length = 0;
 
for (n = 0; n < o->emitted_symbols->nr; n++) {
struct hashmap *hm = NULL;
@@ -881,14 +923,31 @@ static void mark_color_as_moved(struct diff_options *o,
struct moved_entry *p = pmb[i];
struct moved_entry *pnext = (p && p->next_line) ?
p->next_line : NULL;
-   if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-   pmb[i] = p->next_line;
+
+  

Re: [PATCH v2 1/2] merge: Add merge.renames config setting

2018-04-24 Thread Ben Peart



On 4/24/2018 2:59 PM, Elijah Newren wrote:

Sorry, I noticed something else I missed on my last reading...

On Tue, Apr 24, 2018 at 10:11 AM, Ben Peart  wrote:

diff --git a/builtin/merge.c b/builtin/merge.c
index 8746c5e3e8..3be52cd316 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -424,6 +424,7 @@ static void finish(struct commit *head_commit,
 opts.output_format |=
 DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
 opts.detect_rename = DIFF_DETECT_RENAME;
+   git_config_get_bool("merge.renames", _rename);
 diff_setup_done();
 diff_tree_oid(head, new_head, "", );
 diffcore_std();


Shouldn't this also be turned off if either (a) merge.renames is unset
and diff.renames is false, or (b) the user specifies -Xno-renames?



This makes me think that I should probably remove the line that 
overrides the detect_rename setting with the merge config setting.  As I 
look at the code, none of the other merge options are reflected in the 
diffstat; instead, all the settings are pretty much hard coded.  Perhaps 
I shouldn't rock that boat.


[PATCH v10 2/2] fixup! t6046: testcases checking whether updates can be skipped in a merge

2018-04-24 Thread Elijah Newren
---
 t/t6046-merge-skip-unneeded-updates.sh | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t6046-merge-skip-unneeded-updates.sh 
b/t/t6046-merge-skip-unneeded-updates.sh
index 880cd782d7..fcefffcaec 100755
--- a/t/t6046-merge-skip-unneeded-updates.sh
+++ b/t/t6046-merge-skip-unneeded-updates.sh
@@ -41,7 +41,7 @@ test_expect_success '1a-setup: Modify(A)/Modify(B), change on 
B subset of A' '
(
cd 1a &&
 
-   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b
+   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b &&
git add b &&
test_tick &&
git commit -m "O" &&
@@ -138,7 +138,7 @@ test_expect_success '2a-setup: Modify(A)/rename(B)' '
(
cd 2a &&
 
-   test_seq 1 10 >b
+   test_seq 1 10 >b &&
git add b &&
test_tick &&
git commit -m "O" &&
@@ -148,7 +148,7 @@ test_expect_success '2a-setup: Modify(A)/rename(B)' '
git branch B &&
 
git checkout A &&
-   test_seq 1 11 > b &&
+   test_seq 1 11 >b &&
git add b &&
test_tick &&
git commit -m "A" &&
@@ -229,7 +229,7 @@ test_expect_success '2b-setup: Rename+Mod(A)/Mod(B), B mods 
subset of A' '
(
cd 2b &&
 
-   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b
+   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b &&
git add b &&
test_tick &&
git commit -m "O" &&
@@ -337,7 +337,7 @@ test_expect_success '2c-setup: Modify b & add c VS rename 
b->c' '
(
cd 2c &&
 
-   test_seq 1 10 >b
+   test_seq 1 10 >b &&
git add b &&
test_tick &&
git commit -m "O" &&
@@ -443,7 +443,7 @@ test_expect_success '3a-setup: bq_1->foo/bq_2 on A, 
foo/->bar/ on B' '
git branch B &&
 
git checkout A &&
-   test_seq 1 11 > bq &&
+   test_seq 1 11 >bq &&
git add bq &&
git mv bq foo/ &&
test_tick &&
@@ -542,7 +542,7 @@ test_expect_success '3b-setup: bq_1->foo/bq_2 on A, 
foo/->bar/ on B' '
git commit -m "A" &&
 
git checkout B &&
-   test_seq 1 11 > bq &&
+   test_seq 1 11 >bq &&
git add bq &&
git mv foo/ bar/ &&
test_tick &&
@@ -624,7 +624,7 @@ test_expect_success '4a-setup: Change on A, change on B 
subset of A, dirty mods
(
cd 4a &&
 
-   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b
+   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b &&
git add b &&
test_tick &&
git commit -m "O" &&
@@ -698,7 +698,7 @@ test_expect_success '4b-setup: Rename+Mod(A)/Mod(B), change 
on B subset of A, di
(
cd 4b &&
 
-   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b
+   test_write_lines 1 2 3 4 5 6 7 8 9 10 >b &&
git add b &&
test_tick &&
git commit -m "O" &&
-- 
2.17.0.295.g791b7256b2.dirty



[PATCH v10 1/2] fixup! merge-recursive: fix was_tracked() to quit lying with some renamed paths

2018-04-24 Thread Elijah Newren
---
 merge-recursive.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1de8dc1c53..f2cbad4f10 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3146,10 +3146,10 @@ int merge_trees(struct merge_options *o,
 
/* Free the extra index left from git_merge_trees() */
/*
-* FIXME: Need to also data allocated by setup_unpack_trees_porcelain()
-* tucked away in o->unpack_opts.msgs, but the problem is that only
-* half of it refers to dynamically allocated data, while the other
-* half points at static strings.
+* FIXME: Need to also free data allocated by
+* setup_unpack_trees_porcelain() tucked away in o->unpack_opts.msgs,
+* but the problem is that only half of it refers to dynamically
+* allocated data, while the other half points at static strings.
 */
discard_index(>orig_index);
 
-- 
2.17.0.295.g791b7256b2.dirty



Fetching tags overwrites existing tags

2018-04-24 Thread Wink Saville
If have a repository with a tag "v1.0.0" and I add a remote repository
which also has a tag "v1.0.0" tag is overwritten.

Google found [1] from 2011 and option 3 is what I'd like to see. Has it been
implemented and I just don't see it?

[1]: https://groups.google.com/forum/#!topic/git-version-control/0l_rJFyTE60


Here is an example demonstrating what I see:

$ echo abc > abc.txt
$ git init .
Initialized empty Git repository in
/home/wink/prgs/git/investigate-fetch-tags/.git/
$ git add *
$ git commit -m "Initial commit"
[master (root-commit) 1116fdc] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 abc.txt
$ git tag v1.0.0
$ git remote add gbenchmark g...@github.com:google/benchmark
$ git log --graph --format="%h %s %d"
* 1116fdc Initial commit  (HEAD -> master, tag: v1.0.0)
$ git fetch --tags gbenchmark
warning: no common commits
remote: Counting objects: 4400, done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 4400 (delta 5), reused 5 (delta 3), pack-reused 4382
Receiving objects: 100% (4400/4400), 1.33 MiB | 2.81 MiB/s, done.
Resolving deltas: 100% (2863/2863), done.
>From github.com:google/benchmark
 * [new branch]  clangtidy   -> gbenchmark/clangtidy
 * [new branch]  iter_report -> gbenchmark/iter_report
 * [new branch]  master  -> gbenchmark/master
 * [new branch]  releasing   -> gbenchmark/releasing
 * [new branch]  reportercleanup -> gbenchmark/reportercleanup
 * [new branch]  rmheaders   -> gbenchmark/rmheaders
 * [new branch]  v2  -> gbenchmark/v2
 * [new tag] v0.0.9  -> v0.0.9
 * [new tag] v0.1.0  -> v0.1.0
 t [tag update]  v1.0.0  -> v1.0.0
 * [new tag] v1.1.0  -> v1.1.0
 * [new tag] v1.2.0  -> v1.2.0
 * [new tag] v1.3.0  -> v1.3.0
 * [new tag] v1.4.0  -> v1.4.0
$ git log --graph --format="%h %s %d"
* 1116fdc Initial commit  (HEAD -> master)

As you can see the tag on 1116fdc is gone, v1.0.0 tag has been updated
and now its pointing to the tag in gbenchmark:

$ git log -5 --graph --format="%h %s %d" v1.0.0
*   cd525ae Merge pull request #171 from eliben/update-doc-userealtime
 (tag: v1.0.0)
|\
| * c7ab1b9 Update README to mention UseRealTime for wallclock time
measurements.
|/
* f662e8b Rename OS_MACOSX macro to new name BENCHMARK_OS_MACOSX. Fix #169
*   0a1f484 Merge pull request #166 from disconnect3d/master
|\
| * d2917bc Fixes #165: CustomArguments ret type in README
|/

Ideally I would have liked the tags fetched from gbenchmark to have a prefix
of gbenchmark/, like the branches have, maybe something like:

$ git fetch --tags gbenchmark
...
 * [new branch]  v2  -> gbenchmark/v2
 * [new tag] v0.0.9  -> gbenchmark/v0.0.9
 * [new tag] v0.1.0  -> gbenchmark/v0.1.0
 * [new tag] v1.0.0  -> gbenchmark/v1.0.0
 * [new tag] v1.1.0  -> gbenchmark/v1.1.0
 * [new tag] v1.2.0  -> gbenchmark/v1.2.0
 * [new tag] v1.3.0  -> gbenchmark/v1.3.0
 * [new tag] v1.4.0  -> gbenchmark/v1.4.0


-- Wink


Assalamu`Alaikum.

2018-04-24 Thread Mohammad Ouattara



Dear Sir/Madam.

Assalamu`Alaikum.

I am Dr mohammad ouattara, I have  ($14.6 Million us dollars) to transfer into 
your account,

I will send you more details about this deal and the procedures to follow when 
I receive a positive response from you, 

Have a great day,
Dr mohammad ouattara.


Re: [PATCH v8 06/16] sequencer: introduce the `merge` command

2018-04-24 Thread Philip Oakley

From: "Johannes Schindelin" 

On Mon, 23 Apr 2018, Philip Oakley wrote:

From: "Johannes Schindelin"  : Monday, April 
23,

2018 1:03 PM
Subject: Re: [PATCH v8 06/16] sequencer: introduce the `merge` command

[...]
>
> > > label onto
> > >
> > > # Branch abc
> > > reset onto
> >
> > Is this reset strictly necessary. We are already there @head.
>
> No, this is not strictly necessary, but

I've realised my misunderstanding. I was thinking this (and others) was
equivalent to

$  git reset  # maybe even --hard,

i.e. affecting the worktree


Oh, but it *is* affecting the worktree. In this case, since we label HEAD
and then immediately reset to the label, there is just nothing to change.

Consider this example, though:

label onto

# Branch: from-philip
reset onto
pick abcdef something
label from-philip

# Branch: with-love
reset onto
pick 012345 else
label with-love

reset onto
merge -C 98765 from-philip
merge -C 43210 with-love

Only in the first instance is the `reset onto` a no-op, an incidental one.
After picking `something` and labeling the result as `from-philip`,
though, the next `reset onto` really resets the worktree.


rather that just being a movement of the Head rev (though I may be having
brain fade here regarding untracked files etc..)


The current way of doing things does not allow the `reset` to overwrite
untracked, nor ignored files (I think, I only verified the former, not the
latter).

But yeah, it is not just a movement of HEAD. It does reset the worktree,
although quite a bit more gently (and safely) than `git reset --hard`. In
that respect, this patch series is a drastic improvement over the Git
garden shears (which is the shell script I use in Git for Windows which
inspired this here patch series).

thanks for clarifying. Yes my reasoning  was a total brain fade ... Along 
with the fact that it's a soft/safe/gentle reset.

--
Philip 



Re: [PATCH v3 09/11] technical/shallow: describe the relationship with replace refs

2018-04-24 Thread Johannes Schindelin
Hi Philip,

On Sun, 22 Apr 2018, Philip Oakley wrote:

> From: "Johannes Schindelin" 
> > Now that grafts are deprecated, we should start to assume that readers
> > have no idea what grafts are. So it makes more sense to describe the
> > "shallow" feature in terms of replace refs.
> >
> > Suggested-by: Eric Sunshine 
> > Signed-off-by: Johannes Schindelin 
> > ---
> > Documentation/technical/shallow.txt | 19 +++
> > 1 file changed, 11 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/technical/shallow.txt
> > b/Documentation/technical/shallow.txt
> > index 5183b154229..b3ff23c25f6 100644
> > --- a/Documentation/technical/shallow.txt
> > +++ b/Documentation/technical/shallow.txt
> > @@ -9,14 +9,17 @@ these commits have no parents.
> > *
> >
> > The basic idea is to write the SHA-1s of shallow commits into
> > -$GIT_DIR/shallow, and handle its contents like the contents
> > -of $GIT_DIR/info/grafts (with the difference that shallow
> > -cannot contain parent information).
> > -
> > -This information is stored in a new file instead of grafts, or
> > -even the config, since the user should not touch that file
> > -at all (even throughout development of the shallow clone, it
> > -was never manually edited!).
> > +$GIT_DIR/shallow, and handle its contents similar to replace
> > +refs (with the difference that shallow does not actually
> > +create those replace refs) and
> 
> If grafts are deprecated, why not alse get rid of this mention and simply
> leave the 'what it does' part.

Internally, shallow commits are implemented using the graft code path, and
they always will be: we will always need a list of the shallow commits,
and we will always need to be able to lift the "shallow" attribute
quickly, when deepening a shallow clone.

So it makes sense to mention that here, because we are deep in technical
details in Documentation/technical/.

> >   very much like the deprecated
> > +graft file (with
> 
> >   the difference that shallow commits will
> > +always have their parents grafted away, not replaced by
> s/their parents grafted away/no parents/ (rather than being replaced..)

But the commits will typically have parents. So they really will have
their parents grafted away as long as they are marked "shallow"...

Thank you for reviewing!
Dscho


Re: [PATCH v4 05/11] replace: introduce --convert-graft-file

2018-04-24 Thread Johannes Schindelin
Hi Eric,

On Sat, 21 Apr 2018, Eric Sunshine wrote:

> On Sat, Apr 21, 2018 at 5:48 AM, Johannes Schindelin
>  wrote:
> > This option is intended to help with the transition away from the
> > now-deprecated graft file.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> > diff --git a/builtin/replace.c b/builtin/replace.c
> > @@ -454,6 +455,38 @@ static int create_graft(int argc, const char **argv, 
> > int force)
> > +static int convert_graft_file(int force)
> > +{
> > +   const char *graft_file = get_graft_file();
> > +   FILE *fp = fopen_or_warn(graft_file, "r");
> > +   struct strbuf buf = STRBUF_INIT, err = STRBUF_INIT;
> > +   struct argv_array args = ARGV_ARRAY_INIT;
> > +
> > +   if (!fp)
> > +   return -1;
> > +
> > +   while (strbuf_getline(, fp) != EOF) {
> > +   if (*buf.buf == '#')
> > +   continue;
> > +
> > +   argv_array_split(, buf.buf);
> > +   if (args.argc && create_graft(args.argc, args.argv, force))
> > +   strbuf_addf(, "\n\t%s", buf.buf);
> > +   argv_array_clear();
> > +   }
> > +
> > +   strbuf_release();
> > +   argv_array_clear();
> 
> This argv_array_clear() is redundant, isn't it?

It sure is!

Thank you for the review,
Dscho


Re: [PATCH v4 04/11] replace: "libify" create_graft() and callees

2018-04-24 Thread Stefan Beller
Hi Johannes,

On Tue, Apr 24, 2018 at 11:51 AM, Johannes Schindelin
 wrote:

>
> Oy vey. How many more mistakes can I introduce in one commit...
>

I ask this myself all the time, but Software is hard when not having
computer assisted checks. The test suite doesn't quite count here,
as it doesn't yell loudly enough for leaks in corner cases.

Thanks for taking these seriously, I was unsure if the
first issues (close() clobbering the errno) were sever enough
to bother. It complicates the code, but the effect is theoretical)
(for EBADF) or a real niche corner case (EINTR).

Speaking of that, I wonder if we eventually want to have
a wrapper

int xclose(int fd)
{
int err = errno;
int ret = close(fd)
if (errno == EINTR)
/* on linux we don't care about this, other OSes? */
;
errno = err;
return ret;
}

Though not in this series.

Thanks,
Stefan


Re: [PATCH v2 1/2] merge: Add merge.renames config setting

2018-04-24 Thread Elijah Newren
Sorry, I noticed something else I missed on my last reading...

On Tue, Apr 24, 2018 at 10:11 AM, Ben Peart  wrote:
> diff --git a/builtin/merge.c b/builtin/merge.c
> index 8746c5e3e8..3be52cd316 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -424,6 +424,7 @@ static void finish(struct commit *head_commit,
> opts.output_format |=
> DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
> opts.detect_rename = DIFF_DETECT_RENAME;
> +   git_config_get_bool("merge.renames", _rename);
> diff_setup_done();
> diff_tree_oid(head, new_head, "", );
> diffcore_std();

Shouldn't this also be turned off if either (a) merge.renames is unset
and diff.renames is false, or (b) the user specifies -Xno-renames?


Re: [PATCH v3 5/9] ref-filter: use generation number for --contains

2018-04-24 Thread Jakub Narebski
Derrick Stolee  writes:
> On 4/18/2018 5:02 PM, Jakub Narebski wrote:
>> Derrick Stolee  writes:
>>
>>> A commit A can reach a commit B only if the generation number of A
>>> is larger than the generation number of B. This condition allows
>>> significantly short-circuiting commit-graph walks.
>>>
>>> Use generation number for 'git tag --contains' queries.
>>>
>>> On a copy of the Linux repository where HEAD is containd in v4.13
>>> but no earlier tag, the command 'git tag --contains HEAD' had the
>>> following peformance improvement:
>>>
>>> Before: 0.81s
>>> After:  0.04s
>>> Rel %:  -95%
>>
>> A question: what is the performance after if the "commit-graph" feature
>> is disabled, or there is no commit-graph file?  Is there performance
>> regression in this case, or is the difference negligible?
>
> Negligible, since we are adding a small number of integer comparisons
> and the main cost is in commit parsing. More on commit parsing in
> response to your comments below.

If it is proven to be always negligible, then its all right.  If it is
unlikely to be non-negligible, well, still O.K.  But I wonder if maybe
there is some situation where the cost of extra parsing is non-negligble.

[...]
>>>   @@ -1618,8 +1623,18 @@ static enum contains_result 
>>> contains_tag_algo(struct commit *candidate,
>>>   struct contains_cache *cache)
>>>   {
>>> struct contains_stack contains_stack = { 0, 0, NULL };
>>> -   enum contains_result result = contains_test(candidate, want, cache);
>>> +   enum contains_result result;
>>> +   uint32_t cutoff = GENERATION_NUMBER_INFINITY;
>>> +   const struct commit_list *p;
>>> +
>>> +   for (p = want; p; p = p->next) {
>>> +   struct commit *c = p->item;
>>> +   parse_commit_or_die(c);
>>> +   if (c->generation < cutoff)
>>> +   cutoff = c->generation;
>>> +   }
>> Sholdn't the above be made conditional on the ability to get generation
>> numbers from the commit-graph file (feature is turned on and file
>> exists)?  Otherwise here after the change contains_tag_algo() now parses
>> each commit in 'want', which I think was not done previously.
>>
>> With commit-graph file parsing is [probably] cheap.  Without it, not
>> necessary.
>>
>> But I might be worrying about nothing.
>
> Not nothing. This parses the "wants" when we previously did not parse
> the wants. Further: this parsing happens before we do the simple check
> of comparing the OID of the candidate against the wants.
>
> The question is: are these parsed commits significant compared to the
> walk that will parse many more commits? It is certainly possible.
>
> One way to fix this is to call 'prepare_commit_graph()' directly and
> then test that 'commit_graph' is non-null before performing any
> parses. I'm not thrilled with how that couples the commit-graph
> implementation to this feature, but that may be necessary to avoid
> regressions in the non-commit-graph case.

Another possible solution (not sure if better or worse) would be to
change the signature of contains_tag_algo() function to take parameter
or flag that would decide whether to parse "wants".

Best,
-- 
Jakub Narębski


Re: [PATCH 9/9] cache.h: allow sha1_object_info to handle arbitrary repositories

2018-04-24 Thread Jonathan Tan
On Tue, 24 Apr 2018 11:42:33 -0700
Brandon Williams  wrote:

> On 04/24, Jonathan Tan wrote:
> > On Mon, 23 Apr 2018 16:43:27 -0700
> > Stefan Beller  wrote:
> > 
> > > This involves also adapting sha1_object_info_extended and a some
> > > internal functions that are used to implement these. It all has to
> > > happen in one patch, because of a single recursive chain of calls visits
> > > all these functions.
> > 
> > In packfile.c, unpack_entry() invokes get_delta_base_cache_entry(),
> > which references a global (delta_base_cache). Does delta_base_cache need
> > to be moved to the repo object (or object store object) first, or is
> > this safe?
> 
> After looking at this, I think it should be safe for now since its a
> cache that requires a packed_git pointer to access (and those would be
> per repository).  We may want to move it in to the repository at some
> point though.

Thanks for the pointer. I see that a packed_git pointer is part of the
key to that hashmap, so this is indeed safe. (Probably worth discussing
this in the commit message.)

There is another global, delta_base_cached, but it just limits the total
size of the hashmap, so it's safe to use that too.


Re: [PATCH v4 04/11] replace: "libify" create_graft() and callees

2018-04-24 Thread Johannes Schindelin
Hi Stefan,

On Mon, 23 Apr 2018, Stefan Beller wrote:

> On Fri, Apr 20, 2018 at 3:21 PM, Johannes Schindelin
>  wrote:
> > @@ -250,27 +257,38 @@ static void import_object(struct object_id *oid, enum 
> > object_type type,
> > -   if (strbuf_read(, cmd.out, 41) < 0)
> > -   die_errno("unable to read from mktree");
> > +   if (strbuf_read(, cmd.out, 41) < 0) {
> > +   close(fd);
> > +   close(cmd.out);
> > +   return error_errno("unable to read from mktree");
> 
> So before the errno is coming directly from strbuf_read,
> which will set errno on error to the desired errno.
> (It will come from an underlying read())

Yes, you are right!

> However close() may fail and clobber errno,
> so I would think we'd need to
> 
> if (strbuf_read(, cmd.out, 41) < 0) {
>   int err =  errno; /* close shall not clobber errno */
>   close(fd);
>   close(cmd.out);
>   errno = err;
>   return error_errno(...);
> }

I went for the easier route: call error_errno() before close(fd), and then
return -1 after close(cmd.out). Since error_errno() always returns -1, the
result is pretty much the same (I do not think that we want the caller of
import_object() to rely on the errno).

> > -   if (fstat(fd, ) < 0)
> > -   die_errno("unable to fstat %s", filename);
> > +   if (fstat(fd, ) < 0) {
> > +   close(fd);
> > +   return error_errno("unable to fstat %s", filename);
> > +   }
> 
> Same here?

Yep.

> An alternative would be to do
> ret = error_errno(...)
> close (..)
> return ret;

I even saved one variable ;-)

> > @@ -288,19 +307,23 @@ static int edit_and_replace(const char *object_ref, 
> > int force, int raw)
> > struct strbuf ref = STRBUF_INIT;
> >
> > if (get_oid(object_ref, _oid) < 0)
> > -   die("Not a valid object name: '%s'", object_ref);
> > +   return error("Not a valid object name: '%s'", object_ref);
> >
> > type = oid_object_info(_oid, NULL);
> > if (type < 0)
> > -   die("unable to get object type for %s", 
> > oid_to_hex(_oid));
> > +   return error("unable to get object type for %s",
> > +oid_to_hex(_oid));
> >
> > -   check_ref_valid(_oid, , , force);
> > +   if (check_ref_valid(_oid, , , force))
> > +   return -1;
> > strbuf_release();
> >
> > -   export_object(_oid, type, raw, tmpfile);
> > +   if (export_object(_oid, type, raw, tmpfile))
> > +   return -1;
> > if (launch_editor(tmpfile, NULL, NULL) < 0)
> > -   die("editing object file failed");
> > -   import_object(_oid, type, raw, tmpfile);
> > +   return error("editing object file failed");
> > +   if (import_object(_oid, type, raw, tmpfile))
> > +   return -1;
> >
> > free(tmpfile);
> 
> Do we need to free tmpfile in previous returns?

Oy vey. How many more mistakes can I introduce in one commit...

> > @@ -394,24 +422,29 @@ static int create_graft(int argc, const char **argv, 
> > int force)
> > unsigned long size;
> >
> > if (get_oid(old_ref, _oid) < 0)
> > -   die(_("Not a valid object name: '%s'"), old_ref);
> > -   commit = lookup_commit_or_die(_oid, old_ref);
> > +   return error(_("Not a valid object name: '%s'"), old_ref);
> > +   commit = lookup_commit_reference(_oid);
> > +   if (!commit)
> > +   return error(_("could not parse %s"), old_ref);
> >
> > buffer = get_commit_buffer(commit, );
> > strbuf_add(, buffer, size);
> > unuse_commit_buffer(commit, buffer);
> >
> > -   replace_parents(, argc - 1, [1]);
> > +   if (replace_parents(, argc - 1, [1]) < 0)
> > +   return -1;
> >
> > if (remove_signature()) {
> > warning(_("the original commit '%s' has a gpg signature."), 
> > old_ref);
> > warning(_("the signature will be removed in the replacement 
> > commit!"));
> > }
> >
> > -   check_mergetags(commit, argc, argv);
> > +   if (check_mergetags(commit, argc, argv))
> > +   return -1;
> >
> > if (write_object_file(buf.buf, buf.len, commit_type, _oid))
> > -   die(_("could not write replacement commit for: '%s'"), 
> > old_ref);
> > +   return error(_("could not write replacement commit for: 
> > '%s'"),
> > +old_ref);
> >
> > strbuf_release();
> 
> Release  in the other return cases, too?

Absolutely.

Thank you for helping me improve these patches,
Dscho


Re: [PATCH 9/9] cache.h: allow sha1_object_info to handle arbitrary repositories

2018-04-24 Thread Brandon Williams
On 04/24, Jonathan Tan wrote:
> On Mon, 23 Apr 2018 16:43:27 -0700
> Stefan Beller  wrote:
> 
> > This involves also adapting sha1_object_info_extended and a some
> > internal functions that are used to implement these. It all has to
> > happen in one patch, because of a single recursive chain of calls visits
> > all these functions.
> 
> In packfile.c, unpack_entry() invokes get_delta_base_cache_entry(),
> which references a global (delta_base_cache). Does delta_base_cache need
> to be moved to the repo object (or object store object) first, or is
> this safe?

After looking at this, I think it should be safe for now since its a
cache that requires a packed_git pointer to access (and those would be
per repository).  We may want to move it in to the repository at some
point though.

> 
> Also, in sha1_file.c, oid_object_info_extended() invokes fetch_object(),
> which attempts to fetch a missing object. For this, I think that it's
> best to guard with a "r == the_repository" check, or if there's a better
> way to distinguish between the "default" repository and any repository
> that we newly create (I vaguely remember some distinction when parsing
> environment variables when determining repo paths - the envvars were
> only used for the "default" repository, but not for the others).

This is a little more difficult and I'm not sure I know what the best
course of action would be for this.  Mostly because then this puts a big
recursive dependency on the whole fetch mechanism to handle arbitrary
repositories at the same time these functions are converted.  So maybe
throwing in the runtime check would be the best way to break the
dependencies for now.

-- 
Brandon Williams


Re: [PATCH v3 7/7] contrib/git-jump/git-jump: jump to match column in addition to line

2018-04-24 Thread Taylor Blau
On Tue, Apr 24, 2018 at 01:37:36AM -0400, Eric Sunshine wrote:
> On Tue, Apr 24, 2018 at 1:07 AM, Taylor Blau  wrote:
> > Take advantage of 'git-grep(1)''s new option, '--column-number' in order
> > to teach Peff's 'git-jump' script how to jump to the correct column for
> > any given match.
> >
> > 'git-grep(1)''s output is in the correct format for Vim's jump list, so
> > no additional cleanup is necessary.
> >
> > Signed-off-by: Taylor Blau 
> > ---
> >  contrib/git-jump/git-jump | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Based upon Ævar review[1], I was expecting to see git-jump/README
> modified by this patch, as well. Perhaps you overlooked or forgot
> about that review comment, or perhaps you disagreed with it?

Yes, and thank you for pointing that out. I recall reading his mail and
thought that when I prepared v3 that I had already included his changes,
but I had in fact not done so.

I amended the git-jump's README to prepare for v4, but was somewhat
confused by Ævar's comment when I reread [1]. I believe he was
suggesting updating the example to remove a reference to ag(1)'s
'--column' when configuring jump.grepCmd to 'ag --column'. Since
git-{grep,jump} support this now by default, I changed that line to
simply 'ag', instead of 'ag --column', as such:

diff --git a/contrib/git-jump/README b/contrib/git-jump/README
index 4484bda410..7630e16854 100644
--- a/contrib/git-jump/README
+++ b/contrib/git-jump/README
@@ -37,3 +37,3 @@ Git-jump can generate four types of interesting lists:

-  3. Any grep matches.
+  3. Any grep matches, including the column of the first match on a line.

@@ -67,3 +67,3 @@ git jump grep -i foo_bar
 # use the silver searcher for git jump grep
-git config jump.grepCmd "ag --column"
+git config jump.grepCmd "ag"
 --
@@ -84,3 +84,3 @@ leaving you to locate subsequent hits in that file or other 
files using
 the editor or pager. By contrast, git-jump provides the editor with a
-complete list of files and line numbers for each match.
+complete list of files, lines, and a column number for each match.

---

Does this look OK?

Thanks,
Taylor


Re: [PATCH v2 2/6] grep.c: take column number as argument to show_line()

2018-04-24 Thread Taylor Blau
On Tue, Apr 24, 2018 at 03:13:55PM +0900, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason  writes:
>
> > I think when we add features to git-grep we should be as close to GNU
> > grep as possible (e.g. not add this -m alias meaning something different
> > as in your v1), but if GNU grep doesn't have something go with the trend
> > of other grep tools, as noted at
> > https://beyondgrep.com/feature-comparison/ (and I found another one that
> > has this: https://github.com/beyondgrep/website/pull/83), so there's
> > already 3 prominent grep tools that call this just --column.
> >
> > I think we should just go with that.
>
> OK.  If they called it --column-number, that might have been more in
> line with GNU grep's --line-number, but that is not something we can
> dictate retroactively anyway, so --column to match them would be
> better than trying to be consistent and ending up with being
> different from everybody else.

That sounds sensible. Let's call the new option '--column', and the
configuration options grep.column and color.grep.column to match
(instead of s/column/columnNumber/g), yes?

Thanks,
Taylor


Re: [PATCH 9/9] cache.h: allow sha1_object_info to handle arbitrary repositories

2018-04-24 Thread Jonathan Tan
On Mon, 23 Apr 2018 16:43:27 -0700
Stefan Beller  wrote:

> This involves also adapting sha1_object_info_extended and a some
> internal functions that are used to implement these. It all has to
> happen in one patch, because of a single recursive chain of calls visits
> all these functions.

In packfile.c, unpack_entry() invokes get_delta_base_cache_entry(),
which references a global (delta_base_cache). Does delta_base_cache need
to be moved to the repo object (or object store object) first, or is
this safe?

Also, in sha1_file.c, oid_object_info_extended() invokes fetch_object(),
which attempts to fetch a missing object. For this, I think that it's
best to guard with a "r == the_repository" check, or if there's a better
way to distinguish between the "default" repository and any repository
that we newly create (I vaguely remember some distinction when parsing
environment variables when determining repo paths - the envvars were
only used for the "default" repository, but not for the others).


Re: [PATCH 5/9] packfile: add repository argument to packed_object_info

2018-04-24 Thread Jonathan Tan
On Mon, 23 Apr 2018 16:43:23 -0700
Stefan Beller  wrote:

> diff --git a/sha1_file.c b/sha1_file.c
> index 93f25c6c6a..b292e04fd3 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1307,7 +1307,8 @@ int oid_object_info_extended_the_repository(const 
> struct object_id *oid, struct
>* information below, so return early.
>*/
>   return 0;
> - rtype = packed_object_info(e.p, e.offset, oi);
> +
> + rtype = packed_object_info(the_repository, e.p, e.offset, oi);

Extra blank line introduced.


Re: [PATCH 2/9] cache.h: add repository argument to oid_object_info

2018-04-24 Thread Jonathan Tan
On Mon, 23 Apr 2018 16:43:20 -0700
Stefan Beller  wrote:

> Add a repository argument to allow the callers of oid_object_info
> to be more specific about which repository to handle. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.

>From here...

> In the expanded macro the identifier `the_repository` is not actually used,
> so the compiler does not catch if the repository.h header is not included
> at the call site. call sites needing that #include were identified by
> changing the macro to definition to
> 
>   #define sha1_object_info(r, sha1, size) \
>   (r, sha1_object_info_##r(sha1, size)).
> 
> This produces a compiler warning about the left hand side of the comma
> operator being unused, which can be suppressed using -Wno-unused-value.
> 
> To avoid breaking bisection, do not include this trick in the patch.

...until here: I don't think this explanation is necessary - this macro
trick is temporary anyway in all our patch sets. Also, wouldn't
repository.h be needed anyway if we reference "struct repository"?

I would replace this section with the "As with the previous commits"
explanation you have in PATCH 3/9 and others.


Re: [PATCH v2 1/2] merge: Add merge.renames config setting

2018-04-24 Thread Elijah Newren
On Tue, Apr 24, 2018 at 10:11 AM, Ben Peart  wrote:
> Add the ability to control rename detection for merge via a config setting.

Sweet, thanks for including the documentation updates.

I lean towards the side of the argument that says that since
merge.renameLimit inherits from diff.renameLimit, merge.renames should
inherit default value from diff.renames (allow people to not have to
repeat themselves as much if they want to use the same rename settings
for all cases).  Sounds like you and Johannes disagree.  I don't feel
super strongly about this item, but it'd probably be good to get some
other git folks' opinions on this particular point.

Other than that unresolved question, and the separate one about
whether to go with a different option instead (e.g.
merge.defaultStrategy), as being discussed elsewhere in this thread,
the patch looks good to me.


Re: [PATCH 0/3] Colorful blame output

2018-04-24 Thread Stefan Beller
On Mon, Apr 23, 2018 at 7:09 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> This is a revamp of
>> https://public-inbox.org/git/20180416220955.46163-1-sbel...@google.com/
>>
>> Junio had some issue with that version, as it would collide config and 
>> command
>> line options. This is fixed now by introducing another option.
>
> Heh, that sounds as if the series was rerolled unnecessarily, only
> because I had unreasonable complaints.

Sorry for that wording, the complaints were of educational nature, so
I rerolled it.

> I was hoping that issues the
> series had were fixed, and my involvement was merely that I happened
> to be the one who pointed out first.

There were multiple issues (many fixed already), but there was one
outstanding design bug, so I had to reroll.

>
> In any case, will replace and take a look at it.  Thanks.

Thanks for looking at it,
Stefan


Re: [PATCH v1 1/2] merge: Add merge.renames config setting

2018-04-24 Thread Elijah Newren
Hi Dscho,

On Tue, Apr 24, 2018 at 4:58 AM, Johannes Schindelin
 wrote:
> Hi Junio,
>
> On Tue, 24 Apr 2018, Junio C Hamano wrote:
>
>> Yeah, but as opposed to passing "oh, let's see if we can get a
>> reasonable result without rename detection just this time" from the
>> command line, configuring merge.renames=false in would mean exactly
>> that: "we don't need rename detection, just want to skip the cycles
>> spent for it".  That is why I wondered how well the resolve strategy
>> would have fit your needs.
>
> Please do not forget that the context is GVFS, where you would cause a lot
> of pain and suffering by letting users forget to specify that command-line
> option all the time, resulting in several gigabytes of objects having to
> be downloaded just for the sake of rename detection.
>
> So there is a pretty good point in doing this as a config option.

I agree you need a config option, but I think Junio has a good point
that it's worth at least checking out the possibility of a different
one.  In particular, you could add a merge.defaultStrategy (or maybe
merge.twohead to be similar to pull.twohead??) that is set to
'resolve', and use that to avoid rename detection.

Perhaps performance considerations rule out the resolve strategy and
favor recursive, or maybe you need the 'recursive' part of the
recursive strategy (rather than the rename part), or perhaps there's
some other special reason you need to go this route, but since you are
avoiding renames right now it's at least worth considering the resolve
strategy.

Elijah


Re: [PATCH v1 2/2] merge: Add merge.aggressive config setting

2018-04-24 Thread Elijah Newren
Hi Ben,

On Tue, Apr 24, 2018 at 9:45 AM, Ben Peart  wrote:
> On 4/20/2018 1:22 PM, Elijah Newren wrote:
>> On Fri, Apr 20, 2018 at 6:36 AM, Ben Peart 
>> wrote:
>>>
>>> Add the ability to control the aggressive flag passed to read-tree via a
>>> config setting.
>>
>> This feels like a workaround to the performance problems with index
>> updates in merge-recursive.c.
>
> This change wasn't done to solve performance problems.  We turned it on
> because it reduced the number of unmerged entries (from 40K to 1) in the
> particular merge we were looking at.  The additional 3 scenarios that
> --aggressive resolves made that much difference.
>
> That said, it makes sense to me to do

Um...color me perplexed here.  aggressive exists just to do some
resolutions that higher-level strategies can and totally ought to be
able to handle easily (the rules are almost trivially
straight-forward), but deferring allows the higher level strategies
(either merge-recursive or resolve's git-merge-one-file) to handle
slightly differently (e.g. by detecting renames).  merge-recursive
should be able to resolve anything that the unpack_trees aggressive
setting handles.  If it can't, it sounds like there's a horrible bug
somewhere.

Perhaps fixing that bug is the real problem?

Is there any chance you can dig out more details about any of these
conflicts or come up with a simple testcase where running 'git merge
-X no-renames' gives a merge conflict but running with this option
would run to completion?

>> this when rename detection is turned off.  In fact, I think you'd
>> automatically want to set aggressive to true whenever rename detection
>> is turned off (whether by your merge.renames option or the
>> -Xno-renames flag).
>> > I can't think of any reason this setting would be useful separate from
>> turning rename detection off, and it'd actively harm rename detection
>> performance improvements I have in the pipeline.  I'd really prefer to
>> not add this option, and instead combine the setting of aggressive
>> with the other flag.  Do you have an independent reason for wanting
>> this?
>>
>
> While combining them would work for our specific use scenario (since we turn
> both on already along with turning off merge.stat), I really hesitate to tie
> these two different flags and code paths together with a single config
> setting.
>
> While I don't want to needlessly complicate your optimizations in this area
> (they are already complex enough!) I believe we need to keep the option to
> turn on --aggressive without turning off rename detection as a viable
> option.  Perhaps if that is the case, your optimizations have less impact or
> don't apply but the user should be able to make that choice for their
> specific situation.

I totally buy that you need at least one option to avoid waiting for
(current) rename detection in some fashion, and that you don't want
lots of spurious conflicts.  But I don't understand why you believe
that we need to keep the option to turn on the aggressive flag
independently.  What's the usecase?  It wasn't possible before in the
code, no one else has asked for it, and even you say you don't need it
as a separate option.  Is it a concern that turning on aggressive
whenever rename-detection is turned off will break something?  The
only reason I can see to keep the aggressive codepath in unpack_trees
behind a branch instead of it always running unconditionally for every
single caller throughout the codebase is because of renames.  So the
fact that you're turning renames off, to me, suggests that aggressive
flag should automatically be turned on.  I'd even call pre-existing
code (e.g. the -X no-renames option in merge-recursive) that doesn't
turn on the aggressive flag buggy (even if the only result is
suboptimal-performance).

I don't see how an option to turn on the aggressive flag independently
is possibly useful to anyone.  Further, we have strong reason to
believe it will soon be actively harmful.  So...why?  It's totally
possible I'm just missing something.  If there's a good reason for it,
providing some kind of benefit that the user could weigh in a
tradeoff, then I can get on board with providing it as an option, but
right now I just don't see it.


[PATCH v2 1/2] merge: Add merge.renames config setting

2018-04-24 Thread Ben Peart
Add the ability to control rename detection for merge via a config setting.

Reviewed-by: Johannes Schindelin 
Signed-off-by: Ben Peart 
---
 Documentation/diff-config.txt  | 3 ++-
 Documentation/merge-config.txt | 8 +++-
 Documentation/merge-strategies.txt | 6 --
 builtin/merge.c| 1 +
 merge-recursive.c  | 1 +
 5 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 5ca942ab5e..77caa66c2f 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -112,7 +112,8 @@ diff.orderFile::
 
 diff.renameLimit::
The number of files to consider when performing the copy/rename
-   detection; equivalent to the 'git diff' option `-l`.
+   detection; equivalent to the 'git diff' option `-l`. This setting
+   has no effect if rename detection is turned off.
 
 diff.renames::
Whether and how Git detects renames.  If set to "false",
diff --git a/Documentation/merge-config.txt b/Documentation/merge-config.txt
index 12b6bbf591..0540c44e23 100644
--- a/Documentation/merge-config.txt
+++ b/Documentation/merge-config.txt
@@ -35,7 +35,13 @@ include::fmt-merge-msg-config.txt[]
 merge.renameLimit::
The number of files to consider when performing rename detection
during a merge; if not specified, defaults to the value of
-   diff.renameLimit.
+   diff.renameLimit. This setting has no effect if rename detection
+   is turned off.
+
+merge.renames::
+   Whether and how Git detects renames.  If set to "false",
+   rename detection is disabled. If set to "true", basic rename
+   detection is enabled. This is the default.
 
 merge.renormalize::
Tell Git that canonical representation of files in the
diff --git a/Documentation/merge-strategies.txt 
b/Documentation/merge-strategies.txt
index 4a58aad4b8..1e0728aa12 100644
--- a/Documentation/merge-strategies.txt
+++ b/Documentation/merge-strategies.txt
@@ -84,12 +84,14 @@ no-renormalize;;
`merge.renormalize` configuration variable.
 
 no-renames;;
-   Turn off rename detection.
+   Turn off rename detection. This overrides the `merge.renames`
+   configuration variable.
See also linkgit:git-diff[1] `--no-renames`.
 
 find-renames[=];;
Turn on rename detection, optionally setting the similarity
-   threshold.  This is the default.
+   threshold.  This is the default. This overrides the
+   'merge.renames' configuration variable.
See also linkgit:git-diff[1] `--find-renames`.
 
 rename-threshold=;;
diff --git a/builtin/merge.c b/builtin/merge.c
index 8746c5e3e8..3be52cd316 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -424,6 +424,7 @@ static void finish(struct commit *head_commit,
opts.output_format |=
DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
opts.detect_rename = DIFF_DETECT_RENAME;
+   git_config_get_bool("merge.renames", _rename);
diff_setup_done();
diff_tree_oid(head, new_head, "", );
diffcore_std();
diff --git a/merge-recursive.c b/merge-recursive.c
index 9c05eb7f70..cd5367e890 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3256,6 +3256,7 @@ static void merge_recursive_config(struct merge_options 
*o)
git_config_get_int("merge.verbosity", >verbosity);
git_config_get_int("diff.renamelimit", >diff_rename_limit);
git_config_get_int("merge.renamelimit", >merge_rename_limit);
+   git_config_get_bool("merge.renames", >detect_rename);
git_config(git_xmerge_config, NULL);
 }
 
-- 
2.17.0.windows.1



[PATCH v2 2/2] merge: Add merge.aggressive config setting

2018-04-24 Thread Ben Peart
Add the ability to control the aggressive flag passed to read-tree via a config 
setting.

Reviewed-by: Johannes Schindelin 
Signed-off-by: Ben Peart 
---
 Documentation/merge-config.txt | 4 
 merge-recursive.c  | 1 +
 2 files changed, 5 insertions(+)

diff --git a/Documentation/merge-config.txt b/Documentation/merge-config.txt
index 0540c44e23..38492bcb98 100644
--- a/Documentation/merge-config.txt
+++ b/Documentation/merge-config.txt
@@ -1,3 +1,7 @@
+merge.aggressive::
+   Passes "aggressive" to read-tree which makes the command resolve
+   a few more cases internally. See "--aggressive" in 
linkgit:git-read-tree[1].
+
 merge.conflictStyle::
Specify the style in which conflicted hunks are written out to
working tree files upon merge.  The default is "merge", which
diff --git a/merge-recursive.c b/merge-recursive.c
index cd5367e890..0ca84e4b82 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -355,6 +355,7 @@ static int git_merge_trees(struct merge_options *o,
o->unpack_opts.fn = threeway_merge;
o->unpack_opts.src_index = _index;
o->unpack_opts.dst_index = _index;
+   git_config_get_bool("merge.aggressive", (int 
*)>unpack_opts.aggressive);
setup_unpack_trees_porcelain(>unpack_opts, "merge");
 
init_tree_desc_from_tree(t+0, common);
-- 
2.17.0.windows.1



[PATCH v2 0/2] add additional config settings for merge

2018-04-24 Thread Ben Peart
Updated in response to feedback.  Mostly documentation changes but the diffstat
at the end of the merge (if on) now honors the new merge.rename setting as well.

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/653bfe6e01
Checkout: git fetch https://github.com/benpeart/git merge-options-v2 && git 
checkout 653bfe6e01


### Interdiff (v1..v2):

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 5ca942ab5e..77caa66c2f 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -112,7 +112,8 @@ diff.orderFile::
 
 diff.renameLimit::
The number of files to consider when performing the copy/rename
-   detection; equivalent to the 'git diff' option `-l`.
+   detection; equivalent to the 'git diff' option `-l`. This setting
+   has no effect if rename detection is turned off.
 
 diff.renames::
Whether and how Git detects renames.  If set to "false",
diff --git a/Documentation/merge-config.txt b/Documentation/merge-config.txt
index 5a9ab969db..38492bcb98 100644
--- a/Documentation/merge-config.txt
+++ b/Documentation/merge-config.txt
@@ -39,7 +39,8 @@ include::fmt-merge-msg-config.txt[]
 merge.renameLimit::
The number of files to consider when performing rename detection
during a merge; if not specified, defaults to the value of
-   diff.renameLimit.
+   diff.renameLimit. This setting has no effect if rename detection
+   is turned off.
 
 merge.renames::
Whether and how Git detects renames.  If set to "false",
diff --git a/Documentation/merge-strategies.txt 
b/Documentation/merge-strategies.txt
index 4a58aad4b8..1e0728aa12 100644
--- a/Documentation/merge-strategies.txt
+++ b/Documentation/merge-strategies.txt
@@ -84,12 +84,14 @@ no-renormalize;;
`merge.renormalize` configuration variable.
 
 no-renames;;
-   Turn off rename detection.
+   Turn off rename detection. This overrides the `merge.renames`
+   configuration variable.
See also linkgit:git-diff[1] `--no-renames`.
 
 find-renames[=];;
Turn on rename detection, optionally setting the similarity
-   threshold.  This is the default.
+   threshold.  This is the default. This overrides the
+   'merge.renames' configuration variable.
See also linkgit:git-diff[1] `--find-renames`.
 
 rename-threshold=;;
diff --git a/builtin/merge.c b/builtin/merge.c
index 8746c5e3e8..3be52cd316 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -424,6 +424,7 @@ static void finish(struct commit *head_commit,
opts.output_format |=
DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
opts.detect_rename = DIFF_DETECT_RENAME;
+   git_config_get_bool("merge.renames", _rename);
diff_setup_done();
diff_tree_oid(head, new_head, "", );
diffcore_std();


### Patches

Ben Peart (2):
  merge: Add merge.renames config setting
  merge: Add merge.aggressive config setting

 Documentation/diff-config.txt  |  3 ++-
 Documentation/merge-config.txt | 12 +++-
 Documentation/merge-strategies.txt |  6 --
 builtin/merge.c|  1 +
 merge-recursive.c  |  2 ++
 5 files changed, 20 insertions(+), 4 deletions(-)


base-commit: 0b0cc9f86731f894cff8dd25299a9b38c254569e
-- 
2.17.0.windows.1




[PATCH] git: add -N as a short option for --no-pager

2018-04-24 Thread Johannes Sixt
In modern setups, less, the pager, uses alternate screen to show
the content. When it is closed, it switches back to the original
screen, and all content is gone.

It is not uncommon to request that the output remains visible in
the terminal. For this, the option --no-pager can be used. But
it is a bit cumbersome to type, even when command completion is
available. Provide a short option, -N, to make the option easier
accessible.

Signed-off-by: Johannes Sixt 
---
 Documentation/git.txt | 3 ++-
 git.c | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 4767860e72..17b50b0dc6 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git' [--version] [--help] [-C ] [-c =]
 [--exec-path[=]] [--html-path] [--man-path] [--info-path]
-[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
+[-p|--paginate|-N|--no-pager] [--no-replace-objects] [--bare]
 [--git-dir=] [--work-tree=] [--namespace=]
 [--super-prefix=]
  []
@@ -103,6 +103,7 @@ foo.bar= ...`) sets `foo.bar` to the empty string which 
`git config
configuration options (see the "Configuration Mechanism" section
below).
 
+-N::
 --no-pager::
Do not pipe Git output into a pager.
 
diff --git a/git.c b/git.c
index ceaa58ef40..9e2d78f442 100644
--- a/git.c
+++ b/git.c
@@ -7,7 +7,7 @@
 const char git_usage_string[] =
N_("git [--version] [--help] [-C ] [-c =]\n"
   "   [--exec-path[=]] [--html-path] [--man-path] 
[--info-path]\n"
-  "   [-p | --paginate | --no-pager] [--no-replace-objects] 
[--bare]\n"
+  "   [-p | --paginate | -N | --no-pager] 
[--no-replace-objects] [--bare]\n"
   "   [--git-dir=] [--work-tree=] 
[--namespace=]\n"
   "[]");
 
@@ -81,7 +81,7 @@ static int handle_options(const char ***argv, int *argc, int 
*envchanged)
exit(0);
} else if (!strcmp(cmd, "-p") || !strcmp(cmd, "--paginate")) {
use_pager = 1;
-   } else if (!strcmp(cmd, "--no-pager")) {
+   } else if (!strcmp(cmd, "-N") || !strcmp(cmd, "--no-pager")) {
use_pager = 0;
if (envchanged)
*envchanged = 1;
-- 
2.17.0.69.g0c1d01d9b6



Re: [PATCH v1 1/2] merge: Add merge.renames config setting

2018-04-24 Thread Ben Peart



On 4/23/2018 5:32 PM, Eckhard Maaß wrote:

On Mon, Apr 23, 2018 at 09:15:09AM -0400, Ben Peart wrote:

In commit 2a2ac926547 when merge.renamelimit was added, it was decided to
have separate settings for merge and diff to give users the ability to
control that behavior.  In this particular case, it will default to the
value of diff.renamelimit when it isn't set.  That isn't consistent with the
other merge settings.


However, it seems like a desirable way to do it.


I'm just one opinion among many but I personally believe the cascading 
settings are complicated enough just with the various config files and 
command line options and which overwrite the others.  I'd rather not 
complicate them further by having settings inherited from one feature 
(diff) to another (merge).


There are currently ~15 merge specific config settings and only 
merge.renamelimit currently does this inheritance.  That said, at least 
one other person thought it was a good idea. :)




Maybe let me throw in some code for discussion (test and documentation
is missing, mainly to form an idea what the change in options should
be). I admit the patch below is concerned only with diff.renames, but
whatever we come up with for merge should be reflected there, too,
doesn't it >
Greetings,
Eckhard

-- >8 --

 From e8a88111f2aaf338a4c19e83251c7178f7152129 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eckhard=20S=2E=20Maa=C3=9F?= 
Date: Sun, 22 Apr 2018 23:29:08 +0200
Subject: [PATCH] diff: enhance diff.renames to be able to set rename score
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Eckhard S. Maaß 
---
  diff.c | 35 ---
  1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 1289df4b1f..a3cedad5cf 100644
--- a/diff.c
+++ b/diff.c
@@ -30,6 +30,7 @@
  #endif
  
  static int diff_detect_rename_default;

+static int diff_rename_score_default;
  static int diff_indent_heuristic = 1;
  static int diff_rename_limit_default = 400;
  static int diff_suppress_blank_empty;
@@ -177,13 +178,33 @@ static int parse_submodule_params(struct diff_options 
*options, const char *valu
return 0;
  }
  
+int parse_rename_score(const char **cp_p);

+
+static int git_config_rename_score(const char *value)
+{
+   int parsed_rename_score = parse_rename_score();
+   if (parsed_rename_score == -1)
+   return error("invalid argument to diff.renamescore: %s", value);
+   diff_rename_score_default = parsed_rename_score;
+   return 0;
+}
+
  static int git_config_rename(const char *var, const char *value)
  {
-   if (!value)
-   return DIFF_DETECT_RENAME;
-   if (!strcasecmp(value, "copies") || !strcasecmp(value, "copy"))
-   return  DIFF_DETECT_COPY;
-   return git_config_bool(var,value) ? DIFF_DETECT_RENAME : 0;
+   if (!value) {
+   diff_detect_rename_default = DIFF_DETECT_RENAME;
+   return 0;
+   }
+   if (skip_to_optional_arg(value, "copies", ) || skip_to_optional_arg(value, 
"copy", )) {
+   diff_detect_rename_default = DIFF_DETECT_COPY;
+   return git_config_rename_score(value);
+   }
+   if (skip_to_optional_arg(value, "renames", ) || skip_to_optional_arg(value, 
"rename", )) {
+   diff_detect_rename_default = DIFF_DETECT_RENAME;
+   return git_config_rename_score(value);
+   }
+   diff_detect_rename_default = git_config_bool(var,value) ? 
DIFF_DETECT_RENAME : 0;
+   return 0;
  }
  
  long parse_algorithm_value(const char *value)

@@ -307,8 +328,7 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
return 0;
}
if (!strcmp(var, "diff.renames")) {
-   diff_detect_rename_default = git_config_rename(var, value);
-   return 0;
+   return git_config_rename(var, value);
}
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
@@ -4116,6 +4136,7 @@ void diff_setup(struct diff_options *options)
options->add_remove = diff_addremove;
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
+   options->rename_score = diff_rename_score_default;
options->xdl_opts |= diff_algorithm;
if (diff_indent_heuristic)
DIFF_XDL_SET(options, INDENT_HEURISTIC);



Re: [PATCH v1 2/2] merge: Add merge.aggressive config setting

2018-04-24 Thread Ben Peart



On 4/20/2018 1:22 PM, Elijah Newren wrote:

On Fri, Apr 20, 2018 at 6:36 AM, Ben Peart  wrote:

Add the ability to control the aggressive flag passed to read-tree via a config 
setting.


This feels like a workaround to the performance problems with index
updates in merge-recursive.c.  


This change wasn't done to solve performance problems.  We turned it on 
because it reduced the number of unmerged entries (from 40K to 1) in the 
particular merge we were looking at.  The additional 3 scenarios that 
--aggressive resolves made that much difference.


That said, it makes sense to me to do

this when rename detection is turned off.  In fact, I think you'd
automatically want to set aggressive to true whenever rename detection
is turned off (whether by your merge.renames option or the
-Xno-renames flag).
> I can't think of any reason this setting would be useful separate from
turning rename detection off, and it'd actively harm rename detection
performance improvements I have in the pipeline.  I'd really prefer to
not add this option, and instead combine the setting of aggressive
with the other flag.  Do you have an independent reason for wanting
this?



While combining them would work for our specific use scenario (since we 
turn both on already along with turning off merge.stat), I really 
hesitate to tie these two different flags and code paths together with a 
single config setting.


While I don't want to needlessly complicate your optimizations in this 
area (they are already complex enough!) I believe we need to keep the 
option to turn on --aggressive without turning off rename detection as a 
viable option.  Perhaps if that is the case, your optimizations have 
less impact or don't apply but the user should be able to make that 
choice for their specific situation.



Thanks,
Elijah



Re: [PATCH 2/2] unpack_trees_options: free messages when done

2018-04-24 Thread Elijah Newren
On Mon, Apr 23, 2018 at 10:13 PM, Martin Ågren  wrote:
> The strings allocated in `setup_unpack_trees_porcelain()` are never
> freed. Provide a function `clear_unpack_trees_porcelain()` to do so and
> call it in the functions which use `setup_unpack_trees_porcelain()`.

This is awesome; thanks.

> diff --git a/merge-recursive.c b/merge-recursive.c
> index 0c0d48624d..8229b91e2f 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -301,6 +301,7 @@ static int git_merge_trees(int index_only,
> init_tree_desc_from_tree(t+2, merge);
>
> rc = unpack_trees(3, t, );
> +   clear_unpack_trees_porcelain();
> cache_tree_free(_cache_tree);
> return rc;

Yeah, this could result in an evil merge.  In my series, I want to
continue to be able to call verify_uptodate() from unpack_trees.c in order
to check if files affected by renames are dirty and we need to avoid
overwriting them.  That can trigger error messages, so they need to not be
freed until later.  So, instead, I'd like to see something like the below
(built on top of my series):

-- >8 --

---
 merge-recursive.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index f2cbad4f10..3491a27bf2 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -337,10 +337,10 @@ static void init_tree_desc_from_tree(struct tree_desc 
*desc, struct tree *tree)
init_tree_desc(desc, tree->buffer, tree->size);
 }
 
-static int git_merge_trees(struct merge_options *o,
-  struct tree *common,
-  struct tree *head,
-  struct tree *merge)
+static int unpack_trees_start(struct merge_options *o,
+ struct tree *common,
+ struct tree *head,
+ struct tree *merge)
 {
int rc;
struct tree_desc t[3];
@@ -378,6 +378,12 @@ static int git_merge_trees(struct merge_options *o,
return rc;
 }
 
+static void unpack_trees_finish(struct merge_options *o)
+{
+   discard_index(>orig_index);
+   clear_unpack_trees_porcelain(>unpack_opts);
+}
+
 struct tree *write_tree_from_memory(struct merge_options *o)
 {
struct tree *result = NULL;
@@ -3079,7 +3085,7 @@ int merge_trees(struct merge_options *o,
return 1;
}
 
-   code = git_merge_trees(o, common, head, merge);
+   code = unpack_trees_start(o, common, head, merge);
 
if (code != 0) {
if (show(o, 4) || o->call_depth)
@@ -3144,14 +3150,7 @@ int merge_trees(struct merge_options *o,
else
clean = 1;
 
-   /* Free the extra index left from git_merge_trees() */
-   /*
-* FIXME: Need to also free data allocated by
-* setup_unpack_trees_porcelain() tucked away in o->unpack_opts.msgs,
-* but the problem is that only half of it refers to dynamically
-* allocated data, while the other half points at static strings.
-*/
-   discard_index(>orig_index);
+   unpack_trees_finish(o);
 
if (o->call_depth && !(*result = write_tree_from_memory(o)))
return -1;
-- 
2.17.0.295.g791b7256b2.dirty



Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain

2018-04-24 Thread Duy Nguyen
On Tue, Apr 24, 2018 at 6:12 PM, Duy Nguyen  wrote:
> git-completion.bash will be updated to ask git "give me the commands
> in the mainporcelain, completable or external category". This also
> addresses another thing that bugs me: I wanted an option to let me
> complete all commands instead of just porcelain. This approach kinda
> generalizes that and it would be easy to let the user choose what
> category they want to complete.

To complete this: there could also be yet another special category
"custom", where --list-cmds=custom simply returns a list of commands
specified in config file. With this the user can pick the "custom"
category to have total control of what command shows up at "git "
if they are not satisfied with the predefined categories.
-- 
Duy


Re: [PATCH v3 4/6] git.c: implement --list-cmds=porcelain

2018-04-24 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 3:32 PM, SZEDER Gábor  wrote:
> But then I noticed that it's not an accurate description of the
> current situation, because there is a wide grey area between
> porcelains and plumbing, and the completion script doesn't "filter out
> plumbing commands", but rather filters out commands that can be
> considered too low-level to be useful for "general" usage.
> Consequently, after 'git ' we also list:
>
>   - some 'ancillaryinterrogators': blame, annotate, difftool, fsck,
> help
>   - some 'ancillarymanipulators': config, mergetool, remote
>   - some 'foreignscminterface': p4, request-pull, svn, send-email
>   - even some plumbing: apply, name-rev (though 'name-rev' could be
> omitted; we have 'git describe')
>   - and also all "unknown" 'git-foo' commands that can be found in
> $PATH, which can be the user's own git scripts or other
> git-related tools ('git-annex', Git LFS, etc.).
>
> With this change we wouldn't list any of the above commands, but only
> those that are explicitly categorized as 'mainporcelain'.  I'd much
> prefer the current behaviour.

Yeah I noticed this (kinda) with filter-branch but I did not look
further to see all this. It's good that you review this series then :)

For the first group (known commands), how about we add a new category
"completion" in command-list.txt? Each command may belong to multiple
categories (and my updated script has to deal with that [1]). For the
second group, we could also have a special "external" category that is
produced at run time, not specified in command-list.txt. --list-cmds
option either has to accept multiple values, or we accept multiple
--list-cmds= options.

git-completion.bash will be updated to ask git "give me the commands
in the mainporcelain, completable or external category". This also
addresses another thing that bugs me: I wanted an option to let me
complete all commands instead of just porcelain. This approach kinda
generalizes that and it would be easy to let the user choose what
category they want to complete.

[1] which also means I could bring "deprecated" category back.
-- 
Duy


Re: [PATCH v2 0/3] Some add-on patches on top of dj/runtime-prefix

2018-04-24 Thread Johannes Schindelin
Hi Junio,

On Tue, 24 Apr 2018, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Junio C Hamano  writes:
> >
> >>> base-commit: b46fe60e1d7235603a29499822493bd3791195da
> >>
> >> Basing your work on the tip of 'next' is good for discussion, but
> >> not readily usable for final application.  Let me see if I can
> >> untangle the dependents to come up with a reasonable base.
> >
> > I'll queue this on top of a merge of 'dj/runtime-prefix' into 'master'.
> > Merging the resulting topic into 'next' and applying these patches
> > directly on top of 'next' result in identical trees, of course ;-)
> 
> Actually, these trivially rebase on top of dj/runtime-prefix, so
> I'll queue them like so without taking it hostage to other things in
> 'master'.  We'd want to keep these mergeable to any integration
> branch that dj/runtime-prefix would be merged to, so that is the
> most logical organization, I would think, even though I do not
> immediately see the reason why we would want to merge
> dj/runtime-prefix to 'maint' and lower right now.
> 
> Thanks.

Thank you, and sorry for the trouble. I am just too used to a Continuous
Integration setting with exactly one integration branch.

I will make an effort in the future to figure out the best base branch for
patches that do not apply cleanly on `master` but require more stuff from
`next`/`pu`.

Ciao,
Dscho


Re: [RFC PATCH] checkout: Force matching mtime between files

2018-04-24 Thread Marc Branchaud

On 2018-04-13 01:01 PM, Michał Górny wrote:

Currently git does not control mtimes of files being checked out.  This
means that the only assumption you could make is that all files created
or modified within a single checkout action will have mtime between
start time and end time of this checkout.  The relations between mtimes
of different files depend on the order in which they are checked out,
filesystem speed and timestamp precision.


Thanks for scratching this old itch [1]!

Big +1 from me.  We've had incremental smoke-test rebuilds fail because 
of inconsistent file timestamps.


M.

[1] https://public-inbox.org/git/50c79d1f.1080...@xiplink.com/


Git repositories sometimes contain both generated and respective source
files.  For example, the Gentoo 'user syncing' repository combines
source ebuild files with generated metadata cache for user convenience.
Ideally, the 'git checkout' would be fast enough that (combined with low
timestamp precision) all files created or modified within a single
checkout would have matching timestamp.  However, in reality the cache
files may end up being wrongly 'older' than source file, requiring
unnecessary recheck.

The opposite problem (cache files not being regenerated when they should
have been) might also occur.  However, it could not be solved without
preserving timestamps, therefore it is out of scope of this change.

In order to avoid unnecessary cache mismatches, force a matching mtime
between all files created by a single checkout action.  This seems to be
the best course of action.  Matching mtimes do not trigger cache
updates.  They also match the concept of 'checkout' being an atomic
action.  Finally, this change does not break backwards compatibility
as the new result is a subset of the possible previous results.

For example, let's say that 'git checkout' creates two files in order,
with respective timestamps T1 and T2.  Before this patch, T1 <= T2.
After this patch, T1 == T2 which also satisfies T1 <= T2.

A similar problem was previously being addressed via git-restore-mtime
tool.  However, that solution is unnecessarily complex for Gentoo's use
case and does not seem to be suitable for 'seamless' integration.

The patch integrates mtime forcing via adding an additional member of
'struct checkout'.  This seemed the simplest way of adding it without
having to modify prototypes and adjust multiple call sites.  The member
is set to the current time in check_updates() function and is afterwards
enforced by checkout_entry().

The patch uses utime() rather than futimes() as that seems to be
the function used everywhere else in git and provided some MinGW
compatibility code.

Signed-off-by: Michał Górny 
---
  cache.h|  1 +
  entry.c| 12 +++-
  unpack-trees.c |  1 +
  3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index bbaf5c349..9f0a7c867 100644
--- a/cache.h
+++ b/cache.h
@@ -1526,6 +1526,7 @@ struct checkout {
const char *base_dir;
int base_dir_len;
struct delayed_checkout *delayed_checkout;
+   time_t checkout_mtime;
unsigned force:1,
 quiet:1,
 not_new:1,
diff --git a/entry.c b/entry.c
index 2101201a1..7ee5a6d28 100644
--- a/entry.c
+++ b/entry.c
@@ -411,6 +411,7 @@ int checkout_entry(struct cache_entry *ce,
  {
static struct strbuf path = STRBUF_INIT;
struct stat st;
+   int ret;
  
  	if (topath)

return write_entry(ce, topath, state, 1);
@@ -473,5 +474,14 @@ int checkout_entry(struct cache_entry *ce,
return 0;
  
  	create_directories(path.buf, path.len, state);

-   return write_entry(ce, path.buf, state, 0);
+   ret = write_entry(ce, path.buf, state, 0);
+
+   if (ret == 0 && state->checkout_mtime != 0) {
+   struct utimbuf t;
+   t.modtime = state->checkout_mtime;
+   if (utime(path.buf, ) < 0)
+   warning_errno("failed utime() on %s", path.buf);
+   }
+
+   return ret;
  }
diff --git a/unpack-trees.c b/unpack-trees.c
index e73745051..e1efefb68 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -346,6 +346,7 @@ static int check_updates(struct unpack_trees_options *o)
state.quiet = 1;
state.refresh_cache = 1;
state.istate = index;
+   state.checkout_mtime = time(NULL);
  
  	progress = get_progress(o);
  



Re: [PATCH v3 7/9] commit: add short-circuit to paint_down_to_common()

2018-04-24 Thread Derrick Stolee

On 4/23/2018 5:38 PM, Jakub Narebski wrote:

Derrick Stolee  writes:


On 4/18/2018 7:19 PM, Jakub Narebski wrote:

Derrick Stolee  writes:


[...]

[...], and this saves time during 'git branch --contains' queries
that would otherwise walk "around" the commit we are inspecting.


If I understand the code properly, what happens is that we can now
short-circuit if all commits that are left are lower than the target
commit.

This is because max-order priority queue is used: if the commit with
maximum generation number is below generation number of target commit,
then target commit is not reachable from any commit in the priority
queue (all of which has generation number less or equal than the commit
at head of queue, i.e. all are same level or deeper); compare what I
have written in [1]

[1]: https://public-inbox.org/git/866052dkju@gmail.com/

Do I have that right?  If so, it looks all right to me.

Yes, the priority queue needs to compare via generation number first
or there will be errors. This is why we could not use commit time
before.

I was more concerned about getting right the order in the priority queue
(does it return minimal or maximal generation number).

I understand that the cutoff could not be used without generation
numbers because of the possibility of clock skew - using cutoff on dates
could lead to wrong results.


Maximal generation number is important so we do not visit commits 
multiple times (say, once with PARENT1 set, and a second time when 
PARENT2 is set). A minimal generation number order would create a DFS 
order and walk until the cutoff every time.


In cases without clock skew, maximal generation number order will walk 
the same set of commits as maximal commit time; the order may differ, 
but only between incomparable commits.



For a copy of the Linux repository, where HEAD is checked out at
v4.13~100, we get the following performance improvement for
'git branch --contains' over the previous commit:

Before: 0.21s
After:  0.13s
Rel %: -38%

[...]

flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
if (flags == (PARENT1 | PARENT2)) {
if (!(commit->object.flags & RESULT)) {
@@ -876,7 +886,7 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return NULL;
}
   -list = paint_down_to_common(one, n, twos);
+   list = paint_down_to_common(one, n, twos, 0);
while (list) {
struct commit *commit = pop_commit();
@@ -943,7 +953,7 @@ static int remove_redundant(struct commit **array, int cnt)
filled_index[filled] = j;
work[filled++] = array[j];
}
-   common = paint_down_to_common(array[i], filled, work);
+   common = paint_down_to_common(array[i], filled, work, 0);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
for (j = 0; j < filled; j++)
@@ -1067,7 +1077,7 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return 0;
   -bases = paint_down_to_common(commit, nr_reference, reference);
+   bases = paint_down_to_common(commit, nr_reference, reference, 
commit->generation);

Is it the only case where we would call paint_down_to_common() with
non-zero last parameter?  Would we always use commit->generation where
commit is the first parameter of paint_down_to_common()?

If both are true and will remain true, then in my humble opinion it is
not necessary to change the signature of this function.

We need to change the signature some way, but maybe the way I chose is
not the best.

No, after taking longer I think the new signature is a good choice.


To elaborate: paint_down_to_common() is used for multiple
purposes. The caller here that supplies 'commit->generation' is used
only to compute reachability (by testing if the flag PARENT2 exists on
the commit, then clears all flags). The other callers expect the full
walk down to the common commits, and keeps those PARENT1, PARENT2, and
STALE flags for future use (such as reporting merge bases). Usually
the call to paint_down_to_common() is followed by a revision walk that
only halts when reaching root commits or commits with both PARENT1 and
PARENT2 flags on, so always short-circuiting on generations would
break the functionality; this is confirmed by the
t5318-commit-graph.sh.

Right.

I have realized that just after sending the email.  I'm sorry about this.


An alternative to the signature change is to add a boolean parameter
"use_cutoff" or something, that specifies "don't walk beyond the
commit". This may give a more of a clear description of what it will
do with the generation value, but since we are already performing
generation comparisons before calling paint_down_to_common() I 

Re: [PATCH v1 1/2] merge: Add merge.renames config setting

2018-04-24 Thread Johannes Schindelin
Hi Junio,

On Tue, 24 Apr 2018, Junio C Hamano wrote:

> Ben Peart  writes:
> 
> >> I also had to wonder how "merge -s resolve" faired, if the project
> >> is not interested in renamed paths at all.
> >>
> >
> > To be clear, it isn't that we're not interested in detecting renamed
> > files and paths.  We're just opposed to it taking an hour to figure
> > that out!
> 
> Yeah, but as opposed to passing "oh, let's see if we can get a
> reasonable result without rename detection just this time" from the
> command line, configuring merge.renames=false in would mean exactly
> that: "we don't need rename detection, just want to skip the cycles
> spent for it".  That is why I wondered how well the resolve strategy
> would have fit your needs.

Please do not forget that the context is GVFS, where you would cause a lot
of pain and suffering by letting users forget to specify that command-line
option all the time, resulting in several gigabytes of objects having to
be downloaded just for the sake of rename detection.

So there is a pretty good point in doing this as a config option.

Ciao,
Dscho


Re: [PATCH 1/2] merge: setup `opts` later in `checkout_fast_forward()`

2018-04-24 Thread Johannes Schindelin
Hi Martin,

On Tue, 24 Apr 2018, Martin Ågren wrote:

> On 24 April 2018 at 08:20, Jacob Keller  wrote:
> > I'm guessing the diff algorithm simply found that this was a more
> > compact representation of the change? It's a bit confusing when your
> > description indicates you "moved" some code down, but it looks like
> > you moved code up.
> 
> Agreed. I'll play with --anchored and other magic stuff to see if I can
> improve this. Or I could instead try to sell this patch as "move some
> other stuff out of the way" ;-) That seems a bit less direct though.

Or you could add a remark to the commit message along the lines "best
viewed with `--anchored=...`". This is what I would do ;-)

Ciao,
Dscho

Re: [PATCH v8 09/16] rebase: introduce the --rebase-merges option

2018-04-24 Thread Johannes Schindelin
Hi Philip,

On Sun, 22 Apr 2018, Philip Oakley wrote:

> From: "Johannes Schindelin" 
> > Once upon a time, this here developer thought: wouldn't it be nice if,
> > say, Git for Windows' patches on top of core Git could be represented as
> > a thicket of branches, and be rebased on top of core Git in order to
> > maintain a cherry-pick'able set of patch series?
> >
> > The original attempt to answer this was: git rebase --preserve-merges.
> >
> > However, that experiment was never intended as an interactive option,
> > and it only piggy-backed on git rebase --interactive because that
> > command's implementation looked already very, very familiar: it was
> > designed by the same person who designed --preserve-merges: yours truly.
> >
> > Some time later, some other developer (I am looking at you, Andreas!
> > ;-)) decided that it would be a good idea to allow --preserve-merges to
> > be combined with --interactive (with caveats!) and the Git maintainer
> > (well, the interim Git maintainer during Junio's absence, that is)
> > agreed, and that is when the glamor of the --preserve-merges design
> > started to fall apart rather quickly and unglamorously.
> >
> > The reason? In --preserve-merges mode, the parents of a merge commit (or
> > for that matter, of *any* commit) were not stated explicitly, but were
> > *implied* by the commit name passed to the `pick` command.
> >
> Aside: I think this para should be extracted to the --preserve-merges
> documentation to highlight what it does / why it is 'wrong' (not what would be
> expected in some case). It may also need to discuss the (figurative) Cousins
> vs. Siblings distinction [merge of branches external, or internal, to the
> rebase.

Quite honestly, I'd much rather spend time improving --rebase-merges than
improving --preserve-merges documentation. In my mind, the latter is
pretty useless, especially once the former lands in an official Git
version.

Of course, feel free to disagree with me by sending a patch to improve the
documentation of --preserve-merges ;-)

> "In --preserve-merges, the commit being selected for merging is implied by the
> commit name  passed to the `pick` command (i.e. of the original merge commit),
> not that of the rebased version of that parent."

It is much, much worse:

In --preserve-merges, no commit can change its ancestry. Every
rebased commit's parents will be the rebased original parents.

Or some such. But really, why bother describing something *that* broken?
Why not work toward a solution that makes that broken option obsolete?
Like, say, --rebase-merges? ;-)

> A similar issue occurs with (figuratively) '--ancestry-path --first parent'
> searches which lacks the alternate '--lead parent' post-walk selection. [1]. I
> don't think there is a dot notation to select the merge cousins, nor merge
> siblings either A.,B ? (that's dot-comma ;-)

I actually had missed `--ancestry-path`... I should probably use it in the
description of the "cousins".

> [... lots of quoted text...]

Could I ask you to make it easier for me by cutting quoted text that is
irrelevant to your reply? The way I read mails forces me to scroll down
(sometimes on a phone) all the way to the end, just to find that that time
was spent in vain.

Thanks,
Dscho




Re: Git archeology

2018-04-24 Thread Vladimir Gorshenin
Hi Christian,

Thank you for the reply.

After I got the reply from Szedar I was so excited about git community
passion for help. And your reply made me ever more assure in it.
Once again thank your for the comprehensive answer. I appreciate
Daniel German's research and going to try token based method in my
task as well.

Have a nice day,

Vladimir

2018-04-21 8:43 GMT+02:00 Christian Couder :
> Hi,
>
> On Sat, Apr 21, 2018 at 8:19 AM, Vladimir Gorshenin
>  wrote:
>> Hi,
>>
>> My team and I as well as millions of other developers are excited to
>> have such tool at hand as Git. It helps us a lot.
>>
>> Now we challenged ourselves to be even more productive with Git
>> analyzing our usage history.
>
> What kind of analysis do you want to do? Is it the same kind of
> analysis as described in the "Token-based authorship information from
> Git" article (https://lwn.net/Articles/698425/) on LWN.net?
>
>> And there is a problem, which I believe is fundamental for Git (please
>> prove me wrong): how to find all overlapping commits, e.g. touching
>> the same lines of code?
>
> It is not very clear what you would consider overlapping commits or
> commits touching the same lines of code. If some lines of code have
> been duplicated in different files, for example, are the commits
> touching the original lines relevant to what happened to the
> duplicated lines? And what about lines that were moved from one file
> to another or in the same file?
>
>> I played with “Git diff” and “Git blame” but without a reliable
>> result. “Git diff” gives only relative number of lines and it’s not
>> easy to track these number through 1000+ commits. “Git blame” has nice
>> output but without any information about deletion.
>
> Did you try 'git log -L' as Szeder Gábor just suggested?
>
>> What would you advice me to do?
>
> If 'git log -L' and other git commands are not doing what you want,
> you might want to take a look at cregit
> (https://github.com/cregit/cregit) and maybe at other work from the
> people who developed it. The above LWN.net article is about their
> early work.
>
> There are links related to this tool in:
> https://git.github.io/rev_news/2017/05/17/edition-27/


Re: Git archeology

2018-04-24 Thread Vladimir Gorshenin
Hi Szedar,

Thank you for the reply! I didn't expect it could be so instant!

I checked 'git log -L' option and it seems to the best one so far. But
nevertheless is has a pit fall: I run it like 'git log -L ,:somefile'
and get the output that needs manual scraping since each commit spans
the whole file despite only few lines were actually altered.

I would like to have an output form 'git log -L' in patch style. Could
it be done somehow?

Have a nice day,

Vladimir

2018-04-21 8:29 GMT+02:00 SZEDER Gábor :
>> And there is a problem, which I believe is fundamental for Git (please
>> prove me wrong): how to find all overlapping commits, e.g. touching
>> the same lines of code?
>
> You might be looking for 'git log -L'.
>


Re: [PATCH 31/41] wt-status: convert two uses of EMPTY_TREE_SHA1_HEX

2018-04-24 Thread Martin Ågren
On 24 April 2018 at 01:39, brian m. carlson
 wrote:
> Convert two uses of EMPTY_TREE_SHA1_HEX to use oid_to_hex_r and
> the_hash_algo to avoid a dependency on a given hash algorithm.  Use
> oid_to_hex_r in preference to oid_to_hex because the buffer needs to
> last through several function calls which might exhaust the limit of
> four static buffers.
>
> Signed-off-by: brian m. carlson 
> ---
>  wt-status.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/wt-status.c b/wt-status.c
> index 50815e5faf..857724bd60 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -600,10 +600,11 @@ static void wt_status_collect_changes_index(struct 
> wt_status *s)
>  {
> struct rev_info rev;
> struct setup_revision_opt opt;
> +   char hex[GIT_MAX_HEXSZ + 1];
>
> init_revisions(, NULL);
> memset(, 0, sizeof(opt));
> -   opt.def = s->is_initial ? EMPTY_TREE_SHA1_HEX : s->reference;
> +   opt.def = s->is_initial ? oid_to_hex_r(hex, 
> the_hash_algo->empty_tree) : s->reference;
> setup_revisions(0, NULL, , );
>
> rev.diffopt.flags.override_submodule_config = 1;
> @@ -975,13 +976,14 @@ static void wt_longstatus_print_verbose(struct 
> wt_status *s)
> struct setup_revision_opt opt;
> int dirty_submodules;
> const char *c = color(WT_STATUS_HEADER, s);
> +   char hex[GIT_MAX_HEXSZ + 1];
>
> init_revisions(, NULL);
> rev.diffopt.flags.allow_textconv = 1;
> rev.diffopt.ita_invisible_in_index = 1;
>
> memset(, 0, sizeof(opt));
> -   opt.def = s->is_initial ? EMPTY_TREE_SHA1_HEX : s->reference;
> +   opt.def = s->is_initial ? oid_to_hex_r(hex, 
> the_hash_algo->empty_tree) : s->reference;
> setup_revisions(0, NULL, , );

Just a thought: Maybe it would make sense to have a function
`oid_hex_empty_tree()` or similar to replace the
oid_to_hex[_r](the_hash_algo->empty_tree) idiom. It would help avoid the
buffer here, but also get rid of a few instances of code peeking into
the_hash_algo. I dunno.

I've been scanning this series semi-sloppily up to here, and left some
comments along the way.

Thank you for working on this.

Martin


Re: [PATCH 25/41] builtin/receive-pack: avoid hard-coded constants for push certs

2018-04-24 Thread Martin Ågren
On 24 April 2018 at 01:39, brian m. carlson
 wrote:
> Use the GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ macros instead of hard-coding
> the constants 20 and 40.  Switch one use of 20 with a format specifier
> for a hex value to use the hex constant instead, as the original appears
> to have been a typo.
>
> At this point, avoid converting the hard-coded use of SHA-1 to use
> the_hash_algo.  SHA-1, even if not collision resistant, is secure in the
> context in which it is used here, and the hash algorithm of the repo
> need not match what is used here.  When we adopt a new hash algorithm,
> we can simply adopt the new algorithm wholesale here, as the nonce is
> opaque and its length and validity are entirely controlled by the
> server.  Consequently, defer updating this code until that point.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/receive-pack.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index c4272fbc96..5f35596c14 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -454,21 +454,21 @@ static void hmac_sha1(unsigned char *out,
> /* RFC 2104 2. (6) & (7) */
> git_SHA1_Init();
> git_SHA1_Update(, k_opad, sizeof(k_opad));
> -   git_SHA1_Update(, out, 20);
> +   git_SHA1_Update(, out, GIT_SHA1_RAWSZ);
> git_SHA1_Final(out, );
>  }

Since we do HMAC with SHA-1, we use the functions `git_SHA1_foo()`. Ok.
But then why not just use "20"? Isn't GIT_SHA1_RAWSZ coupled to the
whole hash transition thing? This use of "20" is not, IMHO, the "length
in bytes [...] of an object name" (quoting cache.h).

>  static char *prepare_push_cert_nonce(const char *path, timestamp_t stamp)
>  {
> struct strbuf buf = STRBUF_INIT;
> -   unsigned char sha1[20];
> +   unsigned char sha1[GIT_SHA1_RAWSZ];
>
> strbuf_addf(, "%s:%"PRItime, path, stamp);
> hmac_sha1(sha1, buf.buf, buf.len, cert_nonce_seed, 
> strlen(cert_nonce_seed));;
> strbuf_release();
>
> /* RFC 2104 5. HMAC-SHA1-80 */
> -   strbuf_addf(, "%"PRItime"-%.*s", stamp, 20, sha1_to_hex(sha1));
> +   strbuf_addf(, "%"PRItime"-%.*s", stamp, GIT_SHA1_HEXSZ, 
> sha1_to_hex(sha1));
> return strbuf_detach(, NULL);
>  }

Same comment here.

Martin


Re: [PATCH 21/41] http: eliminate hard-coded constants

2018-04-24 Thread Martin Ågren
On 24 April 2018 at 01:39, brian m. carlson
 wrote:
> Use the_hash_algo to find the right size for parsing pack names.
>
> Signed-off-by: brian m. carlson 
> ---
>  http.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/http.c b/http.c
> index 3034d10b68..ec70676748 100644
> --- a/http.c
> +++ b/http.c
> @@ -2047,7 +2047,8 @@ int http_get_info_packs(const char *base_url, struct 
> packed_git **packs_head)
> int ret = 0, i = 0;
> char *url, *data;
> struct strbuf buf = STRBUF_INIT;
> -   unsigned char sha1[20];
> +   unsigned char hash[GIT_MAX_RAWSZ];
> +   const unsigned hexsz = the_hash_algo->hexsz;
>
> end_url_with_slash(, base_url);
> strbuf_addstr(, "objects/info/packs");
> @@ -2063,11 +2064,11 @@ int http_get_info_packs(const char *base_url, struct 
> packed_git **packs_head)
> switch (data[i]) {
> case 'P':
> i++;
> -   if (i + 52 <= buf.len &&
> +   if (i + hexsz + 12 <= buf.len &&
> starts_with(data + i, " pack-") &&
> -   starts_with(data + i + 46, ".pack\n")) {
> -   get_sha1_hex(data + i + 6, sha1);
> -   fetch_and_setup_pack_index(packs_head, sha1,
> +   starts_with(data + i + hexsz + 6, ".pack\n")) {
> +   get_sha1_hex(data + i + 6, hash);
> +   fetch_and_setup_pack_index(packs_head, hash,
>   base_url);
> i += 51;

s/51/hexsz + 11/ ?


Re: [PATCH 18/41] index-pack: abstract away hash function constant

2018-04-24 Thread Martin Ågren
On 24 April 2018 at 01:39, brian m. carlson
 wrote:
> The code for reading certain pack v2 offsets had a hard-coded 5
> representing the number of uint32_t words that we needed to skip over.
> Specify this value in terms of a value from the_hash_algo.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/index-pack.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index d81473e722..c1f94a7da6 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1543,12 +1543,13 @@ static void read_v2_anomalous_offsets(struct 
> packed_git *p,
>  {
> const uint32_t *idx1, *idx2;
> uint32_t i;
> +   const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);

Should we round up? Or just what should we do if a length is not
divisible by 4? (I am not aware of any such hash functions, but one
could exist for all I know.) Another question is whether such an
index-pack v2 will ever contain non-SHA-1 oids to begin with. I can't
find anything suggesting that it could, but this is unfamiliar code to
me.

> /* The address of the 4-byte offset table */
> idx1 = (((const uint32_t *)p->index_data)
> + 2 /* 8-byte header */
> + 256 /* fan out */
> -   + 5 * p->num_objects /* 20-byte SHA-1 table */
> +   + hashwords * p->num_objects /* object ID table */
> + p->num_objects /* CRC32 table */
> );


  1   2   >