Re: [PATCH v2 2/2] oidset: use khash

2018-10-03 Thread René Scharfe
Am 03.10.2018 um 21:40 schrieb Jeff King:
> On Wed, Oct 03, 2018 at 03:16:39PM +0200, René Scharfe wrote:
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index 75047a4b2a..a839315726 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -536,7 +536,7 @@ static int tip_oids_contain(struct oidset *tip_oids,
>>   * add to "newlist" between calls, the additions will always be for
>>   * oids that are already in the set.
>>   */
>> -if (!tip_oids->map.map.tablesize) {
>> +if (!tip_oids->set.n_buckets) {
>>  add_refs_to_oidset(tip_oids, unmatched);
>>  add_refs_to_oidset(tip_oids, newlist);
>>  }
> 
> This is a little intimate with the implementation of khash, but I think
> it's probably OK (and really no worse than what was there before).
> 
> As the comment above notes, I think we're really looking at the case
> where this gets populated on the first call, but not subsequent ones. It
> might be less hacky to use a "static int initialized" here. Or if we
> want to avoid hidden globals, put the logic into filter_refs() to decide
> when to populate.

Right.  I'd prefer the latter, but was unable to find a nice way that
still populates the oidset lazily.  It's certainly worth another look,
and a separate series.

>> diff --git a/oidset.h b/oidset.h
>> index 40ec5f87fe..4b90540cd4 100644
>> --- a/oidset.h
>> +++ b/oidset.h
>> [...]
>> +KHASH_INIT(oid, struct object_id, int, 0, oid_hash, oid_equal)
> 
> This will declare these "static inline". Our other major "macros become
> inline functions" code is commit-slab.h, and there we found it necessary
> to add MAYBE_UNUSED. I wonder if we ought to be doing the same here (I
> don't get any warnings, but I suspect sparse might complain).

I doubt it (but didn't check) because khash.h defines kh_clear_##name(),
which we don't use anywhere, and there have been no complaints so far.
And if we wanted to add MAYBE_UNUSED then the right place for that would
be in KHASH_INIT, no?

> It might be nice if these functions could hide inside oidset.c (and just
> declare the struct here). It looks like we might be able to do that with
> __KHASH_TYPE(), but the double-underscore implies that we're not
> supposed to. ;)
> 
> I guess we also use a few of them in our inlines here. I'm not 100% sure
> that oidset_* needs to be inlined either, but this is at least a pretty
> faithful conversion of the original.

We could inline all of the oidset functions, following the spirit of
klib/khash.h.

Or we could uninline all of them and then may be able to clean up
oidset.h by using KHASH_DECLARE.  Perhaps we'd need to guard with an
"#ifndef THIS_IS_OIDSET_C" or similar to avoid a clash with KHASH_INIT.

Not sure if any of that would be a worthwhile improvement.

René


Re: [RFC/PATCH] drop vcs-svn experiment

2018-10-03 Thread Jonathan Nieder
Jeff King wrote:
> On Fri, Aug 17, 2018 at 10:26:05PM -0700, Jonathan Nieder wrote:

>>  - testsvn:: is a demo and at a minimum we ought not to install it
>>with "make install"
>>
>>  - keeping this in-tree for the benefit of just one user is excessive,
>>so removing it is probably the right thing
>>
>>  - it would be nice if the commit removing this code from Git includes
>>a note to help people find its new home
>>
>> Would you mind holding off until I'm able to arrange that last bit?
>
> Any further thoughts / movement on this? Not urgent, but I saw you
> mention it in another thread, and it's on my "to be decided" pile.

I'd like to move this to contrib so it's all in one place.  Then I
should be able to put it in a separate repo and drop it from Git.

If someone has a spare moment to help, I'd be happy to review patches
that move vcs-svn/* and git-remote-testsvn to contrib/svn-fe
(including the tests, contrib/subtree-style; in a separate repo the
tests would then work fine using sharness).

Thanks,
Jonathan


Re: [RFC/PATCH] drop vcs-svn experiment

2018-10-03 Thread Jeff King
On Fri, Aug 17, 2018 at 10:26:05PM -0700, Jonathan Nieder wrote:

> > Of course, I could be completely wrong about people using this. Maybe
> > svn-fe builds are just completely broken on my system, and maybe people
> > really do use testsvn::. But if so, they certainly aren't talking about
> > it on the mailing list. :)
> 
> My take:
> 
>  - svn-fe works fine and has been useful to me, though its Makefile
>could likely be simplified and made more user-friendly
> 
>  - I've benefited from the test coverage of having this in-tree
> 
>  - testsvn:: is a demo and at a minimum we ought not to install it
>with "make install"
> 
>  - keeping this in-tree for the benefit of just one user is excessive,
>so removing it is probably the right thing
> 
>  - it would be nice if the commit removing this code from Git includes
>a note to help people find its new home
> 
> Would you mind holding off until I'm able to arrange that last bit?

Any further thoughts / movement on this? Not urgent, but I saw you
mention it in another thread, and it's on my "to be decided" pile.

-Peff


Re: git-remote-helper list command question

2018-10-03 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Tue, Oct 02, 2018 at 11:43:50AM +0200, Stanisław Drozd wrote:

>> I'm trying to write a fast-import-based git remote helper, but I'm not
>> sure what the output of the `list` command should look like. How can I
>> find an example of the format in action?
>
> There's some documentation in "git help remote-helpers".
>
> For working examples (of this or any other remote-helper stuff), your
> best bet is the git-remote-http helper that ships with Git. E.g., you
> should be able to do:
>
>   echo list | git remote-http https://github.com/git/git
>
> to see sample output.

Other examples to look at:

 git cinnabar 

 remote-testsvn.c in git.git (though it will be moving to contrib/ and
 then out of git.git fairly soon)

These might be more helpful than git-remote-http because they are
fast-import-based helpers.

Thanks and hope that helps,
Jonathan


Re: [PATCH v4 3/4] transport.c: introduce core.alternateRefsCommand

2018-10-03 Thread Taylor Blau
On Tue, Oct 02, 2018 at 07:40:56PM -0400, Jeff King wrote:
> On Mon, Oct 01, 2018 at 07:23:58PM -0700, Taylor Blau wrote:
>
> > +core.alternateRefsCommand::
> > +	When advertising tips of available history from an alternate, use the shell to
> > +	execute the specified command instead of linkgit:git-for-each-ref[1]. The
> > +	first argument is the absolute path of the alternate. Output must contain one
> > +	hex object id per line (i.e., the same as produced by `git for-each-ref
> > +	--format='%(objectname)'`).
> > ++
> > +This is useful when a repository only wishes to advertise some of its
> > +alternate's references as `.have`'s. For example, to only advertise branch
> > +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
> > ++
> > +Note that the configured value is executed in a shell, and thus
> > +linkgit:git-for-each-ref[1] by itself does not work, as scripts have to handle
> > +the path argument specially.
>
> This last paragraph is trying to fix the wrong-impression that we
> discussed in the last round. But I'm not sure it doesn't make things
> more confusing. ;)

Heh, point taken. I suppose that I won't try to ignore your feedback
here!

> Specifically, the problem isn't the shell. The issue is that we pass the
> repo path as an argument to the command. So either:
>
>   - it's a real command that we run, in which case git-for-each-ref does
> not take a repo path argument and so doesn't work; or
>
>   - it's a shell snippet, in which case the argument is appended to the
> snippet (and here's where you can get into a rabbit hole of
> explaining how our shell invocation works, and we should avoid that)
>
> Can we just say:
>
>   Note that you cannot generally put `git for-each-ref` directly into
>   the config value, as it does not take a repository path as an argument
>   (but you can wrap the command above in a shell script).
>
> > [...]

Yeah, I think that this is certainly the way to go. I took your
suggestion as-is, which I think is much clearer than what I wrote.
Thanks!

> The rest of the patch looks good to me, along with the other three
> (modulo the "expect" fixup you already sent).

Thanks for your review, as always :-). I certainly appreciate your
patience with the word-smithing and whatnot.

Junio, I applied Peff's suggestion directly into my local copy, so I'm
happy to do either of a couple things, depending on which would be
easiest for you to pick up. I could either:

  1. Re-send this patch (in addition to 4/4), or

  2. Re-roll the entire series (with this and 4/4 amended to reflect the
 two bits of feedback I've gotten since sending v4).

I imagine that the latter will be easier for you to deal with, instead of
manually picking up patch amendments, but if you'd like fewer emails,
that works too :-).

Thanks,
Taylor


Re: [PATCH v3 0/3] alias help tweaks

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 01:42:39PM +0200, Rasmus Villemoes wrote:

> v2: Added patches 2 and 3, made "git cmd --help" unconditionally (no
> config option, no delay) redirect to the aliased command's help,
> preserve pre-existing behaviour of the spelling "git help cmd".
> 
> v3: Add some additional comments in patch 1 and avoid triggering leak
> checker reports. Use better wording in patch 3.

Thanks, v3 looks good to me!

-Peff


Re: [PATCH v10 0/8] filter: support for excluding all trees and blobs

2018-10-03 Thread Matthew DeVore
On Wed, Oct 3, 2018 at 12:52 PM Matthew DeVore  wrote:
>
> This is a minor change to the previous rollup. It moves positional
> arguments to the end of git-rev-list invocations. Here is an interdiff
> from v9:
...
There is another problem with this patchset related to dropped exit
codes and pipelines. In t6112, we run "git cat-file -t" on an object
that was rm'd without being promised. It was printing an error and
going undetected because it was upstream in a pipeline. The file was
removed in the previous test.

So I fixed the previous test to clone the repository before
manipulating it, and I fixed the latter test to not mask Git exit
codes :) (This is a really insidious pattern and I should have taken
it more seriously.) Below is an interdiff. The two tests are added in
different commits, so each commit had to be fixed up.

I'll send a re-roll in two days or so if there are no more comments.

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 5a61614b1..c8e3d87c4 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -210,16 +210,21 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
 	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
 
-	rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+	# Create a spare repo because we will be deleting objects from this one.
+	git clone r3 r3.b &&
 
-	git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	rm r3.b/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3.b rev-list --quiet --missing=print --objects HEAD \
+		>missing_objs 2>rev_list_err &&
 	echo "?$TREE" >expected &&
 	test_cmp expected missing_objs &&
 
 	# do not complain when a missing tree cannot be parsed
 	test_must_be_empty rev_list_err &&
 
-	git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	git -C r3.b rev-list --missing=allow-any --objects HEAD \
+		>objs 2>rev_list_err &&
 	! grep $TREE objs &&
 	test_must_be_empty rev_list_err
 '
@@ -228,12 +233,13 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 
 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
 	git -C r3 rev-list --quiet --objects --filter-print-omitted \
-		--filter=tree:0 HEAD |
-	awk -f print_1.awk |
+		--filter=tree:0 HEAD >revs &&
+
+	awk -f print_1.awk revs |
 	sed s/~// |
-	xargs -n1 git -C r3 cat-file -t |
-	sort -u >filtered_types &&
+	xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&
 
+	sort -u unsorted_filtered_types >filtered_types &&
 	printf "blob\ntree\n" >expected &&
 	test_cmp expected filtered_types
 '


[PATCH v2 0/2] Lazy fetch bug fix (and feature that reveals it)

2018-10-03 Thread Jonathan Tan
> This seems to break 5520 and 5616 when merged to 'pu'.  
> 
> It seems that merging master to md/filter-trees and then applying
> this is sufficient to break 5616.

I verified that 5616 is broken on [master + md/filter-trees + my patch],
and after investigation, found a bug in the lazy fetch implementation.
Patch 1 contains a fix, and more information about that bug.

Patch 2 is the original patch, updated with the commit message I
suggested [1] in reply to Junio's comment.

I bundled these patches together because patch 2 (in combination with
certain other commits) reveals the bug that patch 1 fixes, and to make
it easier for others to verify that these patches together pass all
tests when rebased on [master + md/filter-trees] or 'pu' (at least, as
of the time of writing).

[1] 
https://public-inbox.org/git/20180927183718.89804-1-jonathanta...@google.com/

Jonathan Tan (2):
  fetch-pack: avoid object flags if no_dependents
  fetch-pack: exclude blobs when lazy-fetching trees

 fetch-pack.c | 115 +--
 fetch-pack.h |   7 +++
 t/t0410-partial-clone.sh |  41 ++
 3 files changed, 121 insertions(+), 42 deletions(-)

-- 
2.19.0.605.g01d371f741-goog



[PATCH v2 2/2] fetch-pack: exclude blobs when lazy-fetching trees

2018-10-03 Thread Jonathan Tan
A partial clone with missing trees can be obtained using "git clone
--filter=tree:none <url>". In such a repository, when a tree needs to
be lazily fetched, any tree or blob it directly or indirectly references
is fetched as well, regardless of whether the original command required
those objects, or if the local repository already had some of them.

This is because the fetch protocol, which the lazy fetch uses, does not
allow clients to request that only the wanted objects be sent, which
would be the ideal solution. This patch implements a partial solution:
specify the "blob:none" filter, somewhat reducing the fetch payload.

This change has no effect when lazily fetching blobs (due to how filters
work). And if lazily fetching a commit (such repositories are difficult
to construct and are not a use case we support very well, but it is
possible), referenced commits and trees are still fetched - only the
blobs are not fetched.

The necessary code change is done in fetch_pack() instead of somewhere
closer to where the "filter" instruction is written to the wire so that
only one part of the code needs to be changed in order for users of all
protocol versions to benefit from this optimization.

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 14 ++
 fetch-pack.h |  7 +++
 t/t0410-partial-clone.sh | 41 
 3 files changed, 62 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index b9b1211dda..5d82666a52 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1615,6 +1615,20 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
if (nr_sought)
nr_sought = remove_duplicates_in_refs(sought, nr_sought);
 
+   if (args->no_dependents && !args->filter_options.choice) {
+   /*
+* The protocol does not support requesting that only the
+* wanted objects be sent, so approximate this by setting a
+* "blob:none" filter if no filter is already set. This works
+* for all object types: note that wanted blobs will still be
+* sent because they are directly specified as a "want".
+*
+* NEEDSWORK: Add an option in the protocol to request that
+* only the wanted objects be sent, and implement it.
+*/
+   parse_list_objects_filter(&args->filter_options, "blob:none");
+   }
+
if (!ref) {
packet_flush(fd[1]);
die(_("no matching remote head"));
diff --git a/fetch-pack.h b/fetch-pack.h
index 5b6e868802..43ec344d95 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -43,6 +43,13 @@ struct fetch_pack_args {
unsigned from_promisor:1;
 
/*
+* Attempt to fetch only the wanted objects, and not any objects
+* referred to by them. Due to protocol limitations, extraneous
+* objects may still be included. (When fetching non-blob
+* objects, only blobs are excluded; when fetching a blob, the
+* blob itself will still be sent. The client does not need to
+* know whether a wanted object is a blob or not.)
+*
 * If 1, fetch_pack() will also not modify any object flags.
 * This allows fetch_pack() to safely be called by any function,
 * regardless of which object flags it uses (if any).
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index cfd0655ea1..c521d7d6c6 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -182,6 +182,47 @@ test_expect_success 'fetching of missing objects works 
with ref-in-want enabled'
grep "git< fetch=.*ref-in-want" trace
 '
 
+test_expect_success 'fetching of missing blobs works' '
+   rm -rf server repo &&
+   test_create_repo server &&
+   test_commit -C server foo &&
+   git -C server repack -a -d --write-bitmap-index &&
+
+   git clone "file://$(pwd)/server" repo &&
+   git hash-object repo/foo.t >blobhash &&
+   rm -rf repo/.git/objects/* &&
+
+   git -C server config uploadpack.allowanysha1inwant 1 &&
+   git -C server config uploadpack.allowfilter 1 &&
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "origin" &&
+
+   git -C repo cat-file -p $(cat blobhash)
+'
+
+test_expect_success 'fetching of missing trees does not fetch blobs' '
+   rm -rf server repo &&
+   test_create_repo server &&
+   test_commit -C server foo &&
+   git -C server repack -a -d --write-bitmap-index &&
+
+   git clone "file://$(pwd)/server" repo &&
+   git -C repo rev-parse foo^{tree} >treehash &&
+   git hash-object repo/foo.t >blobhash &&
+   rm -rf repo/.git/objects/* &&
+
+   git -C server config uploadpack.allowanysha1inwant 1 &&
+   git -C server config uploadpack.allowfilter 1 &&
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo 

[PATCH v2 1/2] fetch-pack: avoid object flags if no_dependents

2018-10-03 Thread Jonathan Tan
When fetch_pack() is invoked as part of another Git command (due to a
lazy fetch from a partial clone, for example), it uses object flags that
may already be used by the outer Git command.

The commit that introduced the lazy fetch feature (88e2f9ed8e
("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried
to avoid this overlap, but it did not avoid it totally. It was
successful in avoiding writing COMPLETE, but did not avoid reading
COMPLETE, and did not avoid writing and reading ALTERNATE.

Ensure that no flags are written or read by fetch_pack() in the case
where it is used to perform a lazy fetch. To do this, it is sufficient
to avoid checking completeness of wanted refs (unnecessary in the case
of lazy fetches), and to avoid negotiation-related work (in the current
implementation, already, no negotiation is performed). After that was
done, the lack of overlap was verified by checking all direct and
indirect usages of COMPLETE and ALTERNATE - that they are read or
written only if no_dependents is false.

There are other possible solutions to this issue:

 (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using
 part, and whenever no_dependents is set, only use the
 non-flag-using part.
 (2) Make fetch_pack() be able to be used with arbitrary repository
 objects. fetch_pack() should then create its own repository object
 based on the given repository object, with its own object
 hashtable, so that the flags do not conflict.

(1) is possible but invasive - some functions would need to be split;
and such invasiveness would potentially be unnecessary if we ever were
to need (2) anyway. (2) would be useful if we were to support, say,
submodules that were partial clones themselves, but I don't know when or
if the Git project plans to support those.

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 101 ++-
 1 file changed, 59 insertions(+), 42 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 75047a4b2a..b9b1211dda 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -253,8 +253,10 @@ static int find_common(struct fetch_negotiator *negotiator,
 	if (args->stateless_rpc && multi_ack == 1)
 		die(_("--stateless-rpc requires multi_ack_detailed"));
 
-	mark_tips(negotiator, args->negotiation_tips);
-	for_each_cached_alternate(negotiator, insert_one_alternate_object);
+	if (!args->no_dependents) {
+		mark_tips(negotiator, args->negotiation_tips);
+		for_each_cached_alternate(negotiator, insert_one_alternate_object);
+	}
 
 	fetching = 0;
 	for ( ; refs ; refs = refs->next) {
@@ -271,8 +273,12 @@ static int find_common(struct fetch_negotiator *negotiator,
 		 * We use lookup_object here because we are only
 		 * interested in the case we *know* the object is
 		 * reachable and we have already scanned it.
+		 *
+		 * Do this only if args->no_dependents is false (if it is true,
+		 * we cannot trust the object flags).
 		 */
-		if (((o = lookup_object(the_repository, remote->hash)) != NULL) &&
+		if (!args->no_dependents &&
+		    ((o = lookup_object(the_repository, remote->hash)) != NULL) &&
 		    (o->flags & COMPLETE)) {
 			continue;
 		}
@@ -710,31 +716,29 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 
 	oidset_clear(&loose_oid_set);
 
-	if (!args->no_dependents) {
-		if (!args->deepen) {
-			for_each_ref(mark_complete_oid, NULL);
-			for_each_cached_alternate(NULL, mark_alternate_complete);
-			commit_list_sort_by_date(&complete);
-			if (cutoff)
-				mark_recent_complete_commits(args, cutoff);
-		}
+	if (!args->deepen) {
+		for_each_ref(mark_complete_oid, NULL);
+		for_each_cached_alternate(NULL, mark_alternate_complete);
+		commit_list_sort_by_date(&complete);
+		if (cutoff)
+			mark_recent_complete_commits(args, cutoff);
+	}
 
-		/*
-		 * Mark all complete remote refs as common refs.
-		 * Don't mark them common yet; the server has to be told so first.
-		 */
-		for (ref = *refs; ref; ref = ref->next) {
-			struct object *o = deref_tag(the_repository,
-						     lookup_object(the_repository,
-						     ref->old_oid.hash),
-						     NULL, 0);
+	/*
+	 * Mark all complete remote refs as common refs.
+	 * Don't mark them common yet; the server has to be told so first.
+	 

Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-03 Thread Stefan Beller
On Wed, Oct 3, 2018 at 2:34 PM Josh Steadmon  wrote:

>
> I believe that git-upload-archive can still speak version 1 without any
> trouble, but it at least doesn't break anything in the test suite to
> limit this to v0 either.

ok, let me just play around with archive to follow along:

Running the excerpt that I found in one of the tests in t5000

GIT_TRACE_PACKET=1 git archive --remote=. HEAD >b5.tar

(whoooa ... that spews a lot of text into my terminal, which makes
sense as transported tar files unlike packets starting with PACK are
not cut short; so onwards:)

$ git init test && cd test
$ GIT_TRACE_PACKET=1 git archive --remote=. HEAD >b5.tar
...
git< \2fatal: no such ref: HEAD
...
fatal: sent error to the client: git upload-archive: archiver died
with error
remote: git upload-archive: archiver died with error

This sounds similar to a bug that I have on my todo list for
clone recursing into submodules. Maybe we need to talk
about HEAD-less repositories and how to solve handling them
in general. Usually it doesn't happen except for corner cases like
now, so:

$ git commit --allow-empty -m "commit"
[master (root-commit) 24d7967] commit
$ GIT_TRACE_PACKET=1 git archive --remote=. HEAD >b5.tar
15:28:00.762615 pkt-line.c:80   packet:  git> argument HEAD
15:28:00.762704 pkt-line.c:80   packet:  git> 
15:28:00.766336 pkt-line.c:80   packet:  git> ACK
15:28:00.766428 pkt-line.c:80   packet:  git> 
15:28:00.766483 pkt-line.c:80   packet:  git< ACK
15:28:00.766508 pkt-line.c:80   packet:  git< 
15:28:00.767694 pkt-line.c:80   packet:  git< \2
15:28:00.767524 pkt-line.c:80   packet:  git< argument HEAD
15:28:00.767583 pkt-line.c:80   packet:  git< 
remote: 15:28:00.767524 pkt-line.c:80   packet:  git<
argument HEAD
remote: 15:28:00.767583 pkt-line.c:80   packet:  git< 
15:28:00.768758 pkt-line.c:80   packet:  git> 
15:28:00.770475 pkt-line.c:80   packet:  git<
\1pax_global_header
... \0\0\0\0\0\0\0\0\0\0\0\0\ ...
 # not too bad but the tar file contains a lot of zeros
$

Ah I forgot the crucial part, so

$ GIT_TRACE_PACKET=1 git -c protocol.version=1 archive --remote=.
HEAD >b5.tar
15:33:10.030508 pkt-line.c:80   packet:  git> argument HEAD
...

This pretty much looks like v0, as v1 would send a "version 1"; cf.

$ GIT_TRACE_PACKET=1 git -c protocol.version=1 fetch .
15:34:26.111013 pkt-line.c:80   packet:  upload-pack> version 1
15:34:26.40 pkt-line.c:80   packet:fetch< version 1



>
> Is there a method or design for advertising multiple acceptable versions
> from the client?

I think the client can send multiple versions, looking through protocol.c
(and not the documentation as I should for this:)

  /*
   * Determine which protocol version the client has requested.  Since
   * multiple 'version' keys can be sent by the client, indicating that
   * the client is okay to speak any of them, select the greatest version
   * that the client has requested.  This is due to the assumption that
   * the most recent protocol version will be the most state-of-the-art.
   */
...
	const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
	string_list_split(&list, git_protocol, ':', -1);
...
	for_each_string_list_item(item, &list) {
...
		if (skip_prefix(item->string, "version=", &value))
...

in determine_protocol_version_server which already had the client
speak to it, so I think at least the server can deal with multiple versions.

But given that we transport the version in env variables, we'd
need to be extra careful if we just did not see the version
in the --remote=. above?

> From my understanding, we can only add a single
> version=X field in the advertisement, but IIUC we can extend this fairly
> easily? Perhaps we can have "version=X" to mean the preferred version,
> and then a repeatable "acceptable_version=Y" field or similar?

Just re-use "version X", separated by colons as above.

> > From a maintenance perspective, do we want to keep
> > this part of the code central, as it ties protocol (as proxied
> > by service name) to the max version number?
> > I would think that we'd rather have the decision local to the
> > code, i.e. builtin/fetch would need to tell protocol.c that it
> > can do (0,1,2) and builtin/push can do (0,1), and then the
> > networking layers of code would figure out by the input
> > from the caller and the input from the user (configured
> > protocol.version) what is the best to go forward from
> > then on.
>
> I like having it centralized, because enforcing this in git_connect()
> and discover_refs() catches all the outgoing version advertisements, but
> there's lots of code paths that lead to those two functions that 

Re: [BUG] A part of an edge from an octopus merge gets colored, even with --color=never

2018-10-03 Thread Noam Postavsky
On Wed, 3 Oct 2018 at 00:24, Jeff King  wrote:

> Yeah, I really like your explanations/diagrams in the comments. It makes
> the logic very clear.

Ok good, I did have the feeling that the logic became actually clearer
to me after I wrestled with the test code, so I think this means I
didn't just imagine that. :)

> (there's also only one caller of this
> function, so it could be inlined; note that it's OK to use "return" in a
> test_expect block).

Oh, I think I had run into some trouble with the test runner complaining
about a broken &&-chain, but it seems to work fine now (perhaps I was
missing an && somewhere else that I fixed later).

> Why do we need the tag name to be different?

Otherwise the 'git checkout' command complains about an ambiguous ref
(added that to the comment).

> >   git checkout 1 -b merge &&
>
> This is assuming we just made a branch called "1", but that's one of the
> arguments. Probably this should be "$1" (or the whole thing should just
> be inlined so it is clear that what the set of parameters we expect is).

Oops, right. I've inlined it.

> It might actually be worth setting up the uncolored expect file as part
> of this, since it so neatly diagrams the graph you're trying to produce.
>
> I.e., something like (completely untested; note that the leading
> indentation is all tabs, which will be removed by the "<<-" operator):

Yup, works (I again had run into some problems with &&-chaining
earlier, but now it works fine).

> > * left
> > | 
> > *---.   octopus-merge
[...]
>
> Yikes. :) This one is pretty hard to read. I'm not sure if there's a
> good alternative. If you pipe the output of test_decode through
> this:
>
>   sed '
> s/./R/g;
[...]
> you get this:
>
>   * left
>   R *BBMM   octopus-merge
>   R RY B M
[...]
> which is admittedly pretty horrible, too, but at least resembles a
> graph. I dunno.

Yeah, but it's lossy, so it doesn't seem usable for the test. Maybe
doubling up some characters?

**  left
R|  **B-B-M-M.  octopus-merge
R|  R|Y\  B\  M\
R|R/  Y/  B/  M/
R|  Y|  B|  **  4
R|  Y|  **  M|  3
R|  Y|  M|M/
R|  **  M|  2
R|  M|M/
**  M|  1
M|M/
**  initial

> I'm also not thrilled that we depend on the exact sequence of default
> colors, but I suspect it's not the first time. And it wouldn't be too
> hard to update it if that default changes.

Well, it's easy enough to set the colors explicitly. After doing this
I noticed that green seems to be skipped. Not sure if that's a bug or
not.

> Try not to put "git" on the left-hand side of a pipe, since it means
> we'll miss its exit code

Ok.

> Leftover "debug" cruft?
>
> The same pipe comment applies as above.
>
> > test_done
> > test_done
>
> Two dones; we exit after the first one (so everything after this is
> ignored).

Oops, yeah, this script was still a bit of a rough draft.

> I think it's OK to have a dedicated script for even these two tests, if
> it makes things easier to read. However, would we also want to test the
> octopus without the problematic graph here? I think if we just omit
> "left" we get that, don't we?

t4202-log.sh already does test a "normal" octopus merge (starting
around line 615, search for "octopus-a"). But that is only a 3-parent
merge. And adding another test is easy enough.
From cd9415b524357c2c8b9b20a63032c94e01d46a15 Mon Sep 17 00:00:00 2001
From: Noam Postavsky 
Date: Sat, 1 Sep 2018 20:07:16 -0400
Subject: [PATCH v5] log: Fix coloring of certain octopus merge shapes

For octopus merges where the first parent edge immediately merges into
the next column to the left:

| *-.
| |\ \
|/ / /

then the number of columns should be one less than the usual case:

| *-.
| |\ \
| | | *

Also refactor the code to iterate over columns rather than dashes,
building from an initial patch suggestion by Jeff King.

Signed-off-by: Noam Postavsky 
---
 graph.c  |  56 +---
 t/t4214-log-graph-octopus.sh | 102 +++
 2 files changed, 143 insertions(+), 15 deletions(-)
 create mode 100755 t/t4214-log-graph-octopus.sh

diff --git a/graph.c b/graph.c
index e1f6d3bdd..a3366f6da 100644
--- a/graph.c
+++ b/graph.c
@@ -842,27 +842,53 @@ static void graph_output_commit_char(struct git_graph *graph, struct strbuf *sb)
 }
 
 /*
- * Draw an octopus merge and return the number of characters written.
+ * Draw the horizontal dashes of an octopus merge and return the number of
+ * characters written.
  */
 static int graph_draw_octopus_merge(struct git_graph *graph,
 struct strbuf *sb)
 {
 	/*
-	 * Here dashless_commits represents the number of parents
-	 * which don't need to have dashes (because their edges fit
-	 * neatly under the commit).
-	 */
-	const int dashless_commits = 2;
-	int col_num, i;
-	int num_dashes =
-		((graph->num_parents - dashless_commits) * 2) - 1;
-	for (i = 0; i < num_dashes; i++) {
-		col_num = (i / 2) + dashless_commits + graph->commit_index;
-		strbuf_write_column(sb, 

Re: Fwd: Git credentials not working

2018-10-03 Thread Dimitri Kopriwa

Thanks everyone.

All your answers helped. I found out that the issue was not related to git.

I am using semantic-release to perform releases; apparently
git-credentials does not work with semantic-release.


I also set up two-factor authentication, and every fix applied to
git-credentials was simply useless.


Read more here:
https://github.com/semantic-release/semantic-release/issues/941#issuecomment-426691824


Thanks a lot for your help; git is the best software ever made!

Dimitri Kopriwa


On 10/4/18 3:43 AM, Jeff King wrote:

On Thu, Oct 04, 2018 at 02:34:17AM +0700, Dimitri Kopriwa wrote:


I have replaced the way I fill the git credentials store. I have verified
that ~/.git-credentials contains the information, and ~/.gitconfig looks
fine too.

I still have 401 error when reading from that file.

This is the paste log : https://paste.gnome.org/pmntlkdw0

Now that I use git credential approve, I don't think that I need a custom helper.

Any idea why I still can't log in using git-credential?

Looking at your pastebin, it looks like the server sometimes takes it
and sometimes not. E.g., piping the log through:

   egrep '(Send|Recv) header:' |
   perl -lpe 's/^.*?(=>|<=) //'

I see:

   Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
   Send header: User-Agent: git/2.19.0
   ...
   Recv header: HTTP/1.1 401 Unauthorized
   Recv header: WWW-Authenticate: Basic realm="GitLab"
   ...
   Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
   Send header: Authorization: Basic 
   Send header: User-Agent: git/2.19.0
   ...
   Recv header: HTTP/1.1 200 OK

So that works. But then later we get:

   Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
   Send header: User-Agent: git/2.19.0
   ...
   Recv header: HTTP/1.1 401 Unauthorized
   Recv header: WWW-Authenticate: Basic realm="GitLab"
   ...
   Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
   Send header: Authorization: Basic 
   Send header: User-Agent: git/2.19.0
   ...
   Recv header: HTTP/1.1 401 Unauthorized

And then that causes credential-store to delete the non-working entry,
after which all of them must fail (because you have no working
credential, and presumably no terminal to prompt the user).

I have no idea why the same request would sometimes be allowed and
sometimes not. It's possible the <redacted> data is different in those
two times, but I don't know why that would be. It's also possible you're
hitting different load-balancing servers that behave differently.

-Peff


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-03 Thread Josh Steadmon
On 2018.10.02 15:28, Stefan Beller wrote:
> On Tue, Oct 2, 2018 at 3:00 PM Josh Steadmon  wrote:
> >
> > For services other than git-receive-pack, clients currently advertise
> > that they support the version set in the protocol.version config,
> > regardless of whether or not there is actually an implementation of that
> > service for the given protocol version. This causes backwards-
> > compatibility problems when a new implementation for the given
> > protocol version is added.
> >
> > This patch sets maximum allowed protocol versions for git-receive-pack,
> > git-upload-archive, and git-upload-pack.
> >
> > Previously, git-receive-pack would downgrade from v2 to v0, but would
> > allow v1 if set in protocol.version. Now, it will downgrade from v2 to
> > v1.
> 
> But does git-receive-pack understand v1?
> As to my understanding we have not even defined v1
> for push (receive-pack) and archive --remote (upload-archive).
> v1 is only known to fetch (upload-pack).
> 
> > +enum protocol_version determine_maximum_protocol_version(
> > +   const char *service, enum protocol_version default_version)
> > +{
> > +   if (!strcmp(service, "git-receive-pack"))
> > +   return protocol_v1;
> > +   else if (!strcmp(service, "git-upload-archive"))
> > +   return protocol_v1;
> 
> so I would think these two would be _v0.
> ... goes and checks ...
> aa9bab29b8 (upload-pack, receive-pack: introduce protocol version 1,
> 2017-10-16) seems to actually teach v1 to receive-pack as well,
> but upload-archive was completely off radar, so I think returning
> (v1, v0, v2 in the order as in the code) would make sense?

I believe that git-upload-archive can still speak version 1 without any
trouble, but limiting it to v0 at least doesn't break anything in the
test suite either.


> Asides from this, I thought there was a deliberate decision
> that we'd want to avoid a strict order on the protocol versions,
> but I could not find prior discussion on list to back up this claim. :/
> 
> For example we'd go with e.g. enums instead of integers
> for version numbers, as then some internal setup could
> also have things like protocol_v2018-10-02 or protocol_vWhatever;
> some protocol version may be advantageous to the client, some to
> the server, and we'd need to negotiate the best version that both
> are happy with. (e.g. the server may like version 0, 2 and 3, and
> the client may like 0,2,4 as 3 is bad security wise for the client,
> so both would negotiate to 2 as their best case)

Is there a method or design for advertising multiple acceptable versions
from the client? From my understanding, we can only add a single
version=X field in the advertisement, but IIUC we can extend this fairly
easily? Perhaps we can have "version=X" to mean the preferred version,
and then a repeatable "acceptable_version=Y" field or similar?


> From a maintenance perspective, do we want to keep
> this part of the code central, as it ties protocol (as proxied
> by service name) to the max version number?
> I would think that we'd rather have the decision local to the
> code, i.e. builtin/fetch would need to tell protocol.c that it
> can do (0,1,2) and builtin/push can do (0,1), and then the
> networking layers of code would figure out by the input
> from the caller and the input from the user (configured
> protocol.version) what is the best to go forward from
> then on.

I like having it centralized, because enforcing this in git_connect()
and discover_refs() catches all the outgoing version advertisements, but
there's lots of code paths that lead to those two functions that would
all have to have the acceptable version numbers plumbed through.

I suppose we could also have a registry of services to version numbers,
but I tend to dislike non-local sources of data. But if the list likes
that approach better, I'll be happy to implement it.


> But I guess having the central place here is not too
> bad either. How will it cope with the desire of protocol v2
> to have only one end point (c.f. serve.{c,h} via builtin/serve
> as "git serve") ?

I'm not sure about this. In my series to add a v2 archive command, I
added support for a new endpoint for proto v2 and I don't recall seeing
any complaints, but that is still open for review.

I suppose if we are strict about serving from a single endpoint, the
version registry makes more sense, and individual operations can declare
acceptable version numbers before calling any network code?



Thanks for the review,
Josh


[PATCH] doc: fix a typo and clarify a sentence

2018-10-03 Thread Mihir Mehta
I noticed that git-merge-base was unlikely to actually be a git command,
and tried it in my shell. Seeing that it doesn't work, I cleaned up two
places in the docs where it appears.

Signed-off-by: Mihir Mehta 
---
 Documentation/git-diff.txt  | 5 +++--
 Documentation/howto/update-hook-example.txt | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index b180f1fa5..6173f569e 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -72,8 +72,9 @@ two blob objects, or changes between two files on disk.
This form is to view the changes on the branch containing
and up to the second <commit>, starting at a common ancestor
of both <commit>.  "git diff A\...B" is equivalent to
-   "git diff $(git-merge-base A B) B".  You can omit any one
-   of <commit>, which has the same effect as using HEAD instead.
+   "git diff $(git merge-base A B) B".  You can omit any one
+   of the two instances of <commit>, which has the same effect as
+   using HEAD in its place.
 
 Just in case if you are doing something exotic, it should be
 noted that all of the <commit> in the above description, except
diff --git a/Documentation/howto/update-hook-example.txt b/Documentation/howto/update-hook-example.txt
index a5193b1e5..89821ec74 100644
--- a/Documentation/howto/update-hook-example.txt
+++ b/Documentation/howto/update-hook-example.txt
@@ -80,7 +80,7 @@ case "$1" in
   info "The branch '$1' is new..."
 else
   # updating -- make sure it is a fast-forward
-  mb=$(git-merge-base "$2" "$3")
+  mb=$(git merge-base "$2" "$3")
   case "$mb,$2" in
 "$2,$mb") info "Update is fast-forward" ;;
*)noff=y; info "This is not a fast-forward update.";;
-- 
2.19.0



Re: Fwd: Git credentials not working

2018-10-03 Thread Jeff King
On Thu, Oct 04, 2018 at 02:34:17AM +0700, Dimitri Kopriwa wrote:

> I have replaced the way I fill the git credentials store. I have verified
> that ~/.git-credentials contains the information, and ~/.gitconfig looks
> fine too.
> 
> I still have 401 error when reading from that file.
> 
> This is the paste log : https://paste.gnome.org/pmntlkdw0
> 
> Now that I use git credential approve, I don't think that I need a custom helper.
> 
> Any idea why I still can't log in using git-credential?

Looking at your pastebin, it looks like the server sometimes takes it
and sometimes not. E.g., piping the log through:

  egrep '(Send|Recv) header:' |
  perl -lpe 's/^.*?(=>|<=) //'

I see:

  Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
  Send header: User-Agent: git/2.19.0
  ...
  Recv header: HTTP/1.1 401 Unauthorized
  Recv header: WWW-Authenticate: Basic realm="GitLab"
  ...
  Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
  Send header: Authorization: Basic 
  Send header: User-Agent: git/2.19.0
  ...
  Recv header: HTTP/1.1 200 OK

So that works. But then later we get:

  Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
  Send header: User-Agent: git/2.19.0
  ...
  Recv header: HTTP/1.1 401 Unauthorized
  Recv header: WWW-Authenticate: Basic realm="GitLab"
  ...
  Send header: GET 
/example-keys/sample-project.git/info/refs?service=git-upload-pack HTTP/1.1
  Send header: Authorization: Basic 
  Send header: User-Agent: git/2.19.0
  ...
  Recv header: HTTP/1.1 401 Unauthorized

And then that causes credential-store to delete the non-working entry,
after which all of them must fail (because you have no working
credential, and presumably no terminal to prompt the user).

I have no idea why the same request would sometimes be allowed and
sometimes not. It's possible the <redacted> data is different in those
two times, but I don't know why that would be. It's also possible you're
hitting different load-balancing servers that behave differently.

-Peff


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 03 2018, Jeff King wrote:

> On Wed, Oct 03, 2018 at 12:08:15PM -0700, Stefan Beller wrote:
>
>> I share these concerns in a slightly more abstract way, as
>> I would bucket the actions into two separate bins:
>>
>> One bin that throws away information.
>> this would include removing expired reflog entries (which
>> I do not think are garbage, or collection thereof), but their
>> usefulness is questionable.
>>
>> The other bin would be actions that optimize but
>> do not throw away any information, repacking (without
>> dropping files) would be part of it, or the new
>> "write additional files".
>>
>> Maybe we can move all actions of the second bin into a new
>> "git optimize" command, and git gc would do first the "throw away
>> things" and then the optimize action, whereas clone would only
>> go for the second optimizing part?
>
> One problem with that world-view is that some of the operations do
> _both_, for efficiency. E.g., repacking will drop unreachable objects in
> too-old packs. We could actually be more aggressive in combining things
> here. For instance, a full object graph walk in linux.git takes 30-60
> seconds, depending on your CPU. But we do it at least twice during a gc:
> once to repack, and then again to determine reachability for pruning.
>
> If you generate bitmaps during the repack step, you can use them during
> the prune step. But by itself, the cost of generating the bitmaps
> generally outweighs the extra walk. So it's not worth generating them
> _just_ for this (but is an obvious optimization for a server which would
> be generating them anyway).

I don't mean to fan the flames of this obviously controversial "git gc
does optimization" topic (which I didn't suspect there would be a debate
about...), but a related thing I was wondering about the other day is
whether we could have a gc.fsck option, and in the background do fsck
while we were at it, and report this back via some facility like
gc.log[1].

That would also fall into this category of more work we could do while
we're doing a full walk anyway, but as with what you're suggesting would
require some refactoring.

1. Well, one that doesn't suck, see
   https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/ /
   https://public-inbox.org/git/87d0vmck55@evledraar.gmail.com/ etc.


Re: [PATCH] sequencer: use return value of oidset_insert()

2018-10-03 Thread Johannes Schindelin
Hi René,

On Wed, 3 Oct 2018, René Scharfe wrote:

> oidset_insert() returns 1 if the object ID is already in the set and
> doesn't add it again, or 0 if it hadn't been present.  Make use of that
> fact instead of checking with an extra oidset_contains() call.
> 
> Signed-off-by: Rene Scharfe 
> ---

ACK!

Thanks,
Dscho

>  sequencer.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index ddb41a62d9..6387c9ee6e 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4146,9 +4146,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>   struct object_id *oid = &parent->item->object.oid;
>   if (!oidset_contains(&interesting, oid))
>   continue;
> - if (!oidset_contains(&child_seen, oid))
> - oidset_insert(&child_seen, oid);
> - else
> + if (oidset_insert(&child_seen, oid))
>   label_oid(oid, "branch-point", &state);
>   }
>  
> -- 
> 2.19.0
> 

Re: Fwd: Git credentials not working

2018-10-03 Thread Bryan Turner
On Wed, Oct 3, 2018 at 12:34 PM Dimitri Kopriwa  wrote:
>
> I have replaced the way I fill the git credentials store. I have verified
> that ~/.git-credentials contains the information, and ~/.gitconfig looks
> fine too.
>
> I still have 401 error when reading from that file.
>
> This is the paste log : https://paste.gnome.org/pmntlkdw0
>
> Now that I use git credential approve, I don't think that I need a custom helper.
>
> Any idea why I still can't log in using git-credential?

I'm pretty sure Peff touched on this in his reply. When it works,
you're either sending a "Private-Token" header or including it in the
URL, but, as Peff said, Git will never do either of those things. It
sends an "Authorization" header, and, based on their documentation, it
doesn't appear Gitlab accepts access tokens in that header.

It looks like you're either going to need to include it in the URL
(like what happens earlier in the posted trace), or adjust your git
config with "http.extraHeader" set to "Private-Token: <token>" to
include the "Private-Token" header (or you could pass it on the
command line, like `git -c http.extraHeader="Private-Token: <token>"
clone ...`).

Hope this helps!
Bryan

>
> Thanks in advance,
>
> On 10/4/18 1:24 AM, Jeff King wrote:
> > On Thu, Oct 04, 2018 at 01:12:11AM +0700, Dimitri Kopriwa wrote:
> >
> >> Thanks for your reply. I have activated GIT_TRACE_CURL=1 and I can see that
> >> the request is failing 401.
> >>
> >> I can't see which token is used and using what header ?
> >>
> >> The log say:
> >>
> >> 17:50:26.414654 http.c:657  => Send header: Authorization: 
> >> Basic 
> > Yeah, we redact the auth information so people don't accidentally share
> > it publicly. If you use the older GIT_CURL_VERBOSE=1, it will include
> > the credential (I think it may be base64 encoded, though, so you'll have
> > to decipher it).
> >
> >> I have retested the token locally and it works when used in the URL or using
> >> `Private-Token: <token>` as stated in the Gitlab documentation
> >> https://docs.gitlab.com/ee/api/README.html#personal-access-tokens
> > I don't think Git will ever send your token in either of those ways. It
> > will always come as an Authorization header.
> >
> >> Peff, what would be the appropriate way to input my git credential in a 
> >> 100%
> >> success way in a CI?
> > I don't know the details of what GitLab would want, but...
> >
> >> Is this good:
> >>
> >> git credential approve <<EOF
> >> protocol=https
> >> host=example.com
> >> username=bob
> >> password=secr3t
> >> EOF
> > Yes, that would work to preload a token into any configured helpers.
> >
> > -Peff
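For anyone landing here from a search, that preload-then-read-back cycle can be exercised safely in a scratch HOME like this (a sketch only; example.com and the bob/secr3t pair are obviously placeholders, and the throwaway HOME keeps real credential files untouched):

```shell
set -e
sandbox=$(mktemp -d)
export HOME="$sandbox"		# keep real credentials out of the way
git config --global credential.helper "store --file=$sandbox/creds"
# preload a credential into the configured helper
git credential approve <<EOF
protocol=https
host=example.com
username=bob
password=secr3t
EOF
# fill finds the stored entry and returns it without prompting
out=$(git credential fill <<EOF
protocol=https
host=example.com
EOF
)
printf '%s\n' "$out"
```

If the helper has no matching entry, `git credential fill` falls back to prompting, which is exactly the failure mode Peff describes above for a CI box with no terminal.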


[PATCH v10 8/8] list-objects-filter: implement filter tree:0

2018-10-03 Thread Matthew DeVore
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, when in fact they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.
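As a usage sketch (assuming a git that already has this series, and using a throwaway local "server" repository; the exact number of deferred objects depends on the history, so the count below is not a fixed value):

```shell
set -e
tmp=$(mktemp -d)
export HOME="$tmp"
git() { command git -c user.name=t -c user.email=t@example.com "$@"; }
git init -q "$tmp/server"
echo data >"$tmp/server/file"
git -C "$tmp/server" add file
git -C "$tmp/server" commit -qm one
git -C "$tmp/server" config uploadpack.allowFilter true
# --no-local forces the upload-pack code path so the filter is honored;
# --no-checkout avoids faulting the tree/blob straight back in
git clone -q --no-local --no-checkout --filter=tree:0 \
	"$tmp/server" "$tmp/client"
# objects the partial clone deliberately left behind show up as '?' lines
missing=$(git -C "$tmp/client" rev-list --objects --missing=print HEAD |
	grep -c '^?' || true)
echo "$missing object(s) deferred"
```

The deferred trees and blobs are fetched on demand from the promisor remote the moment something actually needs them.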

Signed-off-by: Matthew DeVore 
---
 Documentation/rev-list-options.txt |  5 +++
 list-objects-filter-options.c  | 13 +++
 list-objects-filter-options.h  |  1 +
 list-objects-filter.c  | 49 ++
 t/t5317-pack-objects-filter-objects.sh | 28 +++
 t/t5616-partial-clone.sh   | 42 ++
 t/t6112-rev-list-filters-objects.sh| 14 
 7 files changed, 152 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d259bdb2c..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -49,6 +49,19 @@ static int gently_parse_list_objects_filter(
return 0;
}
 
+   } else if (skip_prefix(arg, "tree:", )) {
+   unsigned long depth;
+   if (!git_parse_ulong(v0, &depth) || depth != 0) {
+   if (errbuf) {
+   strbuf_addstr(
+   errbuf,
+   _("only 'tree:0' is supported"));
+   }
+   return 1;
+   }
+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
} else if (skip_prefix(arg, "sparse:oid=", )) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+   struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+   enum list_objects_filter_situation filter_situation,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_trees_none_data *filter_data = filter_data_;
+
+   switch (filter_situation) {
+   default:
+   BUG("unknown filter_situation: %d", filter_situation);
+
+   case LOFS_BEGIN_TREE:
+   case LOFS_BLOB:
+   if (filter_data->omits)
+   oidset_insert(filter_data->omits, &obj->oid);
+   return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+   case LOFS_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   }
+}
+
+static void* filter_trees_none__init(
+   struct oidset *omitted,
+   struct list_objects_filter_options *filter_options,
+   filter_object_fn *filter_fn,
+   filter_free_fn *filter_free_fn)
+{
+   struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+   d->omits = omitted;
+
+   *filter_fn = filter_trees_none;
+   

[PATCH v10 6/8] list-objects-filter: use BUG rather than die

2018-10-03 Thread Matthew DeVore
In some cases in this file, BUG makes more sense than die. In such
cases, we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore 
---
 list-objects-filter.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
if (filter_options->choice >= LOFC__COUNT)
-   die("invalid list-objects filter choice: %d",
+   BUG("invalid list-objects filter choice: %d",
filter_options->choice);
 
init_fn = s_filters[filter_options->choice];
-- 
2.19.0.605.g01d371f741-goog



[PATCH v10 5/8] revision: mark non-user-given objects instead

2018-10-03 Thread Matthew DeVore
Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in that git rev-list behaved differently from git
pack-objects. pack-objects would *not* filter objects given explicitly
on the command line and rev-list would filter. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.

Signed-off-by: Matthew DeVore 
---
 list-objects.c  | 31 +
 revision.c  |  1 -
 revision.h  | 11 --
 t/t6112-rev-list-filters-objects.sh | 12 +++
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BLOB, obj,
    path->buf, &path->buf[pathlen],
   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
continue;
}
 
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
+   if (S_ISDIR(entry.mode)) {
+   struct tree *t = lookup_tree(the_repository, entry.oid);
+   t->object.flags |= NOT_USER_GIVEN;
+   process_tree(ctx, t, base, entry.path);
+   }
else if (S_ISGITLINK(entry.mode))
process_gitlink(ctx, entry.oid->hash,
base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
+   else {
+   struct blob *b = lookup_blob(the_repository, entry.oid);
+   b->object.flags |= NOT_USER_GIVEN;
+   process_blob(ctx, b, base, entry.path);
+   }
}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
    base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
if (!failed_parse)
process_tree_contents(ctx, tree, base);
 
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
    base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 * an uninteresting boundary commit may not have its tree
 * parsed yet, but we are not going to show them anyway
 */
-   if (get_commit_tree(commit))
-   add_pending_tree(ctx->revs, get_commit_tree(commit));
+   if (get_commit_tree(commit)) {
+   struct tree *tree = get_commit_tree(commit);
+   tree->object.flags |= NOT_USER_GIVEN;
+   add_pending_tree(ctx->revs, tree);
+   }
ctx->show_commit(commit, ctx->show_data);
 
if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,

[PATCH v10 4/8] rev-list: handle missing tree objects properly

2018-10-03 Thread Matthew DeVore
Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore 
---
 builtin/rev-list.c | 11 ---
 list-objects.c | 11 +--
 revision.h | 15 +
 t/t0410-partial-clone.sh   | 45 ++
 t/t5317-pack-objects-filter-objects.sh | 13 
 t/t6112-rev-list-filters-objects.sh| 17 ++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 */
switch (arg_missing_action) {
case MA_ERROR:
-   die("missing blob object '%s'", oid_to_hex(&obj->oid));
+   die("missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
case MA_ALLOW_PROMISOR:
 if (is_promisor_object(&obj->oid))
 return;
-   die("unexpected missing blob object '%s'",
-   oid_to_hex(&obj->oid));
+   die("unexpected missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+   if (!has_object_file(&obj->oid)) {
finish_object__ma(obj);
return 1;
}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 init_revisions(&revs, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
+   revs.do_not_die_on_missing_tree = 1;
 
/*
 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+   int failed_parse;
 
if (!revs->tree_objects)
return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, 1) < 0) {
+
+   failed_parse = parse_tree_gently(tree, 1);
+   if (failed_parse) {
if (revs->ignore_missing_links)
return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 is_promisor_object(&obj->oid))
return;
 
-   die("bad tree object %s", oid_to_hex(&obj->oid));
+   if (!revs->do_not_die_on_missing_tree)
+   die("bad tree object %s", oid_to_hex(&obj->oid));
}
 
strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   process_tree_contents(ctx, tree, base);
+   if (!failed_parse)
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
line_level_traverse:1,
tree_blobs_in_commit_order:1,
 
+   /*
+* Blobs are shown without regard for their existence.
+* But not so for trees: unless exclude_promisor_objects
+* is set and the tree in question is a promisor object;
+* OR ignore_missing_links is set, the revision walker
+* dies with a "bad tree object HASH" message when
+* encountering a missing tree. For callers that can
+* handle missing trees 

[PATCH v10 7/8] list-objects-filter-options: do not over-strbuf_init

2018-10-03 Thread Matthew DeVore
The function gently_parse_list_objects_filter is either called with
errbuf=STRBUF_INIT or errbuf=NULL, but that function calls strbuf_init
when errbuf is not NULL. strbuf_init is only necessary if errbuf
contains garbage, and risks a memory leak if errbuf already has a
non-STRBUF_INIT state. It should be the caller's responsibility to make
sure errbuf is not garbage, since garbage content is easily avoidable
with STRBUF_INIT.

Signed-off-by: Matthew DeVore 
---
 list-objects-filter-options.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..d259bdb2c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(
 
if (filter_options->choice) {
if (errbuf) {
-   strbuf_init(errbuf, 0);
strbuf_addstr(
errbuf,
_("multiple filter-specs cannot be combined"));
@@ -71,10 +70,9 @@ static int gently_parse_list_objects_filter(
return 0;
}
 
-   if (errbuf) {
-   strbuf_init(errbuf, 0);
+   if (errbuf)
strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-   }
+
memset(filter_options, 0, sizeof(*filter_options));
return 1;
 }
-- 
2.19.0.605.g01d371f741-goog



[PATCH v10 3/8] list-objects: always parse trees gently

2018-10-03 Thread Matthew DeVore
If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-   int gently = revs->ignore_missing_links ||
-revs->exclude_promisor_objects;
 
if (!revs->tree_objects)
return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, gently) < 0) {
+   if (parse_tree_gently(tree, 1) < 0) {
if (revs->ignore_missing_links)
return;
 
-- 
2.19.0.605.g01d371f741-goog



[PATCH v10 1/8] list-objects: store common func args in struct

2018-10-03 Thread Matthew DeVore
This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 158 +++--
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+   struct rev_info *revs;
+   show_object_fn show_object;
+   show_commit_fn show_commit;
+   void *show_data;
+   filter_object_fn filter_fn;
+   void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 struct blob *blob,
-show_object_fn show,
 struct strbuf *path,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = &blob->object;
size_t pathlen;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-   if (!revs->blob_objects)
+   if (!ctx->revs->blob_objects)
return;
if (!obj)
die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 * may cause the actual filter to report an incomplete list
 * of missing objects.
 */
-   if (revs->exclude_promisor_objects &&
+   if (ctx->revs->exclude_promisor_objects &&
!has_object_file(>oid) &&
is_promisor_object(>oid))
return;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BLOB, obj,
- path->buf, &path->buf[pathlen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BLOB, obj,
+  path->buf, &path->buf[pathlen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, path->buf, cb_data);
+   ctx->show_object(obj, path->buf, ctx->show_data);
strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
const unsigned char *sha1,
-   show_object_fn show,
struct strbuf *path,
-   const char *name,
-   void *cb_data)
+   const char *name)
 {
/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
-show_object_fn show,
 struct strbuf *base,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = &tree->object;
+   struct rev_info *revs = ctx->revs;
struct tree_desc desc;
struct name_entry entry;
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BEGIN_TREE, obj,
- base->buf, &base->buf[baselen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+  base->buf, &base->buf[baselen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, base->buf, cb_data);
+   ctx->show_object(obj, base->buf, ctx->show_data);
if (base->len)
strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
}
 
if (S_ISDIR(entry.mode))
-   process_tree(revs,
+   process_tree(ctx,
 lookup_tree(the_repository, entry.oid),
-   

[PATCH v10 2/8] list-objects: refactor to process_tree_contents

2018-10-03 Thread Matthew DeVore
This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
while (...) {
/* this is very indented */
}
}

we will have:

if (foo)
process_tree_contents(...);

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 68 ++
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+struct tree *tree,
+struct strbuf *base,
+const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+ struct tree *tree,
+ struct strbuf *base)
+{
+   struct tree_desc desc;
+   struct name_entry entry;
+   enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+   all_entries_interesting : entry_not_interesting;
+
+   init_tree_desc(&desc, tree->buffer, tree->size);
+
+   while (tree_entry(&desc, &entry)) {
+   if (match != all_entries_interesting) {
+   match = tree_entry_interesting(&entry, base, 0,
+  &ctx->revs->diffopt.pathspec);
+   if (match == all_entries_not_interesting)
+   break;
+   if (match == entry_not_interesting)
+   continue;
+   }
+
+   if (S_ISDIR(entry.mode))
+   process_tree(ctx,
+lookup_tree(the_repository, entry.oid),
+base, entry.path);
+   else if (S_ISGITLINK(entry.mode))
+   process_gitlink(ctx, entry.oid->hash,
+   base, entry.path);
+   else
+   process_blob(ctx,
+lookup_blob(the_repository, entry.oid),
+base, entry.path);
+   }
+}
+
 static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
struct object *obj = >object;
struct rev_info *revs = ctx->revs;
-   struct tree_desc desc;
-   struct name_entry entry;
-   enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-   all_entries_interesting: entry_not_interesting;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   init_tree_desc(&desc, tree->buffer, tree->size);
-
-   while (tree_entry(&desc, &entry)) {
-   if (match != all_entries_interesting) {
-   match = tree_entry_interesting(&entry, base, 0,
-  &revs->diffopt.pathspec);
-   if (match == all_entries_not_interesting)
-   break;
-   if (match == entry_not_interesting)
-   continue;
-   }
-
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
-   else if (S_ISGITLINK(entry.mode))
-   process_gitlink(ctx, entry.oid->hash, base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
-   }
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.605.g01d371f741-goog



[PATCH v10 0/8] filter: support for excluding all trees and blobs

2018-10-03 Thread Matthew DeVore
This is a minor change to the previous rollup. It moves positional
arguments to the end of git-rev-list invocations. Here is an interdiff
from v9:

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 7b6294ca5..53fbf7db8 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -168,7 +168,8 @@ test_expect_success 'use fsck before and after manually 
fetching a missing subtr
git -C dst fsck &&
 
# Make sure we only have commits, and all trees and blobs are missing.
-   git -C dst rev-list master --missing=allow-any --objects 
>fetched_objects &&
+   git -C dst rev-list --missing=allow-any --objects master \
+   >fetched_objects &&
awk -f print_1.awk fetched_objects |
xargs -n1 git -C dst cat-file -t >fetched_types &&
 
@@ -184,7 +185,7 @@ test_expect_success 'use fsck before and after manually 
fetching a missing subtr
git -C dst fsck &&
 
# Auto-fetch all remaining trees and blobs with --missing=error
-   git -C dst rev-list master --missing=error --objects >fetched_objects &&
+   git -C dst rev-list --missing=error --objects master >fetched_objects &&
test_line_count = 70 fetched_objects &&
 
awk -f print_1.awk fetched_objects |
diff --git a/t/t6112-rev-list-filters-objects.sh 
b/t/t6112-rev-list-filters-objects.sh
index 6e5c41a68..5a61614b1 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -227,7 +227,8 @@ test_expect_success 'rev-list W/ --missing=print and 
--missing=allow-any for tre
 # Test tree:0 filter.
 
 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
-   git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted 
--filter=tree:0 |
+   git -C r3 rev-list --quiet --objects --filter-print-omitted \
+   --filter=tree:0 HEAD |
awk -f print_1.awk |
sed s/~// |
xargs -n1 git -C r3 cat-file -t |

Thank you,

Matthew DeVore (8):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter-options: do not over-strbuf_init
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt |   5 +
 builtin/rev-list.c |  11 +-
 list-objects-filter-options.c  |  19 +-
 list-objects-filter-options.h  |   1 +
 list-objects-filter.c  |  60 ++-
 list-objects.c | 232 +
 revision.c |   1 -
 revision.h |  26 ++-
 t/t0410-partial-clone.sh   |  45 +
 t/t5317-pack-objects-filter-objects.sh |  41 +
 t/t5616-partial-clone.sh   |  42 +
 t/t6112-rev-list-filters-objects.sh|  43 +
 12 files changed, 398 insertions(+), 128 deletions(-)

-- 
2.19.0.605.g01d371f741-goog



[PATCH 2/3] mingw: set _WIN32_WINNT explicitly for Git for Windows

2018-10-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Previously, we only ever declared a target Windows version if compiling
with Visual C.

Which meant that we were relying on the MinGW headers to guess which
Windows version we want to target...

Let's be explicit about it, in particular because we actually want to
bump the target Windows version to Vista (which we will do in the next
commit).

Signed-off-by: Johannes Schindelin 
---
 git-compat-util.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 5f2e90932f..3ba93d9c15 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -146,7 +146,7 @@
 #define _SGI_SOURCE 1
 
 #if defined(WIN32) && !defined(__CYGWIN__) /* Both MinGW and MSVC */
-# if defined (_MSC_VER) && !defined(_WIN32_WINNT)
+# if !defined(_WIN32_WINNT)
 #  define _WIN32_WINNT 0x0502
 # endif
 #define WIN32_LEAN_AND_MEAN  /* stops windows.h including winsock.h */
-- 
gitgitgadget



[PATCH 1/3] compat/poll: prepare for targeting Windows Vista

2018-10-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Windows Vista (and later) actually have a working poll(), but we still
cannot use it because it only works on sockets.

So let's detect when we are targeting Windows Vista and undefine those
constants, and define `pollfd` so that we can declare our own pollfd
struct.

We also need to make sure that we override those constants *after*
`winsock2.h` has been `#include`d (otherwise we would not really
override those constants).

Signed-off-by: Johannes Schindelin 
---
 compat/poll/poll.c |  6 +++---
 compat/poll/poll.h | 15 +++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/compat/poll/poll.c b/compat/poll/poll.c
index 7ed3fbbea1..ad5dcde439 100644
--- a/compat/poll/poll.c
+++ b/compat/poll/poll.c
@@ -29,9 +29,6 @@
 
 #include 
 
-/* Specification.  */
-#include 
-
 #include 
 #include 
 #include 
@@ -55,6 +52,9 @@
 # include 
 #endif
 
+/* Specification.  */
+#include "poll.h"
+
 #ifdef HAVE_SYS_IOCTL_H
 # include 
 #endif
diff --git a/compat/poll/poll.h b/compat/poll/poll.h
index cd1995292a..1e1597360f 100644
--- a/compat/poll/poll.h
+++ b/compat/poll/poll.h
@@ -21,6 +21,21 @@
 #ifndef _GL_POLL_H
 #define _GL_POLL_H
 
+#if defined(_WIN32_WINNT) && _WIN32_WINNT >= 0x600
+/* Vista has its own, socket-only poll() */
+#undef POLLIN
+#undef POLLPRI
+#undef POLLOUT
+#undef POLLERR
+#undef POLLHUP
+#undef POLLNVAL
+#undef POLLRDNORM
+#undef POLLRDBAND
+#undef POLLWRNORM
+#undef POLLWRBAND
+#define pollfd compat_pollfd
+#endif
+
 /* fake a poll(2) environment */
 #define POLLIN  0x0001  /* any readable data available   */
 #define POLLPRI 0x0002  /* OOB/Urgent readable data  */
-- 
gitgitgadget



[PATCH 0/3] mingw: require Windows Vista or later (and fix the Windows CI builds)

2018-10-03 Thread Johannes Schindelin via GitGitGadget
I noticed that a recent GitGitGadget build failed in the Windows phase, with
compat/poll/poll.h problems all over the place.

Turns out that this is caused by the recent upgrade of the mingw-w64 headers
and crt files (7.0.0.5233.e0c09544 -> 7.0.0.5245.edf66197, which I assume
now enforces Vista as minimal Windows version also for all mingw-w64
projects).

Luckily, in Git for Windows' master, we already had changes to require Vista
(for unrelated reasons: to restrict the std handle inheritance when spawning
new processes).

Technically, Windows Vista is already no longer supported
[https://support.microsoft.com/en-us/help/22882/windows-vista-end-of-support]
, but we do try to keep Git building on older Windows versions, up until the
point when it becomes too big of a maintenance burden.

Johannes Schindelin (3):
  compat/poll: prepare for targeting Windows Vista
  mingw: set _WIN32_WINNT explicitly for Git for Windows
  mingw: bump the minimum Windows version to Vista

 compat/poll/poll.c |  6 +++---
 compat/poll/poll.h | 15 +++
 config.mak.uname   |  4 
 git-compat-util.h  |  4 ++--
 4 files changed, 20 insertions(+), 9 deletions(-)


base-commit: fe8321ec057f9231c26c29b364721568e58040f7
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-44%2Fdscho%2Frequire-windows-vista-or-later-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-44/dscho/require-windows-vista-or-later-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/44
-- 
gitgitgadget


[PATCH 3/3] mingw: bump the minimum Windows version to Vista

2018-10-03 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Quite some time ago, a last plea to the XP users out there who want to
see Windows XP support in Git for Windows, asking them to get engaged
and help, vanished into the depths of the universe.

We tried for a long time to play nice with the last remaining XP users
who somehow manage to build Git from source, but a recent update of
mingw-w64 (7.0.0.5233.e0c09544 -> 7.0.0.5245.edf66197) finally dropped
the last sign of XP support, and Git for Windows' SDK is no longer able
to build core Git's `master` branch as a consequence. (Git for Windows'
`master` branch already bumped the minimum Windows version to Vista a
while ago, so it is fine.)

It is time to require Windows Vista or later to build Git from source.
This, incidentally, lets us use quite a few nice new APIs.

It also means that we no longer need the inet_pton() and inet_ntop()
emulation, which is nice.

Signed-off-by: Johannes Schindelin 
---
 config.mak.uname  | 4 
 git-compat-util.h | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index e47af72e01..8acdeb71fd 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -381,8 +381,6 @@ ifeq ($(uname_S),Windows)
NO_PYTHON = YesPlease
BLK_SHA1 = YesPlease
ETAGS_TARGET = ETAGS
-   NO_INET_PTON = YesPlease
-   NO_INET_NTOP = YesPlease
NO_POSIX_GOODIES = UnfortunatelyYes
NATIVE_CRLF = YesPlease
DEFAULT_HELP_FORMAT = html
@@ -529,8 +527,6 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_REGEX = YesPlease
NO_PYTHON = YesPlease
ETAGS_TARGET = ETAGS
-   NO_INET_PTON = YesPlease
-   NO_INET_NTOP = YesPlease
NO_POSIX_GOODIES = UnfortunatelyYes
DEFAULT_HELP_FORMAT = html
COMPAT_CFLAGS += -DNOGDI -Icompat -Icompat/win32
diff --git a/git-compat-util.h b/git-compat-util.h
index 3ba93d9c15..48c955541e 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -147,7 +147,7 @@
 
 #if defined(WIN32) && !defined(__CYGWIN__) /* Both MinGW and MSVC */
 # if !defined(_WIN32_WINNT)
-#  define _WIN32_WINNT 0x0502
+#  define _WIN32_WINNT 0x0600
 # endif
 #define WIN32_LEAN_AND_MEAN  /* stops windows.h including winsock.h */
 #include 
-- 
gitgitgadget


Re: [PATCH v2 2/2] oidset: use khash

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 03:16:39PM +0200, René Scharfe wrote:

> Performance of a command that mainly checks for duplicate objects using
> an oidset, with master and Clang 6.0.1:
> 
>   $ cmd="./git-cat-file --batch-all-objects --unordered --buffer 
> --batch-check='%(objectname)'"
> 
>   $ /usr/bin/time $cmd >/dev/null
>   0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 
> 48484maxresident)k
>   0inputs+0outputs (0major+11204minor)pagefaults 0swaps
> 
>   $ hyperfine "$cmd"
>   Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer 
> --batch-check='%(objectname)'
> 
> Time (mean ± σ): 250.0 ms ±   6.0 ms[User: 225.9 ms, System: 23.6 
> ms]
> 
> Range (min … max):   242.0 ms … 261.1 ms
> 
> And with this patch:
> 
>   $ /usr/bin/time $cmd >/dev/null
>   0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 
> 41396maxresident)k
>   0inputs+0outputs (0major+8318minor)pagefaults 0swaps
> 
>   $ hyperfine "$cmd"
>   Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer 
> --batch-check='%(objectname)'
> 
> Time (mean ± σ): 151.9 ms ±   4.9 ms[User: 130.5 ms, System: 21.2 
> ms]
> 
> Range (min … max):   148.2 ms … 170.4 ms

Thanks. Just to reinforce these timings, a similar command on linux.git
drops from 4.0s to 2.3s.

The patch itself mostly looks good. A few small comments:

> diff --git a/fetch-pack.c b/fetch-pack.c
> index 75047a4b2a..a839315726 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -536,7 +536,7 @@ static int tip_oids_contain(struct oidset *tip_oids,
>* add to "newlist" between calls, the additions will always be for
>* oids that are already in the set.
>*/
> - if (!tip_oids->map.map.tablesize) {
> + if (!tip_oids->set.n_buckets) {
>   add_refs_to_oidset(tip_oids, unmatched);
>   add_refs_to_oidset(tip_oids, newlist);
>   }

This is a little intimate with the implementation of khash, but I think
it's probably OK (and really no worse than what was there before).

As the comment above notes, I think we're really looking at the case
where this gets populated on the first call, but not subsequent ones. It
might be less hacky to use a "static int initialized" here. Or if we
want to avoid hidden globals, put the logic into filter_refs() to decide
when to populate.

I don't think any of that needs to hold up this series, though.

>  int oidset_insert(struct oidset *set, const struct object_id *oid)
>  {
> - struct oidmap_entry *entry;
> -
> - if (!set->map.map.tablesize)
> - oidmap_init(&set->map, 0);
> - else if (oidset_contains(set, oid))
> - return 1;
> -
> - entry = xmalloc(sizeof(*entry));
> - oidcpy(&entry->oid, oid);
> -
> - oidmap_put(&set->map, entry);
> - return 0;
> + int added;
> + kh_put_oid(&set->set, *oid, &added);
> + return !added;
>  }

This actually does the check-and-insert in a single lookup, which is
nice. ;)

> diff --git a/oidset.h b/oidset.h
> index 40ec5f87fe..4b90540cd4 100644
> --- a/oidset.h
> +++ b/oidset.h
> [...]
> +KHASH_INIT(oid, struct object_id, int, 0, oid_hash, oid_equal)

This will declare these "static inline". Our other major "macros become
inline functions" code is commit-slab.h, and there we found it necessary
to add MAYBE_UNUSED. I wonder if we ought to be doing the same here (I
don't get any warnings, but I suspect sparse might complain).

It might be nice if these functions could hide inside oidset.c (and just
declare the struct here). It looks like we might be able to do that with
__KHASH_TYPE(), but the double-underscore implies that we're not
supposed to. ;)

I guess we also use a few of them in our inlines here. I'm not 100% sure
that oidset_* needs to be inlined either, but this is at least a pretty
faithful conversion of the original.

-Peff


Re: Fwd: Git credentials not working

2018-10-03 Thread Dimitri Kopriwa
I have replaced the way I fill the git credentials store. I have verified
~/.git-credentials and the information is there, and the ~/.gitconfig looks
fine too.


I still get a 401 error when reading from that file.

This is the paste log : https://paste.gnome.org/pmntlkdw0

Now that I use git credential approve, I don't think that I need a custom helper.

Any idea why I still can't log in using git-credential?

Thanks in advance,

On 10/4/18 1:24 AM, Jeff King wrote:

On Thu, Oct 04, 2018 at 01:12:11AM +0700, Dimitri Kopriwa wrote:


Thanks for your reply. I have activated GIT_TRACE_CURL=1 and I can see that
the request is failing 401.

I can't see which token is used and using what header ?

The log say:

17:50:26.414654 http.c:657  => Send header: Authorization: Basic 


Yeah, we redact the auth information so people don't accidentally share
it publicly. If you use the older GIT_CURL_VERBOSE=1, it will include
the credential (I think it may be base64 encoded, though, so you'll have
to decipher it).


I have retested the token locally and it work when used in the url or using
`Private-Token: ` as stated in the Gitlab documentation
https://docs.gitlab.com/ee/api/README.html#personal-access-tokens

I don't think Git will ever send your token in either of those ways. It
will always come as an Authorization header.


Peff, what would be the appropriate way to input my git credential in a 100%
success way in a CI?

I don't know the details of what GitLab would want, but...


Is this good:

git credential approve <
Yes, that would work to preload a token into any configured helpers.
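For reference, a minimal end-to-end sketch of that preloading (hypothetical
host and token values; the `store` helper is used here only so the result is
inspectable on disk — a CI would normally configure whatever helper it uses
for fetches):

```shell
# Hypothetical values; protocol/host must match the URL the CI fetches.
creds=$(mktemp)
git -c credential.helper="store --file=$creds" credential approve <<EOF
protocol=https
host=gitlab.example.com
username=gitlab-ci-token
password=SECRET_TOKEN
EOF
# git-credential-store writes URL-style lines we can check:
grep "gitlab-ci-token:SECRET_TOKEN@gitlab.example.com" "$creds"
```

The key=value block on stdin must end with a blank line or EOF, and the
protocol/host pair is what git later matches against when filling credentials.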

-Peff


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 12:08:15PM -0700, Stefan Beller wrote:

> I share these concerns in a slightly more abstract way, as
> I would bucket the actions into two separate bins:
> 
> One bin that throws away information.
> this would include removing expired reflog entries (which
> I do not think are garbage, or collection thereof), but their
> usefulness is questionable.
> 
> The other bin would be actions that optimize but
> do not throw away any information, repacking (without
> dropping files) would be part of it, or the new
> "write additional files".
> 
> Maybe we can move all actions of the second bin into a new
> "git optimize" command, and git gc would do first the "throw away
> things" and then the optimize action, whereas clone would only
> go for the second optimizing part?

One problem with that world-view is that some of the operations do
_both_, for efficiency. E.g., repacking will drop unreachable objects in
too-old packs. We could actually be more aggressive in combining things
here. For instance, a full object graph walk in linux.git takes 30-60
seconds, depending on your CPU. But we do it at least twice during a gc:
once to repack, and then again to determine reachability for pruning.

If you generate bitmaps during the repack step, you can use them during
the prune step. But by itself, the cost of generating the bitmaps
generally outweighs the extra walk. So it's not worth generating them
_just_ for this (but is an obvious optimization for a server which would
be generating them anyway).

-Peff


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 02:59:34PM -0400, Derrick Stolee wrote:

> > They don't help yet, and there's no good reason to enable bitmaps for
> > clients. I have a few patches that use bitmaps for things like
> > ahead/behind and --contains checks, but the utility of those may be
> > lessened quite a bit by Stolee's commit-graph work.  And if it isn't,
> > I'm mildly in favor of replacing the existing .bitmap format with
> > something better integrated with commit-graphs (which would give us an
> > opportunity to clean up some of the rough edges).
> 
> If the commit-graph doesn't improve enough on those applications, then we
> could consider adding a commit-to-commit reachability bitmap inside the
> commit-graph. ;)

That unfortunately wouldn't be enough for us to ditch the existing
.bitmap files, since we need full object reachability for some cases
(including packing). And commit-to-commit reachability is a trivial
subset of that. I'm not sure if it would be better to just leave
.bitmaps in place as a server-side thing, and grow a new thing for
commit-to-commit reachability (since it would presumably be easier).

I'm still excited about the prospect of a bloom filter for paths which
each commit touches. I think that's the next big frontier in getting
things like "git log -- path" to a reasonable run-time.

-Peff


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Stefan Beller
>
> But you thought right, I do have an objection against that.  'git gc'
> should, well, collect garbage.  Any non-gc stuff is already violating
> separation of concerns.

I share these concerns in a slightly more abstract way, as
I would bucket the actions into two separate bins:

One bin that throws away information.
this would include removing expired reflog entries (which
I do not think are garbage, or collection thereof), but their
usefulness is questionable.

The other bin would be actions that optimize but
do not throw away any information, repacking (without
dropping files) would be part of it, or the new
"write additional files".

Maybe we can move all actions of the second bin into a new
"git optimize" command, and git gc would do first the "throw away
things" and then the optimize action, whereas clone would only
go for the second optimizing part?


Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0

2018-10-03 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 8:13 AM Jeff Hostetler  wrote:
>
> There are a couple of options here:
> [] If really want to omit all trees and blobs (and we DO NOT want
> the oidset of everything omitted), then we might be able to
> shortcut the traversal and speed things up.
>
> {} add a LOFR_SKIP_TREE bit to list_objects_filter_result
> {} test this bit process_tree() and avoid the init_tree_desc() and
>the while loop and some adjacent setup/tear-down code.
> {} make this filter something like:
>
> case LOFS_BEGIN_TREE:
> if (filter_data->omits) {
> oidset_insert(filter_data->omits, &obj->oid);
> return LOFR_MARK_SEEN; /* ... (hard omit) */
> } else
> return LOFR_SKIP_TREE;
> case LOFS_BLOB:
> if (filter_data->omits) {
> oidset_insert(filter_data->omits, &obj->oid);
> return LOFR_MARK_SEEN; /* ... (hard omit) */
> else
> assert(...should not happen...);
>
> [] Later, if we choose to actually support a depth>0, we'll probably
> want a different filter function to conditionally include/exclude
> blobs, include shallow tree[node]s, and do some of the provisional-
> omit logic on deep tree[nodes] (in case a tree appears at multiple
> places/depths in the history).  But that can wait.
>
> Jeff
>

Jeff, have you made any progress on depth>0 support for the tree
filter? I'd like to take a stab at it without duplicating work :)

- Matt


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Derrick Stolee

On 10/3/2018 2:51 PM, Jeff King wrote:

On Wed, Oct 03, 2018 at 08:47:11PM +0200, Ævar Arnfjörð Bjarmason wrote:


On Wed, Oct 03 2018, Stefan Beller wrote:


So we wouldn't be spending 5 minutes repacking linux.git right after
cloning it, just ~10s generating the commit graph, and the same would
happen if you rm'd .git/objects/info/commit-graph and ran "git commit",
which would kick off "gc --auto" in the background and do the same thing.

Or generating local bitmaps or pack idx files as well?

I'm less familiar with this area, but when I clone I get a pack *.idx
file, why does it need to be regenerated?

But yeah, in principle this would be a sensible addition, but I'm not
aware of cases where clients get significant benefits from bitmaps (see
https://githubengineering.com/counting-objects/), and I don't turn it on
for clients, but maybe I've missed something.

They don't help yet, and there's no good reason to enable bitmaps for
clients. I have a few patches that use bitmaps for things like
ahead/behind and --contains checks, but the utility of those may be
lessened quite a bit by Stolee's commit-graph work.  And if it isn't,
I'm mildly in favor of replacing the existing .bitmap format with
something better integrated with commit-graphs (which would give us an
opportunity to clean up some of the rough edges).


If the commit-graph doesn't improve enough on those applications, then 
we could consider adding a commit-to-commit reachability bitmap inside 
the commit-graph. ;)


-Stolee


Re: [PATCH v2] gpg-interface.c: detect and reject multiple signatures on commits

2018-10-03 Thread Stefan Beller
On Wed, Oct 3, 2018 at 1:29 AM Michał Górny  wrote:
>
> On Fri, 2018-08-17 at 09:34 +0200, Michał Górny wrote:
> > GnuPG supports creating signatures consisting of multiple signature
> > packets.  If such a signature is verified, it outputs all the status
> > messages for each signature separately.  However, git currently does not
> > account for such scenario and gets terribly confused over getting
> > multiple *SIG statuses.
> >
> > For example, if a malicious party alters a signed commit and appends
> > a new untrusted signature, git is going to ignore the original bad
> > signature and report untrusted commit instead.  However, %GK and %GS
> > format strings may still expand to the data corresponding
> > to the original signature, potentially tricking the scripts into
> > trusting the malicious commit.
> >
> > Given that the use of multiple signatures is quite rare, git does not
> > support creating them without jumping through a few hoops, and finally
> > supporting them properly would require extensive API improvement, it
> > seems reasonable to just reject them at the moment.
> >
>
> Gentle ping.

I am not an expert on GPG, but the patch (design, code, test) looks
reasonable to me.


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Jeff King
On Wed, Oct 03, 2018 at 08:47:11PM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Wed, Oct 03 2018, Stefan Beller wrote:
> 
> >> So we wouldn't be spending 5 minutes repacking linux.git right after
> >> cloning it, just ~10s generating the commit graph, and the same would
> >> happen if you rm'd .git/objects/info/commit-graph and ran "git commit",
> >> which would kick off "gc --auto" in the background and do the same thing.
> >
> > Or generating local bitmaps or pack idx files as well?
> 
> I'm less familiar with this area, but when I clone I get a pack *.idx
> file, why does it need to be regenerated?
> 
> But yeah, in principle this would be a sensible addition, but I'm not
> aware of cases where clients get significant benefits from bitmaps (see
> https://githubengineering.com/counting-objects/), and I don't turn it on
> for clients, but maybe I've missed something.

They don't help yet, and there's no good reason to enable bitmaps for
clients. I have a few patches that use bitmaps for things like
ahead/behind and --contains checks, but the utility of those may be
lessened quite a bit by Stolee's commit-graph work.  And if it isn't,
I'm mildly in favor of replacing the existing .bitmap format with
something better integrated with commit-graphs (which would give us an
opportunity to clean up some of the rough edges).

-Peff


Re: inside the git folder

2018-10-03 Thread Stefan Beller
On Wed, Oct 3, 2018 at 5:26 AM Chris Jeschke
 wrote:
>
> Hey git-team,
> I am working on a plug-in for a distributed pair programming tool. To
> skip the details: I was thinking about sending parts of the git folder
> as a zip folder with our own Bytestream instead of using the git API.
> Is there a common sense about what should and what shouldn't be done
> when working with the files inside the git folder?

This contradicts the security model of git.
Locally I can do things like:
git config alias.co "rm -rf ~"
echo "rm -rf ~" >.git/hooks/{...}
and I would experience bad things, but that is ok,
as I configured it locally (supposedly I know what
I am doing); but if I have the ability to send these
tricks to my beloved coworkers, hilarity might ensue.

What stuff do you need to send around?

objects? Fine, as the receiver could check they are
good using fsck.

refs/ ? Sure. It may be confusing to users,
but I am sure you'll figure UX out.

local config, hooks ? I would not.

Not sure what else you'd think of sending around.

Cheers,
Stefan
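As a rough sketch of the receiver-side check Stefan describes (the quarantine path is arbitrary, and the "hello" blob merely stands in for object data received from a peer):

```shell
#!/bin/sh
# Sketch: quarantine objects received from a coworker and let fsck vet
# them before trusting anything. "hello" stands in for received data.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/quarantine"

# write the received data as a loose object in the quarantine repo
oid=$(echo hello | git -C "$tmp/quarantine" hash-object -w --stdin)

# fsck checks object integrity; a non-zero exit means bad data
git -C "$tmp/quarantine" fsck --full
echo "verified $oid"
```

Hooks and local config, by contrast, have no equivalent integrity check, which is why they should never be shipped around.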


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-03 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 03 2018, Stefan Beller wrote:

>> So we wouldn't be spending 5 minutes repacking linux.git right after
>> cloning it, just ~10s generating the commit graph, and the same would
>> happen if you rm'd .git/objects/info/commit-graph and ran "git commit",
>> which would kick off "gc --auto" in the background and do the same thing.
>
> Or generating local bitmaps or pack idx files as well?

I'm less familiar with this area, but when I clone I get a pack *.idx
file, why does it need to be regenerated?

But yeah, in principle this would be a sensible addition, but I'm not
aware of cases where clients get significant benefits from bitmaps (see
https://githubengineering.com/counting-objects/), and I don't turn it on
for clients, but maybe I've missed something.


Re: Fwd: Git credentials not working

2018-10-03 Thread Jeff King
On Thu, Oct 04, 2018 at 01:12:11AM +0700, Dimitri Kopriwa wrote:

> Thanks for your reply. I have activated GIT_TRACE_CURL=1 and I can see that
> the request is failing with a 401.
> 
> I can't see which token is used and using what header ?
> 
> The log says:
> 
> 17:50:26.414654 http.c:657  => Send header: Authorization: Basic 
> 

Yeah, we redact the auth information so people don't accidentally share
it publicly. If you use the older GIT_CURL_VERBOSE=1, it will include
the credential (I think it may be base64 encoded, though, so you'll have
to decipher it).

> I have retested the token locally and it works when used in the URL or using
> `Private-Token: ` as stated in the Gitlab documentation
> https://docs.gitlab.com/ee/api/README.html#personal-access-tokens

I don't think Git will ever send your token in either of those ways. It
will always come as an Authorization header.

> Peff, what would be the appropriate way to supply my git credentials
> reliably in a CI?

I don't know the details of what GitLab would want, but...

> Is this good:
> 
> git credential approve <<OEF
> protocol=https
> host=example.com
> username=bob
> password=secr3t
> OEF

Yes, that would work to preload a token into any configured helpers.

-Peff
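For what it's worth, the preloading approach can be exercised end to end in a throwaway HOME; the host, username, and password below are placeholders, not anything GitLab-specific:

```shell
#!/bin/sh
# Sketch: preload a credential into the "store" helper, then read it
# back the way an HTTP remote would. All values are placeholders.
set -e
export HOME=$(mktemp -d)        # keep the real ~/.git-credentials untouched
export GIT_TERMINAL_PROMPT=0    # fail instead of prompting on a miss
git config --global credential.helper store

# preload the token into all configured helpers
git credential approve <<EOF
protocol=https
host=example.com
username=bob
password=secr3t
EOF

# "fill" consults the configured helpers for a matching credential
git credential fill <<EOF >creds
protocol=https
host=example.com
EOF
grep "^password=secr3t" creds
```

If the server later rejects the credential, git will issue a matching "reject", which is exactly the erase behavior discussed elsewhere in this thread.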


Re: Fwd: Git credentials not working

2018-10-03 Thread Dimitri Kopriwa
Thanks for your reply. I have activated GIT_TRACE_CURL=1 and I can see 
that the request is failing with a 401.


I can't see which token is used and using what header ?

The log says:

17:50:26.414654 http.c:657  => Send header: Authorization: Basic 


I have retested the token locally and it works when used in the URL or 
using `Private-Token: ` as stated in the Gitlab documentation 
https://docs.gitlab.com/ee/api/README.html#personal-access-tokens


Peff, what would be the appropriate way to supply my git credentials 
reliably in a CI?


Is this good:

git credential approve <<OEF
protocol=https
host=example.com
username=bob
password=secr3t
OEF

Re: [PATCH] coccicheck: process every source file at once

On Wed, Oct 3, 2018 at 8:52 AM Jacob Keller  wrote:
>
> On Wed, Oct 3, 2018 at 8:05 AM SZEDER Gábor  wrote:
> > In fact, it works finer than ever, because running 1.0.0 with this
> > patch on Travis CI notices a possible memmove() -> MOVE_ARRAY()
> > conversion:
> >
> >   https://travis-ci.org/szeder/git/jobs/436542684#L576
> >
> > Surprisingly, running 1.0.0 without this patch, or running 1.0.4
> > locally either with or without this patch doesn't notice that
> > memmove() call.  Presumably that's why Jonathan could kind-of "revert"
> > my conversion from f919ffebed (Use MOVE_ARRAY, 2018-01-22) in his
> > 6a1a79fd14 (object: move grafts to object parser, 2018-05-15) without
> > anyone noticing.
> >
>
> That seems very odd...

That looks like a bad rebase on my side as I was carrying that patch
for a while as the development of object store series took some time.

And I agree we should re-introduce the MOVE_ARRAY there.


Re: We should add a "git gc --auto" after "git clone" due to commit graph

> So we wouldn't be spending 5 minutes repacking linux.git right after
> cloning it, just ~10s generating the commit graph, and the same would
> happen if you rm'd .git/objects/info/commit-graph and ran "git commit",
> which would kick off "gc --auto" in the background and do the same thing.

Or generating local bitmaps or pack idx files as well?


Re: Git credentials not working




On 10/3/18 11:03 PM, Christian Couder wrote:

(removing git-security from CC)

On Wed, Oct 3, 2018 at 4:09 PM Dimitri Kopriwa  wrote:


Git credentials in ~/.git-credentials and ~/.config/git/credentials are
being removed by git upon reading.

https://git-scm.com/docs/git-credential says:

"If the action is reject, git-credential will send the description to
any configured credential helpers, which may erase any stored
credential matching the description."

So maybe this is expected.
I am using this script to create my credential file; what am I supposed 
to do in a non-tty environment? Is there a preferred way?


Another possibility is that your .gitlab-ci.yml might launch scripts
writing into those files, like the before_script.sh script that is
described on:

https://stackoverflow.com/questions/50553049/is-it-possible-to-do-a-git-push-within-a-gitlab-ci-without-ssh

Could you also check which credential helper and which options are
used? For example with commands like:

$ git config -l --show-origin | grep -i cred
$ git config -l --show-origin | grep -i http
$ git config -l --show-origin | grep -i askpass
$ env | grep -i askpass

 * branch    HEAD   -> FETCH_HEAD
17:15:36.175966 run-command.c:637   trace: run_command: git gc --auto
17:15:36.177688 git.c:415   trace: built-in: git gc --auto
$ git config -l --show-origin | grep -i cred
17:15:36.180191 git.c:415   trace: built-in: git config -l 
--show-origin

file:/root/.gitconfig    credential.helper=store
file:.git/config    credential.helper=store
$ git config -l --show-origin | grep -i http
17:15:36.182768 git.c:415   trace: built-in: git config -l 
--show-origin
file:.git/config 
remote.origin.url=https://git.example.com/example/sample-project.git
$ git config -l --show-origin | grep -i askpass || echo nothing 
to do
17:15:36.185306 git.c:415   trace: built-in: git config -l 
--show-origin

nothing to do
$ env | grep -i askpass || echo nothing to do
nothing to do


[PATCH v2 1/3] commit-graph: clean up leaked memory during write

From: Derrick Stolee 

The write_commit_graph() method in commit-graph.c leaks some lists
and strings during execution. In addition, a list of strings is
leaked in write_commit_graph_reachable(). Clean these up so our
memory checking is cleaner.

Further, if we use a list of pack-files to find the commits, we
can leak the packed_git structs after scanning them for commits.

Running the following commands demonstrates the leak before and
the fix after:

* valgrind --leak-check=full ./git commit-graph write --reachable
* valgrind --leak-check=full ./git commit-graph write --stdin-packs

Signed-off-by: Martin Ågren 
Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 2a24eb8b5a..ceca6026b0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -693,11 +693,12 @@ static int add_ref_to_list(const char *refname,
 void write_commit_graph_reachable(const char *obj_dir, int append,
  int report_progress)
 {
-   struct string_list list;
+   struct string_list list = STRING_LIST_INIT_DUP;
 
-   string_list_init(&list, 1);
for_each_ref(add_ref_to_list, &list);
write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+
+   string_list_clear(&list, 0);
 }
 
 void write_commit_graph(const char *obj_dir,
@@ -764,6 +765,7 @@ void write_commit_graph(const char *obj_dir,
die(_("error opening index for %s"), 
packname.buf);
for_each_object_in_pack(p, add_packed_commits, &oids, 
0);
close_pack(p);
+   free(p);
}
stop_progress(&progress);
strbuf_release(&packname);
@@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
compute_generation_numbers(&commits, report_progress);
 
graph_name = get_commit_graph_filename(obj_dir);
-   if (safe_create_leading_directories(graph_name))
+   if (safe_create_leading_directories(graph_name)) {
+   UNLEAK(graph_name);
die_errno(_("unable to create leading directories of %s"),
  graph_name);
+   }
 
hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
@@ -893,9 +897,9 @@ void write_commit_graph(const char *obj_dir,
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
commit_lock_file();
 
+   free(graph_name);
+   free(commits.list);
free(oids.list);
-   oids.alloc = 0;
-   oids.nr = 0;
 }
 
 #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
-- 
gitgitgadget



[PATCH v2 3/3] commit-graph: reduce initial oid allocation

From: Derrick Stolee 

While writing a commit-graph file, we store the full list of
commits in a flat list. We use this list for sorting and ensuring
we are closed under reachability.

The initial allocation assumed that (at most) one in four objects
is a commit. This is a dramatic over-count for many repos,
especially large ones. Since we grow the list dynamically, reduce
this count by a factor of eight. We still set it to a minimum of
1024 before allocating.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index ceca6026b0..e773703e1d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -720,7 +720,7 @@ void write_commit_graph(const char *obj_dir,
struct progress *progress = NULL;
 
oids.nr = 0;
-   oids.alloc = approximate_object_count() / 4;
+   oids.alloc = approximate_object_count() / 32;
oids.progress = NULL;
oids.progress_done = 0;
 
-- 
gitgitgadget


[PATCH v2 2/3] builtin/commit-graph.c: UNLEAK variables

From: =?UTF-8?q?Martin=20=C3=85gren?= 

`graph_verify()`, `graph_read()` and `graph_write()` do the hard work of
`cmd_commit_graph()`. As soon as these return, so does
`cmd_commit_graph()`.

`strbuf_getline()` may allocate memory in the strbuf, yet return EOF.
We need to release the strbuf or UNLEAK it. Go for the latter since we
are close to returning from `graph_write()`.

`graph_write()` also fails to free the strings in the string list. They
have been added to the list with `strdup_strings` set to 0. We could
flip `strdup_strings` before clearing the list, which is our usual hack
in situations like this. But since we are about to exit, let's just
UNLEAK the whole string list instead.

UNLEAK `graph` in `graph_verify`. While at it, and for consistency,
UNLEAK in `graph_read()` as well, and remove an unnecessary UNLEAK just
before dying.

Signed-off-by: Martin Ågren 
Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index bc0fa9ba52..66f12eb009 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -64,6 +64,7 @@ static int graph_verify(int argc, const char **argv)
if (!graph)
return 0;
 
+   UNLEAK(graph);
return verify_commit_graph(the_repository, graph);
 }
 
@@ -89,10 +90,8 @@ static int graph_read(int argc, const char **argv)
graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);
 
-   if (!graph) {
-   UNLEAK(graph_name);
+   if (!graph)
die("graph file %s does not exist", graph_name);
-   }
 
FREE_AND_NULL(graph_name);
 
@@ -115,7 +114,7 @@ static int graph_read(int argc, const char **argv)
printf(" large_edges");
printf("\n");
 
-   free_commit_graph(graph);
+   UNLEAK(graph);
 
return 0;
 }
@@ -166,6 +165,8 @@ static int graph_write(int argc, const char **argv)
pack_indexes = &lines;
if (opts.stdin_commits)
commit_hex = &lines;
+
+   UNLEAK(buf);
}
 
write_commit_graph(opts.obj_dir,
@@ -174,7 +175,7 @@ static int graph_write(int argc, const char **argv)
   opts.append,
   1);
 
-   string_list_clear(&lines, 0);
+   UNLEAK(lines);
return 0;
 }
 
-- 
gitgitgadget



[PATCH v2 0/3] Clean up leaks in commit-graph.c

While looking at the commit-graph code, I noticed some memory leaks. These
can be found by running

valgrind --leak-check=full ./git commit-graph write --reachable

The impact of these leaks is small, as we never call
write_commit_graph_reachable() in a loop, but it is best to be diligent here.

While looking at memory consumption within write_commit_graph(), I noticed
that we initialize our oid list with "object count / 4", which seems to be a
huge over-count. I reduce this by a factor of eight.

I built off of ab/commit-graph-progress, because my patch involves lines
close to those changes.

V2 includes feedback from V1 along with Martin's additional patches.

Thanks, -Stolee

Derrick Stolee (2):
  commit-graph: clean up leaked memory during write
  commit-graph: reduce initial oid allocation

Martin Ågren (1):
  builtin/commit-graph.c: UNLEAK variables

 builtin/commit-graph.c | 11 ++-
 commit-graph.c | 16 ++--
 2 files changed, 16 insertions(+), 11 deletions(-)


base-commit: 6b89a34c89fc763292f06012318b852b74825619
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-42%2Fderrickstolee%2Fcommit-graph-leak-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-42/derrickstolee/commit-graph-leak-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/42

Range-diff vs v1:

 1:  6906c25415 ! 1:  ba65680b3d commit-graph: clean up leaked memory during 
write
 @@ -7,17 +7,29 @@
  leaked in write_commit_graph_reachable(). Clean these up so our
  memory checking is cleaner.
  
 -Running 'valgrind --leak-check=full git commit-graph write
 ---reachable' demonstrates these leaks and how they are fixed after
 -this change.
 +Further, if we use a list of pack-files to find the commits, we
 +can leak the packed_git structs after scanning them for commits.
  
 +Running the following commands demonstrates the leak before and
 +the fix after:
 +
 +* valgrind --leak-check=full ./git commit-graph write --reachable
 +* valgrind --leak-check=full ./git commit-graph write --stdin-packs
 +
 +Signed-off-by: Martin Ågren 
  Signed-off-by: Derrick Stolee 
  
  diff --git a/commit-graph.c b/commit-graph.c
  --- a/commit-graph.c
  +++ b/commit-graph.c
  @@
 -  string_list_init(&list, 1);
 + void write_commit_graph_reachable(const char *obj_dir, int append,
 +int report_progress)
 + {
 +- struct string_list list;
 ++ struct string_list list = STRING_LIST_INIT_DUP;
 + 
 +- string_list_init(&list, 1);
for_each_ref(add_ref_to_list, &list);
write_commit_graph(obj_dir, NULL, &list, append, report_progress);
  +
 @@ -25,6 +37,14 @@
   }
   
   void write_commit_graph(const char *obj_dir,
 +@@
 +  die(_("error opening index for %s"), 
packname.buf);
 +  for_each_object_in_pack(p, add_packed_commits, &oids, 
0);
 +  close_pack(p);
 ++ free(p);
 +  }
 +  stop_progress(&progress);
 +  strbuf_release(&packname);
  @@
compute_generation_numbers(&commits, report_progress);
   
 @@ -45,5 +65,8 @@
  + free(graph_name);
  + free(commits.list);
free(oids.list);
 -  oids.alloc = 0;
 -  oids.nr = 0;
 +- oids.alloc = 0;
 +- oids.nr = 0;
 + }
 + 
 + #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 -:  -- > 2:  13032d8475 builtin/commit-graph.c: UNLEAK variables
 2:  e29a0eaf03 = 3:  1002fd34fc commit-graph: reduce initial oid allocation

-- 
gitgitgadget


Re: Fwd: Git credentials not working

On Wed, Oct 03, 2018 at 09:06:38PM +0700, Dimitri Kopriwa wrote:

> 18:25:52.940307 git.c:659   trace: exec: git-credential-store 
> erase
> 18:25:52.940365 run-command.c:637   trace: run_command: 
> git-credential-store erase
> remote: HTTP Basic: Access denied
> fatal: Authentication failed for
> 'https://git.example.com/example/some-project.git/'
> [...]
> 
> Can you please help me found why is git credential-store erase called ?

This is expected. We tried to use a credential that was rejected by the
server, so we told all of the helpers it was invalid. You can try
running GIT_TRACE_CURL=1 to see the HTTP conversation. There will be an
HTTP 401 with the authentication failure, though it may not tell you
anything more useful than that.

git-credential-store is meant to be used interactively, to insert and
erase credentials as they're grabbed from the terminal.

It sounds more like you want to just have a stored credential that you
try to use. You could do that with a custom helper. E.g., something like
this in your ~/.gitconfig:

  [credential "https://example.com"]
  helper = "!f() { test $1 = get && echo password=$(cat /path/with/password); }; f"

-Peff
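The inline-helper pattern above can be smoke-tested in isolation; the token path and values here are stand-ins, not anything your setup requires:

```shell
#!/bin/sh
# Sketch: a read-only credential helper that answers "get" requests
# from a file, mirroring the config snippet above. Values are stand-ins.
set -e
export HOME=$(mktemp -d)
export GIT_TERMINAL_PROMPT=0
echo secr3t >"$HOME/token"

# the helper ignores "store"/"erase" and only answers "get"
git config --global credential.helper \
	'!f() { test "$1" = get && echo "password=$(cat "$HOME/token")"; }; f'

# ask the helper the way git-remote-https would
printf 'protocol=https\nhost=example.com\nusername=ci\n' |
	git credential fill >creds
grep "^password=secr3t" creds
```

Because this helper never implements "erase", a server-side 401 cannot wipe the stored token the way credential-store does, which sidesteps the disappearing-credential problem from earlier in the thread.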


Re: We should add a "git gc --auto" after "git clone" due to commit graph

On Wed, Oct 03, 2018 at 05:19:41PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> >> >> So we should make "git gc --auto" be run on clone,
> >> >> >
> >> >> > There is no garbage after 'git clone'...
> >> >>
> >> >> "git gc" is really "git gc-or-create-indexes" these days.
> >> >
> >> > Because it happens to be convenient to create those indexes at
> >> > gc-time.  But that should not be an excuse to run gc when by
> >> > definition no gc is needed.
> >>
> >> Ah, I thought you just had an objection to the "gc" name being used for
> >> non-gc stuff,
> >
> > But you thought right, I do have an objection against that.  'git gc'
> > should, well, collect garbage.  Any non-gc stuff is already violating
> > separation of concerns.
> 
> Ever since git-gc was added back in 30f610b7b0 ("Create 'git gc' to
> perform common maintenance operations.", 2006-12-27) it has been
> described as:
> 
> git-gc - Cleanup unnecessary files and optimize the local repository
> 
> Creating these indexes like the commit-graph falls under "optimize the
> local repository",

But it doesn't fall under "cleanup unnecessary files", which the
commit-graph file is, since, strictly speaking, it's purely
optimization.

That description came about, because cleaning up unnecessary files,
notably combining lots of loose refs into a single packed-refs file
and combining lots of loose objects and pack files into a single pack
file, could not only make the repository smaller (barring too many
exploding unreachable objects), but, as it turned out, could also make
Git operations in that repository faster.

To me, the main goal of the command is cleanup.  Optimization, however
beneficial, is its side effect, and I assume the "optimize" part was
added to the description mainly to inform and "encourage" users.
After all, the command is called 'git gc', not 'git optimize-repo'.

> and 3rd party tools (e.g. the repo tool doing this
> came up on list recently) have been calling "gc --auto" with this
> assumption.
> 
> >>  but if you mean we shouldn't do a giant repack right after
> >> clone I agree.
> >
> > And, I also mean that since 'git clone' knows that there can't
> > possibly be any garbage in the first place, then it shouldn't call 'gc
> > --auto' at all.  However, since it also knows that there is a lot of
> > new stuff, then it should create a commit-graph if enabled.
> 
> Is this something you think just because the tool isn't called
> git-gc-and-optimzie, or do you think this regardless of what it's
> called?

Well, that still has 'gc' in it...

> I don't see how splitting up the entry points for "detect if we need to
> cleanup or optimize the repo" leaves us with a better codebase for the
> reasons noted in
> https://public-inbox.org/git/87pnwrgll2@evledraar.gmail.com/

Such a separation would be valuable for those having gc.auto = 0 in
their config.  Or, in general, to have a clearly marked entry point to
update all the enabled "purely-optimization" files without 'gc'
exploding a bunch of "just-became-unreachable" objects from deleted
reflog entries and packfiles, or without performing a comparatively
expensive repacking.  Note the "clearly marked"; I don't think
teaching 'gc [--auto]' various tricks to only create/update these
files without doing what it is fundamentally supposed to do qualifies
for that.




Re: We should add a "git gc --auto" after "git clone" due to commit graph

On Wed, Oct 3, 2018 at 3:23 PM Ævar Arnfjörð Bjarmason  wrote:
>
> Don't have time to patch this now, but thought I'd send a note / RFC
> about this.
>
> Now that we have the commit graph it's nice to be able to set
> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
> /etc/gitconfig to apply them to all repos.
>
> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
> until whenever my first "gc" kicks in, which may be quite some time if
> I'm just using it passively.

Since you have core.hooksPath, you can already force gc (even in
detached mode) in post-checkout hook. I'm adding a new
"post-init-repo" hook to allow more customization right after clone or
init-db which may also be useful for other things than gc.
-- 
Duy
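A sketch of the hook-based workaround Duy describes, assuming gc.writeCommitGraph is enabled; the hooks directory location is arbitrary:

```shell
#!/bin/sh
# Sketch: a global post-checkout hook that kicks off "gc --auto" so a
# fresh clone gets its commit-graph without waiting for routine gc.
set -e
export HOME=$(mktemp -d)
hooks="$HOME/git-hooks"
mkdir -p "$hooks"

cat >"$hooks/post-checkout" <<'EOF'
#!/bin/sh
# runs after every checkout, including the one "git clone" performs
git gc --auto --quiet
EOF
chmod +x "$hooks/post-checkout"

git config --global core.hooksPath "$hooks"
git config --global gc.writeCommitGraph true
```

The trade-off, as noted upthread, is that this runs the full "gc --auto" heuristics rather than only the commit-graph write.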


Re: [PATCH v4 7/7] tests: order arguments to git-rev-list properly

On Wed, Oct 3, 2018 at 9:26 AM Matthew DeVore  wrote:
>
> -   git -C pc1 rev-list HEAD --quiet --objects --missing=print >revs &&
> +   git -C pc1 rev-list --quiet --objects --missing=print >revs HEAD &&
> awk -f print_1.awk revs |
...
> git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
> test_cmp expect.blame observed.blame &&
> -   git -C pc1 rev-list master..origin/master --quiet --objects 
> --missing=print >observed &&
> +   git -C pc1 rev-list --quiet --objects --missing=print >observed \
> +   master..origin/master &&
> test_line_count = 0 observed

I screwed up by putting the positional argument *after* the
redirection. Sorry for the mix-up. This is interestingly syntactically
valid, though bad stylistically. Here is an inter-diff:

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index eeedd1623..6ff614692 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -35,7 +35,7 @@ test_expect_success 'setup bare clone for server' '
 test_expect_success 'do partial clone 1' '
 git clone --no-checkout --filter=blob:none
"file://$(pwd)/srv.bare" pc1 &&

-git -C pc1 rev-list --quiet --objects --missing=print >revs HEAD &&
+git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
 awk -f print_1.awk revs |
 sed "s/?//" |
 sort >observed.oids &&
@@ -93,8 +93,8 @@ test_expect_success 'verify diff causes dynamic
object fetch' '
 test_expect_success 'verify blame causes dynamic object fetch' '
 git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
 test_cmp expect.blame observed.blame &&
-git -C pc1 rev-list --quiet --objects --missing=print >observed \
-master..origin/master &&
+git -C pc1 rev-list --quiet --objects --missing=print \
+master..origin/master >observed &&
 test_line_count = 0 observed
 '


[PATCH v4 7/7] tests: order arguments to git-rev-list properly

It is a common mistake to put positional arguments before flags when
invoking git-rev-list. Order the positional arguments last.

This patch skips git-rev-list invocations which include the --not flag,
since the ordering of flags and positional arguments affects the
behavior. This patch also skips invocations of git-rev-list that occur
in command substitution in which the exit code is discarded, since
fixing those properly will require a more involved cleanup.

Signed-off-by: Matthew DeVore 
---
 t/t5616-partial-clone.sh| 26 +
 t/t5702-protocol-v2.sh  |  4 +--
 t/t6112-rev-list-filters-objects.sh | 43 ++---
 3 files changed, 44 insertions(+), 29 deletions(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index fc7aeb1ab..eeedd1623 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -35,7 +35,7 @@ test_expect_success 'setup bare clone for server' '
 test_expect_success 'do partial clone 1' '
git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 
&&
 
-   git -C pc1 rev-list HEAD --quiet --objects --missing=print >revs &&
+   git -C pc1 rev-list --quiet --objects --missing=print >revs HEAD &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&
@@ -48,10 +48,10 @@ test_expect_success 'do partial clone 1' '
 
 # checkout master to force dynamic object fetch of blobs at HEAD.
 test_expect_success 'verify checkout with dynamic object fetch' '
-   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
test_line_count = 4 observed &&
git -C pc1 checkout master &&
-   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
test_line_count = 0 observed
 '
 
@@ -74,7 +74,8 @@ test_expect_success 'push new commits to server' '
 # have the new blobs.
 test_expect_success 'partial fetch inherits filter settings' '
git -C pc1 fetch origin &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+   master..origin/master >observed &&
test_line_count = 5 observed
 '
 
@@ -82,7 +83,8 @@ test_expect_success 'partial fetch inherits filter settings' '
 # we should only get 1 new blob (for the file in origin/master).
 test_expect_success 'verify diff causes dynamic object fetch' '
git -C pc1 diff master..origin/master -- file.1.txt &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+master..origin/master >observed &&
test_line_count = 4 observed
 '
 
@@ -91,7 +93,8 @@ test_expect_success 'verify diff causes dynamic object fetch' 
'
 test_expect_success 'verify blame causes dynamic object fetch' '
git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
test_cmp expect.blame observed.blame &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print >observed \
+   master..origin/master &&
test_line_count = 0 observed
 '
 
@@ -111,7 +114,8 @@ test_expect_success 'push new commits to server for 
file.2.txt' '
 # Verify we have all the new blobs.
 test_expect_success 'override inherited filter-spec using --no-filter' '
git -C pc1 fetch --no-filter origin &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+   master..origin/master >observed &&
test_line_count = 0 observed
 '
 
@@ -133,8 +137,8 @@ test_expect_success 'push new commits to server for 
file.3.txt' '
 test_expect_success 'manual prefetch of missing objects' '
git -C pc1 fetch --filter=blob:none origin &&
 
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print \
-   >revs &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+master..origin/master >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&
@@ -142,8 +146,8 @@ test_expect_success 'manual prefetch of missing objects' '
test_line_count = 6 observed.oids &&
git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" <revs &&
-   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print \
-   >revs &&
+   git -C pc1 rev-list --quiet --objects --missing=print \
+   master..origin/master >revs &&
awk -f print_1.awk revs |
sed "s/?//" |
sort >observed.oids &&
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 54727450b..11a84efff 100755
--- 

[PATCH v4 4/7] t/*: fix ordering of expected/observed arguments

Fix various places where the ordering was obviously wrong, meaning it
was easy to find with grep.

Signed-off-by: Matthew DeVore 
---
 t/t-basic.sh   |  2 +-
 t/t0021-conversion.sh  |  4 +--
 t/t1300-config.sh  |  4 +--
 t/t1303-wacky-config.sh|  4 +--
 t/t2101-update-index-reupdate.sh   |  2 +-
 t/t3200-branch.sh  |  2 +-
 t/t3320-notes-merge-worktrees.sh   |  4 +--
 t/t3400-rebase.sh  |  8 +++---
 t/t3417-rebase-whitespace-fix.sh   |  6 ++---
 t/t3702-add-edit.sh|  4 +--
 t/t3903-stash.sh   |  8 +++---
 t/t3905-stash-include-untracked.sh |  2 +-
 t/t4025-hunk-header.sh |  2 +-
 t/t4117-apply-reject.sh|  6 ++---
 t/t4124-apply-ws-rule.sh   | 30 +++
 t/t4138-apply-ws-expansion.sh  |  2 +-
 t/t5317-pack-objects-filter-objects.sh | 34 +-
 t/t5318-commit-graph.sh|  2 +-
 t/t5701-git-serve.sh   | 14 +--
 t/t5702-protocol-v2.sh | 10 
 t/t6023-merge-file.sh  | 12 -
 t/t6027-merge-binary.sh|  4 +--
 t/t6031-merge-filemode.sh  |  2 +-
 t/t6112-rev-list-filters-objects.sh| 24 +-
 t/t7201-co.sh  |  4 +--
 t/t7406-submodule-update.sh|  8 +++---
 t/t7800-difftool.sh|  2 +-
 t/t9100-git-svn-basic.sh   |  2 +-
 t/t9133-git-svn-nested-git-repo.sh |  6 ++---
 t/t9600-cvsimport.sh   |  2 +-
 t/t9603-cvsimport-patchsets.sh |  4 +--
 t/t9604-cvsimport-timestamps.sh|  4 +--
 32 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index 850f651e4..224c098a8 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -1018,7 +1018,7 @@ test_expect_success SHA1 'validate git diff-files output 
for a know cache/work t
 :12 12 6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c 
 M path3/subp3/file3sym
 EOF
git diff-files >current &&
-   test_cmp current expected
+   test_cmp expected current
 '
 
 test_expect_success 'git update-index --refresh should succeed' '
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 308cd28f3..fd5f1ac64 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -166,10 +166,10 @@ test_expect_success expanded_in_repo '
rm -f expanded-keywords expanded-keywords-crlf &&
 
git checkout -- expanded-keywords &&
-   test_cmp expanded-keywords expected-output &&
+   test_cmp expected-output expanded-keywords &&
 
git checkout -- expanded-keywords-crlf &&
-   test_cmp expanded-keywords-crlf expected-output-crlf
+   test_cmp expected-output-crlf expanded-keywords-crlf
 '
 
 # The use of %f in a filter definition is expanded to the path to
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 5869d6cb6..e2cd50ecf 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1001,7 +1001,7 @@ EOF
 
 test_expect_success 'value continued on next line' '
git config --list > result &&
-   test_cmp result expect
+   test_cmp expect result
 '
 
 cat > .git/config <<\EOF
@@ -1882,7 +1882,7 @@ test_expect_success '--replace-all does not invent newlines' '
Qkey = b
EOF
git config --replace-all abc.key b &&
-   test_cmp .git/config expect
+   test_cmp expect .git/config
 '
 
 test_done
diff --git a/t/t1303-wacky-config.sh b/t/t1303-wacky-config.sh
index 3b92083e1..e664e 100755
--- a/t/t1303-wacky-config.sh
+++ b/t/t1303-wacky-config.sh
@@ -14,7 +14,7 @@ setup() {
 check() {
echo "$2" >expected
git config --get "$1" >actual 2>&1
-   test_cmp actual expected
+   test_cmp expected actual
 }
 
 # 'check section.key regex value' verifies that the entry for
@@ -22,7 +22,7 @@ check() {
 check_regex() {
echo "$3" >expected
git config --get "$1" "$2" >actual 2>&1
-   test_cmp actual expected
+   test_cmp expected actual
 }
 
 test_expect_success 'modify same key' '
diff --git a/t/t2101-update-index-reupdate.sh b/t/t2101-update-index-reupdate.sh
index 685ec4563..6c32d42c8 100755
--- a/t/t2101-update-index-reupdate.sh
+++ b/t/t2101-update-index-reupdate.sh
@@ -73,7 +73,7 @@ test_expect_success 'update-index --update from subdir' '
100644 $(git hash-object dir1/file3) 0  dir1/file3
100644 $file2 0 file2
EOF
-   test_cmp current expected
+   test_cmp expected current
 '
 
 test_expect_success 'update-index --update with pathspec' '
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 93f21ab07..478b82cf9 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -1221,7 +1221,7 @@ test_expect_success 'use --edit-description' '
EOF
 

[PATCH v4 6/7] t9109: don't swallow Git errors upstream of pipes

'git ... | foo' will mask any errors or crashes in git, so split up such
pipes in this file.

One testcase uses several separate pipe sequences in a row which are
awkward to split up. Wrap the split-up pipe in a function so the
awkwardness is not repeated. Also change that testcase's surrounding
quotes from double to single to avoid premature string interpolation.

Signed-off-by: Matthew DeVore 
---
 t/t9101-git-svn-props.sh | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/t/t9101-git-svn-props.sh b/t/t9101-git-svn-props.sh
index 8cba331fc..c26c4b092 100755
--- a/t/t9101-git-svn-props.sh
+++ b/t/t9101-git-svn-props.sh
@@ -190,17 +190,21 @@ EOF
 # This test can be improved: since all the svn:ignore contain the same
 # pattern, it can pass even though the propget did not execute on the
 # right directory.
-test_expect_success 'test propget' "
-   git svn propget svn:ignore . | cmp - prop.expect &&
+test_expect_success 'test propget' '
+   test_propget () {
+   git svn propget $1 $2 >actual &&
+   cmp $3 actual
+   } &&
+   test_propget svn:ignore . prop.expect &&
cd deeply &&
-   git svn propget svn:ignore . | cmp - ../prop.expect &&
-   git svn propget svn:entry:committed-rev nested/directory/.keep \
- | cmp - ../prop2.expect &&
-   git svn propget svn:ignore .. | cmp - ../prop.expect &&
-   git svn propget svn:ignore nested/ | cmp - ../prop.expect &&
-   git svn propget svn:ignore ./nested | cmp - ../prop.expect &&
-   git svn propget svn:ignore .././deeply/nested | cmp - ../prop.expect
-   "
+   test_propget svn:ignore . ../prop.expect &&
+   test_propget svn:entry:committed-rev nested/directory/.keep \
+   ../prop2.expect &&
+   test_propget svn:ignore .. ../prop.expect &&
+   test_propget svn:ignore nested/ ../prop.expect &&
+   test_propget svn:ignore ./nested ../prop.expect &&
+   test_propget svn:ignore .././deeply/nested ../prop.expect
+   '
 
 cat >prop.expect <<\EOF
 Properties on '.':
@@ -219,8 +223,11 @@ Properties on 'nested/directory/.keep':
 EOF
 
 test_expect_success 'test proplist' "
-   git svn proplist . | cmp - prop.expect &&
-   git svn proplist nested/directory/.keep | cmp - prop2.expect
+   git svn proplist . >actual &&
+   cmp prop.expect actual &&
+
+   git svn proplist nested/directory/.keep >actual &&
+   cmp prop2.expect actual
"
 
 test_done
-- 
2.19.0.605.g01d371f741-goog



[PATCH v4 5/7] tests: don't swallow Git errors upstream of pipes

Some pipes in tests lose the exit code of git processes, which can mask
unexpected behavior like crashes. Split these pipes up so that git
commands are only at the end of pipes rather than the beginning or
middle.

The violations fixed in this patch were found in the process of fixing
pipe placement in a prior patch.

Signed-off-by: Matthew DeVore 
---
 t/t5317-pack-objects-filter-objects.sh | 156 +
 t/t5616-partial-clone.sh   |  14 ++-
 t/t6112-rev-list-filters-objects.sh| 103 
 t/t9101-git-svn-props.sh   |   3 +-
 4 files changed, 143 insertions(+), 133 deletions(-)

diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index c093eb891..2e718f0bd 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -20,8 +20,9 @@ test_expect_success 'setup r1' '
 '
 
 test_expect_success 'verify blob count in normal packfile' '
-   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 |
-   awk -f print_2.awk |
+   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
+   >ls_files_result &&
+   awk -f print_2.awk ls_files_result |
sort >expected &&
 
git -C r1 pack-objects --rev --stdout >all.pack <<-EOF &&
@@ -29,8 +30,8 @@ test_expect_success 'verify blob count in normal packfile' '
EOF
git -C r1 index-pack ../all.pack &&
 
-   git -C r1 verify-pack -v ../all.pack |
-   grep blob |
+   git -C r1 verify-pack -v ../all.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -43,8 +44,8 @@ test_expect_success 'verify blob:none packfile has no blobs' '
EOF
git -C r1 index-pack ../filter.pack &&
 
-   git -C r1 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r1 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -53,13 +54,13 @@ test_expect_success 'verify blob:none packfile has no blobs' '
 '
 
test_expect_success 'verify normal and blob:none packfiles have same commits/trees' '
-   git -C r1 verify-pack -v ../all.pack |
-   grep -E "commit|tree" |
+   git -C r1 verify-pack -v ../all.pack >verify_result &&
+   grep -E "commit|tree" verify_result |
awk -f print_1.awk |
sort >expected &&
 
-   git -C r1 verify-pack -v ../filter.pack |
-   grep -E "commit|tree" |
+   git -C r1 verify-pack -v ../filter.pack >verify_result &&
+   grep -E "commit|tree" verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -82,8 +83,8 @@ test_expect_success 'setup r2' '
 '
 
 test_expect_success 'verify blob count in normal packfile' '
-   git -C r2 ls-files -s large.1000 large.1 |
-   awk -f print_2.awk |
+   git -C r2 ls-files -s large.1000 large.1 >ls_files_result &&
+   awk -f print_2.awk ls_files_result |
sort >expected &&
 
git -C r2 pack-objects --rev --stdout >all.pack <<-EOF &&
@@ -91,8 +92,8 @@ test_expect_success 'verify blob count in normal packfile' '
EOF
git -C r2 index-pack ../all.pack &&
 
-   git -C r2 verify-pack -v ../all.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../all.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -105,8 +106,8 @@ test_expect_success 'verify blob:limit=500 omits all blobs' '
EOF
git -C r2 index-pack ../filter.pack &&
 
-   git -C r2 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -120,8 +121,8 @@ test_expect_success 'verify blob:limit=1000' '
EOF
git -C r2 index-pack ../filter.pack &&
 
-   git -C r2 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ -130,8 +131,8 @@ test_expect_success 'verify blob:limit=1000' '
 '
 
 test_expect_success 'verify blob:limit=1001' '
-   git -C r2 ls-files -s large.1000 |
-   awk -f print_2.awk |
+   git -C r2 ls-files -s large.1000 >ls_files_result &&
+   awk -f print_2.awk ls_files_result |
sort >expected &&
 
git -C r2 pack-objects --rev --stdout --filter=blob:limit=1001 >filter.pack <<-EOF &&
@@ -139,8 +140,8 @@ test_expect_success 'verify blob:limit=1001' '
EOF
git -C r2 index-pack ../filter.pack &&
 
-   git -C r2 verify-pack -v ../filter.pack |
-   grep blob |
+   git -C r2 verify-pack -v ../filter.pack >verify_result &&
+   grep blob verify_result |
awk -f print_1.awk |
sort >observed &&
 
@@ 

[PATCH v4 1/7] t/README: reformat Do, Don't, Keep in mind lists

The list of Don'ts for test writing has grown large such that it is hard
to see at a glance which section an item is in. In other words, if I
ignore a little bit of surrounding context, the "don'ts" look like
"do's."

To make the list more readable, prefix "Don't" in front of every first
sentence in the items.

Also, the "Keep in mind" list is out of place and awkward, because it
was a very short "list" beneath two very long ones, and it seemed easy
to miss under the list of "don'ts," and it only had one item. So move
this item to the list of "do's" and phrase as "Remember..."

Signed-off-by: Matthew DeVore 
---
 t/README | 40 +++-
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/t/README b/t/README
index 9028b47d9..85024aba6 100644
--- a/t/README
+++ b/t/README
@@ -393,13 +393,13 @@ This test harness library does the following things:
consistently when command line arguments --verbose (or -v),
--debug (or -d), and --immediate (or -i) is given.
 
-Do's, don'ts & things to keep in mind
+Do's & don'ts
 -
 
 Here are a few examples of things you probably should and shouldn't do
 when writing tests.
 
-Do:
+Here are the "do's:"
 
  - Put all code inside test_expect_success and other assertions.
 
@@ -444,16 +444,21 @@ Do:
Windows, where the shell (MSYS bash) mangles absolute path names.
For details, see the commit message of 4114156ae9.
 
-Don't:
+ - Remember that inside the 

[PATCH v4 2/7] Documentation: add shell guidelines

Add the following guideline to Documentation/CodingGuidelines:

&&, ||, and | should appear at the end of lines, not the
beginning, and the \ line continuation character should be
omitted

And the following to t/README (since it is specific to writing tests):

pipes and $(git ...) should be avoided when they swallow exit
codes of Git processes

Signed-off-by: Matthew DeVore 
---
 Documentation/CodingGuidelines | 18 ++
 t/README   | 28 
 2 files changed, 46 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfb..72967deb7 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -118,6 +118,24 @@ For shell scripts specifically (not exhaustive):
do this
fi
 
+ - If a command sequence joined with && or || or | spans multiple
+   lines, put each command on a separate line and put && and || and |
+   operators at the end of each line, rather than the start. This
+   means you don't need to use \ to join lines, since the above
+   operators imply the sequence isn't finished.
+
+   (incorrect)
+   grep blob verify_pack_result \
+   | awk -f print_1.awk \
+   | sort >actual &&
+   ...
+
+   (correct)
+   grep blob verify_pack_result |
+   awk -f print_1.awk |
+   sort >actual &&
+   ...
+
  - We prefer "test" over "[ ... ]".
 
  - We do not write the noiseword "function" in front of shell
diff --git a/t/README b/t/README
index 85024aba6..9a71d5732 100644
--- a/t/README
+++ b/t/README
@@ -466,6 +466,34 @@ And here are the "don'ts:"
platform commands; just use '! cmd'.  We are not in the business
of verifying that the world given to us sanely works.
 
+ - Don't use Git upstream in the non-final position in a piped chain, as
+   in:
+
+ git -C repo ls-files |
+ xargs -n 1 basename |
+ grep foo
+
+   which will discard git's exit code and may mask a crash. In the
+   above example, all exit codes are ignored except grep's.
+
+   Instead, write the output of that command to a temporary
+   file with ">" or assign it to a variable with "x=$(git ...)" rather
+   than pipe it.
+
+ - Don't use command substitution in a way that discards git's exit
+   code. When assigning to a variable, the exit code is not discarded,
+   e.g.:
+
+ x=$(git cat-file -p $sha) &&
+ ...
+
+   is OK because a crash in "git cat-file" will cause the "&&" chain
+   to fail, but:
+
+ test "refs/heads/foo" = "$(git symbolic-ref HEAD)"
+
+   is not OK and a crash in git could go undetected.
+
  - Don't use perl without spelling it as "$PERL_PATH". This is to help
our friends on Windows where the platform Perl often adds CR before
the end of line, and they bundle Git with a version of Perl that
-- 
2.19.0.605.g01d371f741-goog
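Both pitfalls documented above can be reproduced in plain shell; fake_git below
is a hypothetical stand-in for a git invocation that prints output but then
crashes:

```shell
# fake_git prints output and then fails, like a crashing git command
# (hypothetical stand-in; any failing upstream command behaves alike).
fake_git () { echo data; return 1; }

if fake_git | sort >actual; then
	echo "pipeline: failure swallowed"      # taken: exit code is sort's
fi

if ! fake_git >actual; then
	echo "redirect: failure detected"       # taken: fake_git's code survives
fi

if test data = "$(fake_git)"; then
	echo "substitution: crash went unseen"  # taken: only output is compared
fi
```

Only a direct assignment like `x=$(git ...)` preserves the exit code for a
following `&&` chain, which is why the guideline singles it out as OK.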



[PATCH v4 3/7] tests: standardize pipe placement

Instead of using a line-continuation and pipe on the second line, take
advantage of the shell's implicit line continuation after a pipe
character.  So for example, instead of

some long line \
| next line

use

some long line |
next line

And add a blank line before and after the pipe where it aids readability
(it usually does).

This better matches the coding style documented in
Documentation/CodingGuidelines and used in shell scripts elsewhere in
the tree.
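The implicit continuation this relies on can be checked in any POSIX shell:
after a trailing "|" (or "&&"/"||") the shell keeps reading the next line, so
no "\" is needed:

```shell
# a trailing pipe continues the command onto the next line
seq 3 |
sort -rn |
head -n 1
```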

Signed-off-by: Matthew DeVore 
---
 t/lib-gpg.sh   |   9 +-
 t/t1006-cat-file.sh|   8 +-
 t/t1300-config.sh  |   5 +-
 t/t5317-pack-objects-filter-objects.sh | 330 ++---
 t/t5500-fetch-pack.sh  |   7 +-
 t/t5616-partial-clone.sh   |  32 ++-
 t/t6112-rev-list-filters-objects.sh| 203 ---
 7 files changed, 344 insertions(+), 250 deletions(-)

diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
index 3fe02876c..f1277bef4 100755
--- a/t/lib-gpg.sh
+++ b/t/lib-gpg.sh
@@ -57,9 +57,12 @@ then
echo | gpgsm --homedir "${GNUPGHOME}" 2>/dev/null \
--passphrase-fd 0 --pinentry-mode loopback \
--import "$TEST_DIRECTORY"/lib-gpg/gpgsm_cert.p12 &&
-   gpgsm --homedir "${GNUPGHOME}" 2>/dev/null -K \
-   | grep fingerprint: | cut -d" " -f4 | tr -d '\n' > \
-   ${GNUPGHOME}/trustlist.txt &&
+
+   gpgsm --homedir "${GNUPGHOME}" 2>/dev/null -K |
+   grep fingerprint: |
+   cut -d" " -f4 |
+   tr -d '\n' >"${GNUPGHOME}/trustlist.txt" &&
+
echo " S relax" >> ${GNUPGHOME}/trustlist.txt &&
(gpgconf --kill gpg-agent >/dev/null 2>&1 || : ) &&
echo hello | gpgsm --homedir "${GNUPGHOME}" >/dev/null \
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f19d591f..a0fa926d3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -218,8 +218,8 @@ test_expect_success "--batch-check for a non-existent hash" '
 test "0042 missing
 0084 missing" = \
 "$( ( echo 0042;
- echo_without_newline 0084; ) \
-   | git cat-file --batch-check)"
+ echo_without_newline 0084; ) |
+   git cat-file --batch-check)"
 '
 
 test_expect_success "--batch for an existent and a non-existent hash" '
@@ -227,8 +227,8 @@ test_expect_success "--batch for an existent and a non-existent hash" '
 $tag_content
  missing" = \
 "$( ( echo $tag_sha1;
- echo_without_newline ; ) \
-   | git cat-file --batch)"
+ echo_without_newline ; ) |
+   git cat-file --batch)"
 '
 
 test_expect_success "--batch-check for an empty line" '
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index cdf1fed5d..5869d6cb6 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1770,8 +1770,9 @@ test_expect_success '--show-origin stdin with file include' '
cat >expect <<-EOF &&
file:$INCLUDE_DIR/stdin.include include
EOF
-   echo "[include]path=\"$INCLUDE_DIR\"/stdin.include" \
| git config --show-origin --includes --file - user.stdin >output &&
+   echo "[include]path=\"$INCLUDE_DIR\"/stdin.include" |
+   git config --show-origin --includes --file - user.stdin >output &&
+
test_cmp expect output
 '
 
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..ce69148ec 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -20,17 +20,20 @@ test_expect_success 'setup r1' '
 '
 
 test_expect_success 'verify blob count in normal packfile' '
-   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
-   | awk -f print_2.awk \
-   | sort >expected &&
+   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 |
+   awk -f print_2.awk |
+   sort >expected &&
+
git -C r1 pack-objects --rev --stdout >all.pack <<-EOF &&
HEAD
EOF
git -C r1 index-pack ../all.pack &&
-   git -C r1 verify-pack -v ../all.pack \
-   | grep blob \
-   | awk -f print_1.awk \
-   | sort >observed &&
+
+   git -C r1 verify-pack -v ../all.pack |
+   grep blob |
+   awk -f print_1.awk |
+   sort >observed &&
+
test_cmp observed expected
 '
 
@@ -39,23 +42,27 @@ test_expect_success 'verify blob:none packfile has no blobs' '
HEAD
EOF
git -C r1 index-pack ../filter.pack &&
-   git -C r1 verify-pack -v ../filter.pack \

[PATCH v4 0/7] Clean up tests for test_cmp arg ordering and pipe placement

Apply suggestions by Szeder and Eric from last version of the patch,
namely:
 - Use a sensible example for how one can unknowingly drop a Git exit
   code in tests.
 - Fixed the message for the second commit.
 - Move the test-specific coding guidelines to t/README from
   Documentation/CodingGuidelines.
 - Particularly forbid invoking Git in ways that will mask crashes, but
   don't say this for non-Git commands, since validating them isn't our
   job.
 - Include a guideline about dropping exit codes for command
   substitution.
 - Reformat Do/Don't lists in Documentation/CodingGuidelines, and make my
   added points have consistent format. This is a new commit.

There may be other changes I didn't list above, but I would have
mentioned them in prior mails.

I created a new commit which cleans up invocations of git-rev-list so
that positional arguments are last, since my other patchset modifies the
tests where most of these mistakes are present, and I want the tests
added in the other patchset to be consistent with the code around it.

Thank you,

Matthew DeVore (7):
  t/README: reformat Do, Don't, Keep in mind lists
  Documentation: add shell guidelines
  tests: standardize pipe placement
  t/*: fix ordering of expected/observed arguments
  tests: don't swallow Git errors upstream of pipes
  t9109: don't swallow Git errors upstream of pipes
  tests: order arguments to git-rev-list properly

 Documentation/CodingGuidelines |  18 ++
 t/README   |  68 +++--
 t/lib-gpg.sh   |   9 +-
 t/t-basic.sh   |   2 +-
 t/t0021-conversion.sh  |   4 +-
 t/t1006-cat-file.sh|   8 +-
 t/t1300-config.sh  |   9 +-
 t/t1303-wacky-config.sh|   4 +-
 t/t2101-update-index-reupdate.sh   |   2 +-
 t/t3200-branch.sh  |   2 +-
 t/t3320-notes-merge-worktrees.sh   |   4 +-
 t/t3400-rebase.sh  |   8 +-
 t/t3417-rebase-whitespace-fix.sh   |   6 +-
 t/t3702-add-edit.sh|   4 +-
 t/t3903-stash.sh   |   8 +-
 t/t3905-stash-include-untracked.sh |   2 +-
 t/t4025-hunk-header.sh |   2 +-
 t/t4117-apply-reject.sh|   6 +-
 t/t4124-apply-ws-rule.sh   |  30 +--
 t/t4138-apply-ws-expansion.sh  |   2 +-
 t/t5317-pack-objects-filter-objects.sh | 360 ++---
 t/t5318-commit-graph.sh|   2 +-
 t/t5500-fetch-pack.sh  |   7 +-
 t/t5616-partial-clone.sh   |  50 ++--
 t/t5701-git-serve.sh   |  14 +-
 t/t5702-protocol-v2.sh |  14 +-
 t/t6023-merge-file.sh  |  12 +-
 t/t6027-merge-binary.sh|   4 +-
 t/t6031-merge-filemode.sh  |   2 +-
 t/t6112-rev-list-filters-objects.sh| 237 +---
 t/t7201-co.sh  |   4 +-
 t/t7406-submodule-update.sh|   8 +-
 t/t7800-difftool.sh|   2 +-
 t/t9100-git-svn-basic.sh   |   2 +-
 t/t9101-git-svn-props.sh   |  34 ++-
 t/t9133-git-svn-nested-git-repo.sh |   6 +-
 t/t9600-cvsimport.sh   |   2 +-
 t/t9603-cvsimport-patchsets.sh |   4 +-
 t/t9604-cvsimport-timestamps.sh|   4 +-
 39 files changed, 568 insertions(+), 398 deletions(-)

-- 
2.19.0.605.g01d371f741-goog



Re: [PATCH 0/2] commit-graph: more leak fixes

On Wed, 3 Oct 2018 at 18:19, Derrick Stolee  wrote:
> I'm fine with squashing it in with both our sign-offs. It is the same
> idea, it just requires a different set of arguments to hit it. I'll
> adjust the commit message as necessary.

> I'll add your PATCH 2/2 to my v2. Thanks!

Cool, thanks a lot.

Martin


Re: [PATCH 0/2] commit-graph: more leak fixes


On 10/3/2018 11:36 AM, Martin Ågren wrote:

> Hi Derrick,
>
> These two patches on top of yours make the test suite (i.e., the subset
> of it that I run) leak-free with respect to builtin/commit-graph.c and
> commit-graph.c.

Thanks!

> The first could be squashed into your patch 1/2. It touches the same
> function, but it requires a different usage to trigger, so squashing it
> in would require broadening the scope. I understand if you don't want to
> do that.

I'm fine with squashing it in with both our sign-offs. It is the same
idea, it just requires a different set of arguments to hit it. I'll
adjust the commit message as necessary.

> If you want to pick these up as part of your re-roll in any way, shape
> or form, go ahead. If not, they can go in separately, either in parallel
> or after your series lands. Whatever the destiny of this posting, I'll
> follow through as appropriate.

I'll add your PATCH 2/2 to my v2. Thanks!


Re: Git credentials not working

(removing git-security from CC)

On Wed, Oct 3, 2018 at 4:09 PM Dimitri Kopriwa  wrote:

> Git credentials in ~/.git-credentials and ~/.config/git/credentials are
> being removed by git upon reading.

https://git-scm.com/docs/git-credential says:

"If the action is reject, git-credential will send the description to
any configured credential helpers, which may erase any stored
credential matching the description."

So maybe this is expected.

Another possibility is that your .gitlab-ci.yml might launch scripts
writing into those files, like the before_script.sh script that is
described on:

https://stackoverflow.com/questions/50553049/is-it-possible-to-do-a-git-push-within-a-gitlab-ci-without-ssh

Could you also check which credential helper and which options are
used? For example with commands like:

$ git config -l --show-origin | grep -i cred
$ git config -l --show-origin | grep -i http
$ git config -l --show-origin | grep -i askpass
$ env | grep -i askpass


Re: [PATCH] coccicheck: process every source file at once

On Wed, Oct 3, 2018 at 8:05 AM SZEDER Gábor  wrote:
> In fact, it works finer than ever, because running 1.0.0 with this
> patch on Travis CI notices a possible memmove() -> MOVE_ARRAY()
> conversion:
>
>   https://travis-ci.org/szeder/git/jobs/436542684#L576
>
> Surprisingly, running 1.0.0 without this patch, or running 1.0.4
> locally either with or without this patch doesn't notice that
> memmove() call.  Presumably that's why Jonathan could kind-of "revert"
> my conversion from f919ffebed (Use MOVE_ARRAY, 2018-01-22) in his
> 6a1a79fd14 (object: move grafts to object parser, 2018-05-15) without
> anyone noticing.
>

That seems very odd...

Thanks,
Jake


Re: [PATCH] coccicheck: process every source file at once

On Wed, Oct 3, 2018 at 3:17 AM SZEDER Gábor  wrote:
>
> On Tue, Oct 02, 2018 at 03:55:19PM -0400, Jeff King wrote:
> > On Tue, Oct 02, 2018 at 12:16:42PM -0700, Jacob Keller wrote:
> >
> > > make coccicheck is used in order to apply coccinelle semantic patches,
> > > and see if any of the transformations found within contrib/coccinelle/
> > > can be applied to the current code base.
> > >
> > > Pass every file to a single invocation of spatch, instead of running
> > > spatch once per source file.
> > >
> > > This reduces the time required to run make coccicheck by a significant
> > > amount of time:
> > >
> > > Prior timing of make coccicheck
> > >   real6m14.090s
> > >   user25m2.606s
> > >   sys 1m22.919s
> > >
> > > New timing of make coccicheck
> > >   real1m36.580s
> > >   user7m55.933s
> > >   sys 0m18.219s
> >
> > Yay! This is a nice result.
> >
> > It's also one of the things that Julia suggested in an earlier thread.
> > One thing I wasn't quite sure about after digging into the various
> > versions (1.0.4 on Debian stable/unstable, 1.0.6 in experimental, and
> > pre-1.0.7 at the bleeding edge) was whether the old versions would be
> > similarly helped (or work at all).
> >
> > I just replicated your results with 1.0.4.deb-3+b2 from Debian stable.
> > It's possible there are older versions floating around, but for
> > something developer-only like this, I think "in Debian stable" is a
> > reasonable enough cutoff.
>
> Linux build jobs on Travis CI run Ubuntu Trusty 14.04 LTS, and
> therefore our static analysis build job still runs 1.0.0~rc19.deb-3.
>
> This patch appears to work fine with that version, too, though note
> that I also changed the build job to don't run two parallel jobs; for
> the reason see below.
>
> > > This is nearly a 4x decrease in the time required to run make
> > > coccicheck. This is due to the overhead of restarting spatch for every
> > > file. By processing all files at once, we can amortize this startup cost
> > > across the total number of files, rather than paying it once per file.
> >
> > One minor side effect is that we lose the opportunity to parallelize
> > quite as much. However, I think the reduction in total CPU makes it well
> > worth that (and we still have 8 cocci files and growing, which should
> > keep at least 8 cores busy).
>
> One major side effect, however, is the drastically increased memory
> usage resulting from processing all files at once.  With your patch on
> top of current master:
>
>   $ for cocci in contrib/coccinelle/*.cocci ; do command time -f 'max RSS: %MkB' make ${cocci}.patch ; done
>SPATCH contrib/coccinelle/array.cocci
>   max RSS: 1537068kB
>SPATCH contrib/coccinelle/commit.cocci
>   max RSS: 1510920kB
>SPATCH contrib/coccinelle/free.cocci
>   max RSS: 1393712kB
>SPATCH contrib/coccinelle/object_id.cocci
>SPATCH result: contrib/coccinelle/object_id.cocci.patch
>   max RSS: 1831700kB
>SPATCH contrib/coccinelle/qsort.cocci
>   max RSS: 1384960kB
>SPATCH contrib/coccinelle/strbuf.cocci
>   max RSS: 1395800kB
>SPATCH contrib/coccinelle/swap.cocci
>   max RSS: 1393620kB
>SPATCH contrib/coccinelle/xstrdup_or_null.cocci
>   max RSS: 1371716kB
>
> Without your patch the max RSS lingers around 87048kB - 101980kB,
> meaning ~15x - 18x increase
>

Quite a significant increase.


> This might cause quite the surprise to developers who are used to run
> 'make -jN coccicheck'; my tiny laptop came to a grinding halt with
> -j4.
>
> I think this side effect should be mentioned in the commit message.
>

Yes I agree. I hadn't considered the memory problem.

Thanks,
Jake


[PATCH 0/2] commit-graph: more leak fixes

Hi Derrick,

These two patches on top of yours make the test suite (i.e., the subset
of it that I run) leak-free with respect to builtin/commit-graph.c and
commit-graph.c.

The first could be squashed into your patch 1/2. It touches the same
function, but it requires a different usage to trigger, so squashing it
in would require broadening the scope. I understand if you don't want to
do that.

If you want to pick these up as part of your re-roll in any way, shape
or form, go ahead. If not, they can go in separately, either in parallel
or after your series lands. Whatever the destiny of this posting, I'll
follow through as appropriate.

Martin

Martin Ågren (2):
  commit-graph: free `struct packed_git` after closing it
  builtin/commit-graph.c: UNLEAK variables

 builtin/commit-graph.c | 11 ++-
 commit-graph.c |  1 +
 2 files changed, 7 insertions(+), 5 deletions(-)

-- 
2.19.0.329.g76f2f5c1e3



[PATCH 2/2] builtin/commit-graph.c: UNLEAK variables

`graph_verify()`, `graph_read()` and `graph_write()` do the hard work of
`cmd_commit_graph()`. As soon as these return, so does
`cmd_commit_graph()`.

`strbuf_getline()` may allocate memory in the strbuf, yet return EOF.
We need to release the strbuf or UNLEAK it. Go for the latter since we
are close to returning from `graph_write()`.

`graph_write()` also fails to free the strings in the string list. They
have been added to the list with `strdup_strings` set to 0. We could
flip `strdup_strings` before clearing the list, which is our usual hack
in situations like this. But since we are about to exit, let's just
UNLEAK the whole string list instead.

UNLEAK `graph` in `graph_verify`. While at it, and for consistency,
UNLEAK in `graph_read()` as well, and remove an unnecessary UNLEAK just
before dying.

Signed-off-by: Martin Ågren 
---
 builtin/commit-graph.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index bc0fa9ba52..66f12eb009 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -64,6 +64,7 @@ static int graph_verify(int argc, const char **argv)
if (!graph)
return 0;
 
+   UNLEAK(graph);
return verify_commit_graph(the_repository, graph);
 }
 
@@ -89,10 +90,8 @@ static int graph_read(int argc, const char **argv)
graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);
 
-   if (!graph) {
-   UNLEAK(graph_name);
+   if (!graph)
die("graph file %s does not exist", graph_name);
-   }
 
FREE_AND_NULL(graph_name);
 
@@ -115,7 +114,7 @@ static int graph_read(int argc, const char **argv)
printf(" large_edges");
printf("\n");
 
-   free_commit_graph(graph);
+   UNLEAK(graph);
 
return 0;
 }
@@ -166,6 +165,8 @@ static int graph_write(int argc, const char **argv)
pack_indexes = &lines;
if (opts.stdin_commits)
commit_hex = &lines;
+
+   UNLEAK(buf);
}
 
write_commit_graph(opts.obj_dir,
@@ -174,7 +175,7 @@ static int graph_write(int argc, const char **argv)
   opts.append,
   1);
 
-   string_list_clear(&lines, 0);
+   UNLEAK(lines);
return 0;
 }
 
-- 
2.19.0.329.g76f2f5c1e3



[PATCH 1/2] commit-graph: free `struct packed_git` after closing it

`close_pack(p)` does not free the memory which `p` points to, so follow
up with a call to `free(p)`. All other users of `close_pack()` look ok.

Signed-off-by: Martin Ågren 
---
 commit-graph.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/commit-graph.c b/commit-graph.c
index 3d644fddc0..9b481bcd06 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -766,6 +766,7 @@ void write_commit_graph(const char *obj_dir,
die(_("error opening index for %s"), packname.buf);
for_each_object_in_pack(p, add_packed_commits, , 0);
close_pack(p);
+   free(p);
}
stop_progress();
strbuf_release(&packname);
-- 
2.19.0.329.g76f2f5c1e3



Re: We should add a "git gc --auto" after "git clone" due to commit graph



On Wed, Oct 03 2018, SZEDER Gábor wrote:

> On Wed, Oct 03, 2018 at 04:22:12PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Oct 03 2018, SZEDER Gábor wrote:
>>
>> > On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> On Wed, Oct 03 2018, SZEDER Gábor wrote:
>> >>
>> >> > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> >> Don't have time to patch this now, but thought I'd send a note / RFC
>> >> >> about this.
>> >> >>
>> >> >> Now that we have the commit graph it's nice to be able to set
>> >> >> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig 
>> >> >> or
>> >> >> /etc/gitconfig to apply them to all repos.
>> >> >>
>> >> >> But when I clone e.g. linux.git stuff like 'tag --contains' will be 
>> >> >> slow
>> >> >> until whenever my first "gc" kicks in, which may be quite some time if
>> >> >> I'm just using it passively.
>> >> >>
>> >> >> So we should make "git gc --auto" be run on clone,
>> >> >
>> >> > There is no garbage after 'git clone'...
>> >>
>> >> "git gc" is really "git gc-or-create-indexes" these days.
>> >
>> > Because it happens to be convenient to create those indexes at
>> > gc-time.  But that should not be an excuse to run gc when by
>> > definition no gc is needed.
>>
>> Ah, I thought you just had an objection to the "gc" name being used for
>> non-gc stuff,
>
> But you thought right, I do have an objection against that.  'git gc'
> should, well, collect garbage.  Any non-gc stuff is already violating
> separation of concerns.

Ever since git-gc was added back in 30f610b7b0 ("Create 'git gc' to
perform common maintenance operations.", 2006-12-27) it has been
described as:

git-gc - Cleanup unnecessary files and optimize the local repository

Creating these indexes like the commit-graph falls under "optimize the
local repository", and 3rd party tools (e.g. the repo tool doing this
came up on list recently) have been calling "gc --auto" with this
assumption.

>>  but if you mean we shouldn't do a giant repack right after
>> clone I agree.
>
> And, I also mean that since 'git clone' knows that there can't
> possibly be any garbage in the first place, then it shouldn't call 'gc
> --auto' at all.  However, since it also knows that there is a lot of
> new stuff, then it should create a commit-graph if enabled.

Is this something you think just because the tool isn't called
git-gc-and-optimize, or do you think this regardless of what it's
called?

I don't see how splitting up the entry points for "detect if we need to
cleanup or optimize the repo" leaves us with a better codebase for the
reasons noted in
https://public-inbox.org/git/87pnwrgll2@evledraar.gmail.com/


Re: [PATCH] coccicheck: process every source file at once

On Wed, Oct 03, 2018 at 12:16:58PM +0200, SZEDER Gábor wrote:
> On Tue, Oct 02, 2018 at 03:55:19PM -0400, Jeff King wrote:
> > On Tue, Oct 02, 2018 at 12:16:42PM -0700, Jacob Keller wrote:
> > 
> > > make coccicheck is used in order to apply coccinelle semantic patches,
> > > and see if any of the transformations found within contrib/coccinelle/
> > > can be applied to the current code base.
> > > 
> > > Pass every file to a single invocation of spatch, instead of running
> > > spatch once per source file.
> > > 
> > > This reduces the time required to run make coccicheck by a significant
> > > amount of time:
> > > 
> > > Prior timing of make coccicheck
> > >   real6m14.090s
> > >   user25m2.606s
> > >   sys 1m22.919s
> > > 
> > > New timing of make coccicheck
> > >   real1m36.580s
> > >   user7m55.933s
> > >   sys 0m18.219s
> > 
> > Yay! This is a nice result.
> > 
> > It's also one of the things that Julia suggested in an earlier thread.
> > One thing I wasn't quite sure about after digging into the various
> > versions (1.0.4 on Debian stable/unstable, 1.0.6 in experimental, and
> > pre-1.0.7 at the bleeding edge) was whether the old versions would be
> > similarly helped (or work at all).
> > 
> > I just replicated your results with 1.0.4.deb-3+b2 from Debian stable.
> > It's possible there are older versions floating around, but for
> > something developer-only like this, I think "in Debian stable" is a
> > reasonable enough cutoff.
> 
> Linux build jobs on Travis CI run Ubuntu Trusty 14.04 LTS, and
> therefore our static analysis build job still runs 1.0.0~rc19.deb-3.
> 
> This patch appears to work fine with that version, too,

In fact, it works finer than ever, because running 1.0.0 with this
patch on Travis CI notices a possible memmove() -> MOVE_ARRAY()
conversion:

  https://travis-ci.org/szeder/git/jobs/436542684#L576

Surprisingly, running 1.0.0 without this patch, or running 1.0.4
locally either with or without this patch doesn't notice that
memmove() call.  Presumably that's why Jonathan could kind-of "revert"
my conversion from f919ffebed (Use MOVE_ARRAY, 2018-01-22) in his
6a1a79fd14 (object: move grafts to object parser, 2018-05-15) without
anyone noticing.



[PATCH 1/1] rebase -i: introduce the 'break' command

From: Johannes Schindelin 

The 'edit' command can be used to cherry-pick a commit and then
immediately drop out of the interactive rebase, with exit code 0, to let
the user amend the commit, or test it, or look around.

Sometimes this functionality would come in handy *without*
cherry-picking a commit, e.g. to interrupt the interactive rebase even
before cherry-picking a commit, or immediately after an 'exec' or a
'merge'.

This commit introduces that functionality, as the spanking new 'break' command.

Suggested-by: Stefan Beller 
Signed-off-by: Johannes Schindelin 
---
 rebase-interactive.c   | 1 +
 sequencer.c| 7 ++-
 t/lib-rebase.sh| 2 +-
 t/t3418-rebase-continue.sh | 9 +
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/rebase-interactive.c b/rebase-interactive.c
index 0f4119cbae..78f3263fc1 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -14,6 +14,7 @@ void append_todo_help(unsigned edit_todo, unsigned keep_empty,
 "s, squash  = use commit, but meld into previous commit\n"
 "f, fixup  = like \"squash\", but discard this commit's log message\n"
 "x, exec  = run command (the rest of the line) using shell\n"
+"b, break = stop here (continue rebase later with 'git rebase --continue')\n"
 "d, drop  = remove commit\n"
 "l, label  = label current HEAD with a name\n"
 "t, reset  = reset HEAD to a label\n"
diff --git a/sequencer.c b/sequencer.c
index 8dd6db5a01..b209f8af46 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1416,6 +1416,7 @@ enum todo_command {
TODO_SQUASH,
/* commands that do something else than handling a single commit */
TODO_EXEC,
+   TODO_BREAK,
TODO_LABEL,
TODO_RESET,
TODO_MERGE,
@@ -1437,6 +1438,7 @@ static struct {
{ 'f', "fixup" },
{ 's', "squash" },
{ 'x', "exec" },
+   { 'b', "break" },
{ 'l', "label" },
{ 't', "reset" },
{ 'm', "merge" },
@@ -1964,7 +1966,7 @@ static int parse_insn_line(struct todo_item *item, const 
char *bol, char *eol)
padding = strspn(bol, " \t");
bol += padding;
 
-   if (item->command == TODO_NOOP) {
+   if (item->command == TODO_NOOP || item->command == TODO_BREAK) {
if (bol != eol)
return error(_("%s does not accept arguments: '%s'"),
 command_to_string(item->command), bol);
@@ -3293,6 +3295,9 @@ static int pick_commits(struct todo_list *todo_list, 
struct replay_opts *opts)
unlink(rebase_path_stopped_sha());
unlink(rebase_path_amend());
delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF);
+
+   if (item->command == TODO_BREAK)
+   break;
}
if (item->command <= TODO_SQUASH) {
if (is_rebase_i(opts))
diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
index 25a77ee5cb..584604ee63 100644
--- a/t/lib-rebase.sh
+++ b/t/lib-rebase.sh
@@ -49,7 +49,7 @@ set_fake_editor () {
case $line in
squash|fixup|edit|reword|drop)
action="$line";;
-   exec*)
+   exec*|break)
echo "$line" | sed 's/_/ /g' >> "$1";;
"#")
echo '# comment' >> "$1";;
diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index c145dbac38..185a491089 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -239,5 +239,14 @@ test_rerere_autoupdate -m
 GIT_SEQUENCE_EDITOR=: && export GIT_SEQUENCE_EDITOR
 test_rerere_autoupdate -i
 test_rerere_autoupdate --preserve-merges
+unset GIT_SEQUENCE_EDITOR
+
+test_expect_success 'the todo command "break" works' '
+   rm -f execed &&
+   FAKE_LINES="break exec_>execed" git rebase -i HEAD &&
+   test_path_is_missing execed &&
+   git rebase --continue &&
+   test_path_is_file execed
+'
 
 test_done
-- 
gitgitgadget


[PATCH 0/1] rebase -i: introduce the 'break' command

Stefan asked a while back
[https://public-inbox.org/git/20180118183618.39853-3-sbel...@google.com/] 
for a todo command in interactive rebases that would essentially perform the
"stop" part of the edit command, but without the "pick" part: interrupt the
interactive rebase, with exit code 0, letting the user do things and stuff
and look around, with the idea of continuing the rebase later (using git
rebase --continue).
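For illustration, a session with the new command could look like the
following sketch (it assumes a git that has this patch; the sed(1) trick
for editing the todo list assumes GNU sed, and the repo is a throwaway):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
echo one >file && git add file && git commit -qm one
echo two >file && git commit -qam two
# Prepend a 'break' line to the generated todo list (GNU sed syntax)
GIT_SEQUENCE_EDITOR='sed -i 1ibreak' git rebase -i HEAD~1
# The rebase has stopped with exit code 0 before picking anything;
# do things and stuff and look around, then resume:
git rebase --continue
```

After the `break` the working tree is at the rebase base, nothing has been
picked yet, and `git rebase --continue` replays the remaining todo lines.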

This patch introduces that, based on ag/rebase-i-in-c.

Johannes Schindelin (1):
  rebase -i: introduce the 'break' command

 rebase-interactive.c   | 1 +
 sequencer.c| 7 ++-
 t/lib-rebase.sh| 2 +-
 t/t3418-rebase-continue.sh | 9 +
 4 files changed, 17 insertions(+), 2 deletions(-)


base-commit: b3fe2e1f8cbf5522e7ba49db76bff38f204e2093
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-43%2Fdscho%2Frebase-i-break-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-43/dscho/rebase-i-break-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/43
-- 
gitgitgadget


Re: We should add a "git gc --auto" after "git clone" due to commit graph

On Wed, Oct 03, 2018 at 04:22:12PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 03 2018, SZEDER Gábor wrote:
> 
> > On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Wed, Oct 03 2018, SZEDER Gábor wrote:
> >>
> >> > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> >> Don't have time to patch this now, but thought I'd send a note / RFC
> >> >> about this.
> >> >>
> >> >> Now that we have the commit graph it's nice to be able to set
> >> >> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
> >> >> /etc/gitconfig to apply them to all repos.
> >> >>
> >> >> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
> >> >> until whenever my first "gc" kicks in, which may be quite some time if
> >> >> I'm just using it passively.
> >> >>
> >> >> So we should make "git gc --auto" be run on clone,
> >> >
> >> > There is no garbage after 'git clone'...
> >>
> >> "git gc" is really "git gc-or-create-indexes" these days.
> >
> > Because it happens to be convenient to create those indexes at
> > gc-time.  But that should not be an excuse to run gc when by
> > definition no gc is needed.
> 
> Ah, I thought you just had an objection to the "gc" name being used for
> non-gc stuff,

But you thought right, I do have an objection against that.  'git gc'
should, well, collect garbage.  Any non-gc stuff is already violating
separation of concerns.

>  but if you mean we shouldn't do a giant repack right after
> clone I agree.

And, I also mean that since 'git clone' knows that there can't
possibly be any garbage in the first place, then it shouldn't call 'gc
--auto' at all.  However, since it also knows that there is a lot of
new stuff, then it should create a commit-graph if enabled.

> I meant that "gc --auto" would learn to do a subset of
> its work, instead of the current "I have work to do, let's do all of
> pack-refs/repack/commit-graph etc.".
> 
> So we wouldn't be spending 5 minutes repacking linux.git right after
> cloning it, just ~10s generating the commit graph, and the same would
> happen if you rm'd .git/objects/info/commit-graph and ran "git commit",
> which would kick off "gc --auto" in the background and do the same thing.


Re: We should add a "git gc --auto" after "git clone" due to commit graph

On Wed, Oct 3, 2018 at 4:01 PM Ævar Arnfjörð Bjarmason  wrote:
> >> and change the
> >> need_to_gc() / cmd_gc() behavior so that we detect that the
> >> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
> >> then just generate that without doing a full repack.
> >
> > Or just teach 'git clone' to run 'git commit-graph write ...'
>
> Then when adding something like the commit graph we'd need to patch both
> git-clone and git-gc, it's much more straightforward to make
> need_to_gc() more granular.

It is straightforward and misleading. If we organize the code well,
patching both would not take much more effort and it reduces wtf?
moments reading the code.
-- 
Duy


Re: We should add a "git gc --auto" after "git clone" due to commit graph



On Wed, Oct 03 2018, SZEDER Gábor wrote:

> On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Oct 03 2018, SZEDER Gábor wrote:
>>
>> > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> Don't have time to patch this now, but thought I'd send a note / RFC
>> >> about this.
>> >>
>> >> Now that we have the commit graph it's nice to be able to set
>> >> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
>> >> /etc/gitconfig to apply them to all repos.
>> >>
>> >> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
>> >> until whenever my first "gc" kicks in, which may be quite some time if
>> >> I'm just using it passively.
>> >>
>> >> So we should make "git gc --auto" be run on clone,
>> >
>> > There is no garbage after 'git clone'...
>>
>> "git gc" is really "git gc-or-create-indexes" these days.
>
> Because it happens to be convenient to create those indexes at
> gc-time.  But that should not be an excuse to run gc when by
> definition no gc is needed.

Ah, I thought you just had an objection to the "gc" name being used for
non-gc stuff, but if you mean we shouldn't do a giant repack right after
clone I agree. I meant that "gc --auto" would learn to do a subset of
its work, instead of the current "I have work to do, let's do all of
pack-refs/repack/commit-graph etc.".

So we wouldn't be spending 5 minutes repacking linux.git right after
cloning it, just ~10s generating the commit graph, and the same would
happen if you rm'd .git/objects/info/commit-graph and ran "git commit",
which would kick off "gc --auto" in the background and do the same thing.


Re: We should add a "git gc --auto" after "git clone" due to commit graph



On Wed, Oct 03 2018, Derrick Stolee wrote:

> On 10/3/2018 9:36 AM, SZEDER Gábor wrote:
>> On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>> Don't have time to patch this now, but thought I'd send a note / RFC
>>> about this.
>>>
>>> Now that we have the commit graph it's nice to be able to set
>>> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
>>> /etc/gitconfig to apply them to all repos.
>>>
>>> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
>>> until whenever my first "gc" kicks in, which may be quite some time if
>>> I'm just using it passively.
>>>
>>> So we should make "git gc --auto" be run on clone,
>> There is no garbage after 'git clone'...
>
> And since there is no garbage, the gc will not write the commit-graph.

I should probably have replied to this instead of SZEDER's in
https://public-inbox.org/git/87r2h7gmd7@evledraar.gmail.com/ anyway
my 0.02 on that there.

>>
>>> and change the
>>> need_to_gc() / cmd_gc() behavior so that we detect that the
>>> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
>>> then just generate that without doing a full repack.
>> Or just teach 'git clone' to run 'git commit-graph write ...'
>
> I plan to add a 'fetch.writeCommitGraph' config setting. I was waiting
> until the file is incremental (on my to-do list soon), so the write is
> fast when only adding a few commits at a time. This would cover the
> clone case, too.

It's re-arranging deck chairs on the Titanic at this point, but this
approach seems like the wrong way to go in this whole "do we have crap
to do?" git-gc state-machine.

In my mind we should have only one entry point into that, and there
shouldn't be magic like "here's the gc-ish stuff we do on
fetch". Because if we care about a bunch of new commits being added on
"fetch", that can also happen on "commit", "am", "merge", all of which
run "gc --auto" now.

Which is why I'm suggesting that we could add a sub-mode in need_to_gc()
that detects if a file we want to generate is entirely missing, which is
extendable to future formats, and the only caveat at that point is if
we'd like that subset of "gc" to block and run in the foreground in the
"clone" (or "fetch", ...) case.

And then if we have a desire to incrementally add recently added commits
to such formats, "gc --auto" could learn to consume reflogs or some
other general inventory of "stuff added since last gc", and then we
wouldn't have to instrument "fetch" specifically, the same would work
for "commit", "am", "merge" etc.


Re: We should add a "git gc --auto" after "git clone" due to commit graph

On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 03 2018, SZEDER Gábor wrote:
> 
> > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> Don't have time to patch this now, but thought I'd send a note / RFC
> >> about this.
> >>
> >> Now that we have the commit graph it's nice to be able to set
> >> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
> >> /etc/gitconfig to apply them to all repos.
> >>
> >> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
> >> until whenever my first "gc" kicks in, which may be quite some time if
> >> I'm just using it passively.
> >>
> >> So we should make "git gc --auto" be run on clone,
> >
> > There is no garbage after 'git clone'...
> 
> "git gc" is really "git gc-or-create-indexes" these days.

Because it happens to be convenient to create those indexes at
gc-time.  But that should not be an excuse to run gc when by
definition no gc is needed.

> >> and change the
> >> need_to_gc() / cmd_gc() behavior so that we detect that the
> >> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
> >> then just generate that without doing a full repack.
> >
> > Or just teach 'git clone' to run 'git commit-graph write ...'
> 
> Then when adding something like the commit graph we'd need to patch both
> git-clone and git-gc, it's much more straightforward to make
> need_to_gc() more granular.
> 
> >> As an aside such more granular "gc" would be nice for e.g. pack-refs
> >> too. It's possible for us to just have one pack, but to have 100k loose
> >> refs.
> >>
> >> It might also be good to have some gc.autoDetachOnClone option and have
> >> it false by default, so we don't have a race condition where "clone
> >> linux && git -C linux tag --contains" is slow because the graph hasn't
> >> been generated yet, and generating the graph initially doesn't take that
> >> long compared to the time to clone a large repo (and on a small one it
> >> won't matter either way).
> >>
> >> I was going to say "also for midx", but of course after clone we have
> >> just one pack, so I can't imagine us needing this. But I can see us
> >> having other such optional side-indexes in the future generated by gc,
> >> and they'd also benefit from this.
> >>
> >> #leftoverbits


Fwd: Git credentials not working


Dear Git list,


I have tried to use git credentials within GitLab-CI runners. I have 4 
instances of GitLab and discovered a weird bug with Git credentials when 
used within a CI process.


Please note first of all that I have had time to check, multiple times, 
that my credentials are valid for the repository, and that calling git 
fetch --tags with the full remote url that includes the credentials 
always succeeded.


Tested with Git 2.11, 2.19

Git credentials in ~/.git-credentials and ~/.config/git/credentials are 
being removed by git upon reading.


This happens randomly across my CI runners, and the changes that make 
them work again appear unrelated.



{ Error: Command failed: git fetch --tags 
https://git.example.com/example/some-project.git
18:25:52.554903 git.c:415   trace: built-in: git fetch 
--tags https://git.example.com/example/some-project.git
18:25:52.555234 run-command.c:637   trace: run_command: GIT_DIR=.git 
git-remote-https https://git.example.com/example/some-project.git 
https://git.example.com/example/some-project.git
18:25:52.692741 run-command.c:637   trace: run_command: 'git 
credential-store get'
18:25:52.697314 git.c:659   trace: exec: 
git-credential-store get
18:25:52.697372 run-command.c:637   trace: run_command: 
git-credential-store get
18:25:52.936024 run-command.c:637   trace: run_command: 'git 
credential-store erase'
18:25:52.940307 git.c:659   trace: exec: 
git-credential-store erase
18:25:52.940365 run-command.c:637   trace: run_command: 
git-credential-store erase

remote: HTTP Basic: Access denied
fatal: Authentication failed for 
'https://git.example.com/example/some-project.git/'



See the full question here: 
https://stackoverflow.com/questions/52614467/why-does-git-credential-store-call-git-credential-erase-and-make-my-credential-f
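For what it's worth, git asks its helpers to erase a credential after the
server rejects it, so the "HTTP Basic: Access denied" above would by itself
explain the erase call in the trace; the real question is why the server
rejects the credential. The helper protocol can be exercised directly, as in
this sketch (the host name and values are made up, and a throwaway file is
used instead of ~/.git-credentials):

```shell
f=$(mktemp)
cred='protocol=https
host=git.example.com'
# Store a credential, read it back, then erase it as git would after a 401.
printf '%s\nusername=u\npassword=p\n' "$cred" | git credential-store --file="$f" store
before=$(printf '%s\n' "$cred" | git credential-store --file="$f" get)
printf '%s\nusername=u\npassword=p\n' "$cred" | git credential-store --file="$f" erase
after=$(printf '%s\n' "$cred" | git credential-store --file="$f" get)
echo "before erase: $before"
echo "after erase: [$after]"
```

The second get prints nothing, which matches what you observe: one failed
fetch is enough to empty the stored credential.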



Can you please help me find out why git credential-store erase is called?


Best regards,



Re: We should add a "git gc --auto" after "git clone" due to commit graph



On Wed, Oct 03 2018, SZEDER Gábor wrote:

> On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Don't have time to patch this now, but thought I'd send a note / RFC
>> about this.
>>
>> Now that we have the commit graph it's nice to be able to set
>> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
>> /etc/gitconfig to apply them to all repos.
>>
>> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
>> until whenever my first "gc" kicks in, which may be quite some time if
>> I'm just using it passively.
>>
>> So we should make "git gc --auto" be run on clone,
>
> There is no garbage after 'git clone'...

"git gc" is really "git gc-or-create-indexes" these days.

>> and change the
>> need_to_gc() / cmd_gc() behavior so that we detect that the
>> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
>> then just generate that without doing a full repack.
>
> Or just teach 'git clone' to run 'git commit-graph write ...'

Then when adding something like the commit graph we'd need to patch both
git-clone and git-gc, it's much more straightforward to make
need_to_gc() more granular.

>> As an aside such more granular "gc" would be nice for e.g. pack-refs
>> too. It's possible for us to just have one pack, but to have 100k loose
>> refs.
>>
>> It might also be good to have some gc.autoDetachOnClone option and have
>> it false by default, so we don't have a race condition where "clone
>> linux && git -C linux tag --contains" is slow because the graph hasn't
>> been generated yet, and generating the graph initially doesn't take that
>> long compared to the time to clone a large repo (and on a small one it
>> won't matter either way).
>>
>> I was going to say "also for midx", but of course after clone we have
>> just one pack, so I can't imagine us needing this. But I can see us
>> having other such optional side-indexes in the future generated by gc,
>> and they'd also benefit from this.
>>
>> #leftoverbits


RE: git projects with submodules in different sites - in txt format (:+(

Thank you so much, I will check this GIT bundle
Since you asked, here are my answers, prefixed by MHS >>

Michele

-Original Message-
From: Philip Oakley [mailto:philipoak...@iee.org] 
Sent: Wednesday, October 3, 2018 12:00 AM
To: Michele Hallak; git@vger.kernel.org
Subject: Re: git projects with submodules in different sites - in txt format 
(:+(

On 02/10/2018 06:47, Michele Hallak wrote:
> Hi,
> 
> I am getting out of idea about how to change the methodology we are using in 
> order to ease our integration process... Close to despair, I am throwing the 
> question to you...
> 
> We have 6 infrastructure repositories [A, B, C, D, E, F ?].

> Each project [W,X,Y,Z] is composed of 4 repositories [1-4], each one 
> using one or two infrastructure repositories as sub-modules. (Not the 
> same)

e.g. W1-W4; with say B & D as submodules

MHS >> Exact

> 
> The infrastructure repositories are common to several projects and in the 
> case we have to make change in the infrastructure for a specific project, we 
> are doing it on a specific branch until properly merged.

Do you also have remotes setup that provide backup and central authority to the 
projects..?

MHS >> If you mean that we have a GIT server, then yes... 
> 
> Everything is fine (more or less) and somehow working.

Good..
> 
> Now, we have one project that will be developed in another site and with 
> another git server physically separated from the main site.

Is it networked? Internal control, external internet, sneakernet?

MHS>>> Totally disconnected - Two isolated networks.
> 
> I copied the infrastructure repositories in the new site and removed and add 
> the sub-modules in order for them to point to the url in the separated git 
> server.
> 
> Every 2 weeks, the remotely developed code has to be integrated back in the 
> main site.
> My idea was to format GIT patches, integrate in the main site, tag the whole 
> thing and ship back the integrated tagged code to the remote site.
> ... and now the nightmare starts:
yep, you have lost the validation & verification capability of Git's sha1/oid 
and DAG.

MHS>>> It's soothing to be understood. (:+)

> 
> Since the .gitmodules is different, I cannot have the same SHA and then same 
> tag and I am never sure that the integrated code is proper.

Remotes, remotes...
> 
> May be there is a simple solution that I don't know about to my problem? Is 
> there something else than GIT patches? Should I simply ship to the remote 
> site the code as is and change the submodules each time?
>

I think the solution you need is `git bundle` 
https://git-scm.com/docs/git-bundle. This is designed for the case where you do 
not have the regular git transport infrastructure. Instead it records the 
expected data that would be 'on the wire', which is then read in at the far 
end. The bundle can contain excess data to ensure overlap between site 
transmissions.

You just run the projects in the same way but add the courier step for shipping 
the CD, or some password protected archive as per your security needs.

Everything should be just fine (more or less) and somehow it will just work. 
;-)

MHS>>> I am hopeful! Thanks a lot

--
Philip
https://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo



Re: We should add a "git gc --auto" after "git clone" due to commit graph


On 10/3/2018 9:36 AM, SZEDER Gábor wrote:

On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:

Don't have time to patch this now, but thought I'd send a note / RFC
about this.

Now that we have the commit graph it's nice to be able to set
e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
/etc/gitconfig to apply them to all repos.

But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
until whenever my first "gc" kicks in, which may be quite some time if
I'm just using it passively.

So we should make "git gc --auto" be run on clone,

There is no garbage after 'git clone'...


And since there is no garbage, the gc will not write the commit-graph.




and change the
need_to_gc() / cmd_gc() behavior so that we detect that the
gc.writeCommitGraph=true setting is on, but we have no commit graph, and
then just generate that without doing a full repack.

Or just teach 'git clone' to run 'git commit-graph write ...'


I plan to add a 'fetch.writeCommitGraph' config setting. I was waiting 
until the file is incremental (on my to-do list soon), so the write is 
fast when only adding a few commits at a time. This would cover the 
clone case, too.


Thanks,
-Stolee


Re: We should add a "git gc --auto" after "git clone" due to commit graph

On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Don't have time to patch this now, but thought I'd send a note / RFC
> about this.
> 
> Now that we have the commit graph it's nice to be able to set
> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
> /etc/gitconfig to apply them to all repos.
> 
> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
> until whenever my first "gc" kicks in, which may be quite some time if
> I'm just using it passively.
> 
> So we should make "git gc --auto" be run on clone,

There is no garbage after 'git clone'...

> and change the
> need_to_gc() / cmd_gc() behavior so that we detect that the
> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
> then just generate that without doing a full repack.

Or just teach 'git clone' to run 'git commit-graph write ...'

> As an aside such more granular "gc" would be nice for e.g. pack-refs
> too. It's possible for us to just have one pack, but to have 100k loose
> refs.
> 
> It might also be good to have some gc.autoDetachOnClone option and have
> it false by default, so we don't have a race condition where "clone
> linux && git -C linux tag --contains" is slow because the graph hasn't
> been generated yet, and generating the graph initially doesn't take that
> long compared to the time to clone a large repo (and on a small one it
> won't matter either way).
> 
> I was going to say "also for midx", but of course after clone we have
> just one pack, so I can't imagine us needing this. But I can see us
> having other such optional side-indexes in the future generated by gc,
> and they'd also benefit from this.
> 
> #leftoverbits


We should add a "git gc --auto" after "git clone" due to commit graph

Don't have time to patch this now, but thought I'd send a note / RFC
about this.

Now that we have the commit graph it's nice to be able to set
e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
/etc/gitconfig to apply them to all repos.

But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
until whenever my first "gc" kicks in, which may be quite some time if
I'm just using it passively.

So we should make "git gc --auto" be run on clone, and change the
need_to_gc() / cmd_gc() behavior so that we detect that the
gc.writeCommitGraph=true setting is on, but we have no commit graph, and
then just generate that without doing a full repack.

As an aside such more granular "gc" would be nice for e.g. pack-refs
too. It's possible for us to just have one pack, but to have 100k loose
refs.

It might also be good to have some gc.autoDetachOnClone option and have
it false by default, so we don't have a race condition where "clone
linux && git -C linux tag --contains" is slow because the graph hasn't
been generated yet, and generating the graph initially doesn't take that
long compared to the time to clone a large repo (and on a small one it
won't matter either way).

I was going to say "also for midx", but of course after clone we have
just one pack, so I can't imagine us needing this. But I can see us
having other such optional side-indexes in the future generated by gc,
and they'd also benefit from this.

#leftoverbits
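Until something like that exists, the manual step after clone amounts to
this (a sketch with a throwaway repo; the same command is what a granular
"gc --auto" or clone itself could run for us):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
git commit -q --allow-empty -m one
git commit -q --allow-empty -m two
# Build the commit graph for all reachable commits, no repack needed
git commit-graph write --reachable
```

This leaves .git/objects/info/commit-graph in place, so commands like
'tag --contains' get the fast path immediately instead of waiting for the
first full gc.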


[PATCH v2 2/2] oidset: use khash

Reimplement oidset using khash.h in order to reduce its memory footprint
and make it faster.

Performance of a command that mainly checks for duplicate objects using
an oidset, with master and Clang 6.0.1:

  $ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'"

  $ /usr/bin/time $cmd >/dev/null
  0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 
48484maxresident)k
  0inputs+0outputs (0major+11204minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer 
--batch-check='%(objectname)'

Time (mean ± σ): 250.0 ms ±   6.0 ms[User: 225.9 ms, System: 23.6 
ms]

Range (min … max):   242.0 ms … 261.1 ms

And with this patch:

  $ /usr/bin/time $cmd >/dev/null
  0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k
  0inputs+0outputs (0major+8318minor)pagefaults 0swaps

  $ hyperfine "$cmd"
  Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'

    Time (mean ± σ):     151.9 ms ±   4.9 ms    [User: 130.5 ms, System: 21.2 ms]

    Range (min … max):   148.2 ms … 170.4 ms

Initial-patch-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
 fetch-pack.c |  2 +-
 oidset.c | 34 --
 oidset.h | 36 
 3 files changed, 41 insertions(+), 31 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 75047a4b2a..a839315726 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -536,7 +536,7 @@ static int tip_oids_contain(struct oidset *tip_oids,
 * add to "newlist" between calls, the additions will always be for
 * oids that are already in the set.
 */
-   if (!tip_oids->map.map.tablesize) {
+   if (!tip_oids->set.n_buckets) {
add_refs_to_oidset(tip_oids, unmatched);
add_refs_to_oidset(tip_oids, newlist);
}
diff --git a/oidset.c b/oidset.c
index 454c54f933..9836d427ef 100644
--- a/oidset.c
+++ b/oidset.c
@@ -3,38 +3,28 @@
 
 int oidset_contains(const struct oidset *set, const struct object_id *oid)
 {
-	if (!set->map.map.tablesize)
-		return 0;
-	return !!oidmap_get(&set->map, oid);
+	khiter_t pos = kh_get_oid(&set->set, *oid);
+	return pos != kh_end(&set->set);
 }
 
 int oidset_insert(struct oidset *set, const struct object_id *oid)
 {
-	struct oidmap_entry *entry;
-
-	if (!set->map.map.tablesize)
-		oidmap_init(&set->map, 0);
-	else if (oidset_contains(set, oid))
-		return 1;
-
-	entry = xmalloc(sizeof(*entry));
-	oidcpy(&entry->oid, oid);
-
-	oidmap_put(&set->map, entry);
-	return 0;
+	int added;
+	kh_put_oid(&set->set, *oid, &added);
+	return !added;
 }
 
 int oidset_remove(struct oidset *set, const struct object_id *oid)
 {
-	struct oidmap_entry *entry;
-
-	entry = oidmap_remove(&set->map, oid);
-	free(entry);
-
-	return (entry != NULL);
+	khiter_t pos = kh_get_oid(&set->set, *oid);
+	if (pos == kh_end(&set->set))
+		return 0;
+	kh_del_oid(&set->set, pos);
+	return 1;
 }
 
 void oidset_clear(struct oidset *set)
 {
-	oidmap_free(&set->map, 1);
+	kh_release_oid(&set->set);
+	oidset_init(set, 0);
 }
diff --git a/oidset.h b/oidset.h
index 40ec5f87fe..4b90540cd4 100644
--- a/oidset.h
+++ b/oidset.h
@@ -1,7 +1,8 @@
 #ifndef OIDSET_H
 #define OIDSET_H
 
-#include "oidmap.h"
+#include "hashmap.h"
+#include "khash.h"
 
 /**
  * This API is similar to sha1-array, in that it maintains a set of object ids
@@ -15,19 +16,33 @@
  *      table overhead.
  */
 
+static inline unsigned int oid_hash(struct object_id oid)
+{
+	return sha1hash(oid.hash);
+}
+
+static inline int oid_equal(struct object_id a, struct object_id b)
+{
+	return oideq(&a, &b);
+}
+
+KHASH_INIT(oid, struct object_id, int, 0, oid_hash, oid_equal)
+
 /**
  * A single oidset; should be zero-initialized (or use OIDSET_INIT).
  */
 struct oidset {
-	struct oidmap map;
+	kh_oid_t set;
 };
 
-#define OIDSET_INIT { OIDMAP_INIT }
+#define OIDSET_INIT { { 0 } }
 
 
 static inline void oidset_init(struct oidset *set, size_t initial_size)
 {
-	oidmap_init(&set->map, initial_size);
+	memset(&set->set, 0, sizeof(set->set));
+	if (initial_size)
+		kh_resize_oid(&set->set, initial_size);
 }
 
 /**
@@ -58,19 +73,24 @@ int oidset_remove(struct oidset *set, const struct object_id *oid);
 void oidset_clear(struct oidset *set);
 
 struct oidset_iter {
-	struct oidmap_iter m_iter;
+	kh_oid_t *set;
+	khiter_t iter;
 };
 
 static inline void oidset_iter_init(struct oidset *set,
 				    struct oidset_iter *iter)
 {
-	oidmap_iter_init(&set->map, &iter->m_iter);
+	iter->set = &set->set;
+	iter->iter = kh_begin(iter->set);
 }
 
 static inline struct object_id *oidset_iter_next(struct oidset_iter *iter)
 {
-	struct oidmap_entry *e = oidmap_iter_next(&iter->m_iter);

[PATCH v2 1/2] khash: factor out kh_release_*

Add a function for releasing the khash-internal allocations, but not the
khash structure itself.  It can be used with on-stack khash structs.

Signed-off-by: Rene Scharfe 
---
1 tab = 4 spaces here.

 khash.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/khash.h b/khash.h
index 07b4cc2e67..d10caa0c35 100644
--- a/khash.h
+++ b/khash.h
@@ -82,11 +82,16 @@ static const double __ac_HASH_UPPER = 0.77;
 	SCOPE kh_##name##_t *kh_init_##name(void) {				\
 		return (kh_##name##_t*)xcalloc(1, sizeof(kh_##name##_t));	\
 	}									\
+	SCOPE void kh_release_##name(kh_##name##_t *h)				\
+	{									\
+		free(h->flags);							\
+		free((void *)h->keys);						\
+		free((void *)h->vals);						\
+	}									\
 	SCOPE void kh_destroy_##name(kh_##name##_t *h)				\
 	{									\
 		if (h) {							\
-			free((void *)h->keys); free(h->flags);			\
-			free((void *)h->vals);					\
+			kh_release_##name(h);					\
 			free(h);						\
 		}								\
 	}									\
-- 
2.19.0
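The release()/destroy() split factored out above follows a common C ownership pattern: release() frees only the internal allocations, so the struct itself may live on the stack, while destroy() additionally frees the struct. A minimal self-contained sketch with a hypothetical growable buffer (not khash itself; names invented for illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, only to illustrate the pattern. */
struct buf {
	char *data;
	size_t len;
};

/* Free internal allocations only; the struct may be on the stack. */
static void buf_release(struct buf *b)
{
	free(b->data);
	memset(b, 0, sizeof(*b)); /* safe to reuse afterwards */
}

/* Also free the struct itself, for heap-allocated instances. */
static void buf_destroy(struct buf *b)
{
	if (b) {
		buf_release(b);
		free(b);
	}
}
```

An on-stack user then calls buf_release() directly, mirroring how kh_release_##name() can be used on a khash embedded in another struct.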


[PATCH v2 0/2] oidset: use khash

Continue the discussion of speeding up oidset started in [1] here in
its own thread.

[1] https://public-inbox.org/git/20180811172350.ga2...@sigill.intra.peff.net/

The first patch does a mild refactoring to support khash structures
on the stack, and the second one converts oidset to khash.

  khash: factor out kh_release_*
  oidset: use khash

 fetch-pack.c |  2 +-
 khash.h  |  9 +++--
 oidset.c | 34 --
 oidset.h | 36 
 4 files changed, 48 insertions(+), 33 deletions(-)

-- 
2.19.0


[PATCH] sequencer: use return value of oidset_insert()

oidset_insert() returns 1 if the object ID is already in the set and
doesn't add it again, or 0 if it hadn't been present.  Make use of that
fact instead of checking with an extra oidset_contains() call.

Signed-off-by: Rene Scharfe 
---
 sequencer.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index ddb41a62d9..6387c9ee6e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4146,9 +4146,7 @@ static int make_script_with_merges(struct 
pretty_print_context *pp,
struct object_id *oid = >item->object.oid;
if (!oidset_contains(, oid))
continue;
-   if (!oidset_contains(_seen, oid))
-   oidset_insert(_seen, oid);
-   else
+   if (oidset_insert(_seen, oid))
label_oid(oid, "branch-point", );
}
 
-- 
2.19.0
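The contract this patch relies on (insert reports prior presence, so a separate contains() check is redundant) can be sketched with a hypothetical toy set; the names here are invented stand-ins, not git's real oidset, which hashes object IDs:

```c
#include <string.h>

/*
 * Toy set using a linear scan, for illustration only.  insert()
 * returns 1 if the key was already present and 0 if it was newly
 * added -- the same contract as oidset_insert().
 */
struct toyset {
	const char *items[16];
	int n;
};

static int toyset_insert(struct toyset *s, const char *key)
{
	int i;
	for (i = 0; i < s->n; i++)
		if (!strcmp(s->items[i], key))
			return 1; /* already present */
	s->items[s->n++] = key;
	return 0; /* newly added */
}
```

A caller can then write `if (toyset_insert(&seen, key)) ...` to act on duplicates, instead of a contains() check followed by an unconditional insert.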


Re: Git for games working group

Am 17.09.2018 um 17:58 schrieb Jonathan Nieder:

[...]

> Ah, thanks.  See git-config(1):
> 
>   core.bigFileThreshold
>   Files larger than this size are stored deflated,
>   without attempting delta compression.
> 
>   Default is 512 MiB on all platforms.
> 

In addition to core.bigFileThreshold you can also unset the delta
attribute for file extensions you don't want to get delta-compressed.
See "git help attributes". And while you are at it, mark the files as
binary so that git diff/log don't have to guess.
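For example (the extensions are illustrative; `binary` is a built-in macro attribute expanding to `-diff -merge -text`):

```
# .gitattributes -- skip delta compression for already-compressed media
# and treat it as binary
*.png  -delta binary
*.mp4  -delta binary
*.zip  -delta binary
```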



inside the git folder

Hey git-team,
I am working on a plug-in for a distributed pair programming tool. To
skip the details: I was thinking about sending parts of the git folder
as a zip folder with our own Bytestream instead of using the git API.
Is there a common sense about what should and what shouldn't be done
when working with the files inside the git folder?

Kind Regards,

Christian


Re: [PATCH 1/2] commit-graph: clean up leaked memory during write


On 10/2/2018 6:44 PM, Stefan Beller wrote:

My preference is to avoid them in the name of simplicity. If you're
using "make SANITIZE=leak test" to check for leaks, it will skip these
cases. If you're using valgrind, I think these may be reported as
"reachable". But that number already isn't useful for finding real
leaks, because it includes cases like the "foo" above as well as
program-lifetime globals.

The only argument (IMHO) for such an UNLEAK() is that it annotates the
location for when somebody later changes the function to "return -1"
instead of dying. But if we are going to do such annotation, we may as
well actually call free(), which is what the "return" version will
ultimately have to do.

Heh, that was part of my reasoning why we'd want to have *something*.


I'd actually be _more_ favorable to calling free() instead of UNLEAK()
there, but I'm still mildly negative, just because it may go stale (and
our leak-checking wouldn't usefully notice these cases). Anybody
converting that die() to a return needs to re-analyze the function for
what might need to be released (and that includes non-memory bits like
descriptors, too).

Sounds reasonable, so then the consensus (between Martin, you and me)
is to drop the UNLEAK.
Thanks for the discussion here. I'll drop the UNLEAK for now and think 
about how to remove the die() calls from commit-graph.c in a later series.


Thanks,
-Stolee


[PATCH v3 2/3] git.c: handle_alias: prepend alias info when first argument is -h

Most git commands respond to -h anywhere in the command line, or at
least as a first and lone argument, by printing the usage
information. For aliases, we can provide a little more information that
might be useful in interpreting/understanding the following output by
prepending a line telling that the command is an alias, and for what.

When one invokes a simple alias, such as "cp = cherry-pick"
with -h, this results in

$ git cp -h
'cp' is aliased to 'cherry-pick'
usage: git cherry-pick [] ...
...

When the alias consists of more than one word, this provides the
additional benefit of informing the user which options are implicit in
using the alias, e.g. with "cp = cherry-pick -n":

$ git cp -h
'cp' is aliased to 'cherry-pick -n'
usage: git cherry-pick [] ...
...

For shell commands, we cannot know how it responds to -h, but printing
this line to stderr should not hurt, and can help in figuring out what
is happening in a case like

$ git sc -h
'sc' is aliased to '!somecommand'
somecommand: invalid option '-h'

Suggested-by: Jeff King 
Signed-off-by: Rasmus Villemoes 
---
 git.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/git.c b/git.c
index a6f4b44af5..0211c2d4c0 100644
--- a/git.c
+++ b/git.c
@@ -318,6 +318,9 @@ static int handle_alias(int *argcp, const char ***argv)
alias_command = (*argv)[0];
alias_string = alias_lookup(alias_command);
if (alias_string) {
+   if (*argcp > 1 && !strcmp((*argv)[1], "-h"))
+   fprintf_ln(stderr, _("'%s' is aliased to '%s'"),
+  alias_command, alias_string);
if (alias_string[0] == '!') {
struct child_process child = CHILD_PROCESS_INIT;
int nongit_ok;
-- 
2.19.0



[PATCH v3 0/3] alias help tweaks

v2: Added patches 2 and 3, made "git cmd --help" unconditionally (no
config option, no delay) redirect to the aliased command's help,
preserve pre-existing behaviour of the spelling "git help cmd".

v3: Add some additional comments in patch 1 and avoid triggering leak
checker reports. Use better wording in patch 3.

Rasmus Villemoes (3):
  help: redirect to aliased commands for "git cmd --help"
  git.c: handle_alias: prepend alias info when first argument is -h
  git-help.txt: document "git help cmd" vs "git cmd --help" for aliases

 Documentation/git-help.txt |  4 
 builtin/help.c | 34 +++---
 git.c  |  3 +++
 3 files changed, 38 insertions(+), 3 deletions(-)

-- 
2.19.0



[PATCH v3 1/3] help: redirect to aliased commands for "git cmd --help"

As discussed in the thread for v1 of this patch [1] [2], this changes the
rules for "git foo --help" when foo is an alias.

(0) When invoked as "git help foo", we continue to print the "foo is
aliased to bar" message and nothing else.

(1) If foo is an alias for a shell command, print "foo is aliased to
!bar" as usual.

(2) Otherwise, break the alias string into words, and pretend that "git
word0 --help" was called.

At least for me, getting the man page for git-cherry-pick directly with
"git cp --help" is more useful (and how I expect an alias to behave)
than the short "is aliased to" notice. It is also consistent with
"--help" generally providing more comprehensive help than "-h".

I believe that printing the "is aliased to" message also in case (2) has
value: Depending on pager setup, or if the user has help.format=web, the
message is still present immediately above the prompt when the user
quits the pager/returns to the terminal. That serves as an explanation
for why one was redirected to "man git-cherry-pick" from "git cp
--help", and if cp is actually 'cherry-pick -n', it reminds the user
that using cp has some flag implicitly set before firing off the next
command.

It also provides some useful info in case we end up erroring out, either
in the "bad alias string" check, or in the "No manual entry for gitbar"
case.

[1] https://public-inbox.org/git/20180926102636.30691-1...@rasmusvillemoes.dk/
[2] https://public-inbox.org/git/20180926184914.gc30...@sigill.intra.peff.net/

Signed-off-by: Rasmus Villemoes 
---
 builtin/help.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/builtin/help.c b/builtin/help.c
index 8d4f6dd301..e0e3fe62e9 100644
--- a/builtin/help.c
+++ b/builtin/help.c
@@ -415,9 +415,37 @@ static const char *check_git_cmd(const char* cmd)
 
alias = alias_lookup(cmd);
if (alias) {
-   printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
-   free(alias);
-   exit(0);
+   const char **argv;
+   int count;
+
+   /*
+* handle_builtin() in git.c rewrites "git cmd --help"
+* to "git help --exclude-guides cmd", so we can use
+* exclude_guides to distinguish "git cmd --help" from
+* "git help cmd". In the latter case, or if cmd is an
+* alias for a shell command, just print the alias
+* definition.
+*/
+   if (!exclude_guides || alias[0] == '!') {
+   printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
+   free(alias);
+   exit(0);
+   }
+   /*
+* Otherwise, we pretend that the command was "git
+* word0 --help". We use split_cmdline() to get the
+* first word of the alias, to ensure that we use the
+* same rules as when the alias is actually
+* used. split_cmdline() modifies alias in-place.
+*/
+   fprintf_ln(stderr, _("'%s' is aliased to '%s'"), cmd, alias);
+   count = split_cmdline(alias, &argv);
+   if (count < 0)
+   die(_("bad alias.%s string: %s"), cmd,
+   split_cmdline_strerror(count));
+   free(argv);
+   UNLEAK(alias);
+   return alias;
}
 
if (exclude_guides)
-- 
2.19.0



[PATCH v3 3/3] git-help.txt: document "git help cmd" vs "git cmd --help" for aliases

This documents the existing behaviour of "git help cmd" when cmd is an
alias, as well as providing a hint to use the "git cmd --help" form to
be taken directly to the man page for the aliased command.

Signed-off-by: Rasmus Villemoes 
---
 Documentation/git-help.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/git-help.txt b/Documentation/git-help.txt
index 83d25d825a..86a6b42345 100644
--- a/Documentation/git-help.txt
+++ b/Documentation/git-help.txt
@@ -29,6 +29,10 @@ guide is brought up. The 'man' program is used by default for this
 purpose, but this can be overridden by other options or configuration
 variables.
 
+If an alias is given, git shows the definition of the alias on
+standard output. To get the manual page for the aliased command, use
+`git COMMAND --help`.
+
 Note that `git --help ...` is identical to `git help ...` because the
 former is internally converted into the latter.
 
-- 
2.19.0



Re: [PATCH] coccicheck: process every source file at once

On Tue, Oct 02, 2018 at 03:55:19PM -0400, Jeff King wrote:
> On Tue, Oct 02, 2018 at 12:16:42PM -0700, Jacob Keller wrote:
> 
> > make coccicheck is used in order to apply coccinelle semantic patches,
> > and see if any of the transformations found within contrib/coccinelle/
> > can be applied to the current code base.
> > 
> > Pass every file to a single invocation of spatch, instead of running
> > spatch once per source file.
> > 
> > This reduces the time required to run make coccicheck by a significant
> > amount of time:
> > 
> > Prior timing of make coccicheck
> >   real6m14.090s
> >   user25m2.606s
> >   sys 1m22.919s
> > 
> > New timing of make coccicheck
> >   real1m36.580s
> >   user7m55.933s
> >   sys 0m18.219s
> 
> Yay! This is a nice result.
> 
> It's also one of the things that Julia suggested in an earlier thread.
> One thing I wasn't quite sure about after digging into the various
> versions (1.0.4 on Debian stable/unstable, 1.0.6 in experimental, and
> pre-1.0.7 at the bleeding edge) was whether the old versions would be
> similarly helped (or work at all).
> 
> I just replicated your results with 1.0.4.deb-3+b2 from Debian stable.
> It's possible there are older versions floating around, but for
> something developer-only like this, I think "in Debian stable" is a
> reasonable enough cutoff.

Linux build jobs on Travis CI run Ubuntu Trusty 14.04 LTS, and
therefore our static analysis build job still runs 1.0.0~rc19.deb-3.

This patch appears to work fine with that version, too, though note
that I also changed the build job to not run two parallel jobs; see
below for the reason.

> > This is nearly a 4x decrease in the time required to run make
> > coccicheck. This is due to the overhead of restarting spatch for every
> > file. By processing all files at once, we can amortize this startup cost
> > across the total number of files, rather than paying it once per file.
> 
> One minor side effect is that we lose the opportunity to parallelize
> quite as much. However, I think the reduction in total CPU makes it well
> worth that (and we still have 8 cocci files and growing, which should
> keep at least 8 cores busy).

One major side effect, however, is the drastically increased memory
usage resulting from processing all files at once.  With your patch on
top of current master:

  $ for cocci in contrib/coccinelle/*.cocci ; do command time -f 'max RSS: %MkB' make ${cocci}.patch ; done
   SPATCH contrib/coccinelle/array.cocci
  max RSS: 1537068kB
   SPATCH contrib/coccinelle/commit.cocci
  max RSS: 1510920kB
   SPATCH contrib/coccinelle/free.cocci
  max RSS: 1393712kB
   SPATCH contrib/coccinelle/object_id.cocci
   SPATCH result: contrib/coccinelle/object_id.cocci.patch
  max RSS: 1831700kB
   SPATCH contrib/coccinelle/qsort.cocci
  max RSS: 1384960kB
   SPATCH contrib/coccinelle/strbuf.cocci
  max RSS: 1395800kB
   SPATCH contrib/coccinelle/swap.cocci
  max RSS: 1393620kB
   SPATCH contrib/coccinelle/xstrdup_or_null.cocci
  max RSS: 1371716kB

Without your patch the max RSS lingers around 87048kB - 101980kB,
meaning a ~15x - 18x increase.

This might cause quite the surprise to developers who are used to run
'make -jN coccicheck'; my tiny laptop came to a grinding halt with
-j4.

I think this side effect should be mentioned in the commit message.

Furthermore, the above mentioned static analysis build job on Travis
CI runs two parallel jobs.  Perhaps we should be considerate and
reduce that to a single job, even if that means that the build job will
run longer.

> I think recent versions of Coccinelle will actually parallelize
> internally, too, but my 1.0.4 doesn't seem to. That's probably what I
> was thinking of earlier (but this is still a win even without that).

I think Coccinelle parallelizes only when invoked with -j <n>, but
that option is not documented in 1.0.4.  I wrote "documented" instead
of "present", because e.g.:

  $ spatch --sp-file contrib/coccinelle/swap.cocci -j 2 a*.c

doesn't abort with "unknown option" and usage, but runs for a bit and
then outputs:

  init_defs_builtins: /usr/lib/coccinelle/standard.h
  HANDLING: abspath.c advice.c alias.c alloc.c apply.c archive.c archive-tar.c archive-zip.c argv-array.c attr.c
  Fatal error: exception Sys_error("swap: No such file or directory")

Without '-j 2' the command runs just fine.  I don't know what to make
of it.

> It looks like this also fixes a problem I ran into when doing the oideq
> work, which is that the patch for a header file would get shown multiple
> times (once for each file that includes it). So this is faster _and_
> more correct. Double-yay.

In that case I used to apply the change to the header first, and then
apply the semantic patch one more time.



Re: [PATCH 3/5] diff --color-moved-ws: fix a memory leak

On Tue, 2 Oct 2018 at 20:04, Phillip Wood  wrote:
> const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
> const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
> int d = longer->len - shorter->len;
> +   int ret = !strncmp(longer->line + d, shorter->line, shorter->len);
> +
> +   if (!ret)
> +   return ret;
>
> out->string = xmemdupz(longer->line, d);
> out->current_longer = (a == longer);
>
> -   return !strncmp(longer->line + d, shorter->line, shorter->len);
> +   return ret;
>  }

The caller only cares about 0 vs non-0, so this would also work:

   if (strncmp(...))
   return 0;

   ...

   return 1;

Just a thought, to avoid some "!"s and the new variable `ret`.
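Putting that shape together, a self-contained sketch of the early-return variant (struct names and the function name are stand-ins mirroring the diff, not the actual diff.c code):

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the types used in the diff above. */
struct emitted_diff_symbol {
	const char *line;
	int len;
};

struct ws_out {
	char *string;
	int current_longer;
};

/* Local stand-in for git's xmemdupz(). */
static char *memdupz_sketch(const char *data, size_t len)
{
	char *p = malloc(len + 1);
	memcpy(p, data, len);
	p[len] = '\0';
	return p;
}

static int cmp_with_ws_delta(const struct emitted_diff_symbol *a,
			     const struct emitted_diff_symbol *b,
			     struct ws_out *out)
{
	const struct emitted_diff_symbol *longer  = a->len > b->len ? a : b;
	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
	int d = longer->len - shorter->len;

	/* Early return: no allocation happens on a mismatch. */
	if (strncmp(longer->line + d, shorter->line, shorter->len))
		return 0;

	out->string = memdupz_sketch(longer->line, d);
	out->current_longer = (a == longer);
	return 1;
}
```

The allocation is only reached on the match path, which is exactly what fixes the leak without needing the extra `ret` variable.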

Martin

