Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Phillip Wood  writes:

> On 10/10/2018 06:43, Junio C Hamano wrote:
>> Here are the topics that have been cooking.  Commits prefixed with
>> '-' are only in 'pu' (proposed updates) while commits prefixed with
>> '+' are in 'next'.  The ones marked with '.' do not appear in any of
>> the integration branches, but I am still holding onto them.
>>
>> * pw/diff-color-moved-ws-fix (2018-10-04) 5 commits
>>   - diff --color-moved: fix a memory leak
>>   - diff --color-moved-ws: fix another memory leak
>>   - diff --color-moved-ws: fix a memory leak
>>   - diff --color-moved-ws: fix out of bounds string access
>>   - diff --color-moved-ws: fix double free crash
>>
>>   Various fixes to "diff --color-moved-ws".
>>
>>   What's the status of this topic?
>
> I think it is ready for next - Stefan was happy with the last iteration.

Thanks.


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 10.10.18 um 07:43 schrieb Junio C Hamano:
>> We haven't seen much complaints and breakages reported against the
>> two big "rewrite in C" topics around "rebase"; perhaps it is a good
>> time to merge them to 'next' soonish to cook them for a few weeks
>> before moving them to 'master'?
>
> Please let me express my sincerest gratitude to Alban, Joel,
> Paul-Sebastian, Pratik, and Dscho. It is such a pleasure to work with
> the builtin rebase and stash commands on Windows now. I am using them
> since a month or two, and they work extremely well for me.
>
> Thank you all for your hard work!

OK.  With another Ack from Dscho, I'd feel safe to merge the
"rebase" topics 'next' and start cooking.  "stash" seems to be
almost there but I think it deserves a chance for a final touch-up
before hitting 'next' (see another thread with Thomas).

Thanks.


Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Thomas Gummerer  writes:

> There was a v9 of this series [*1*], which hasn't been picked up yet.
> Was that intentional, or an oversight?

;-) Yes, I often miss patches that are buried in other discussions,
but this time, it was quite deliberate.  I saw comments that pointed
out at least one thing that needs to be fixed before the series can
move forward, so I skipped that iteration, anticipating another
round of update.

Also, I was waiting for [*3*] to be answered.

> I left some comments on that iteration.  Some were just style nits,
> but I think at least [*2*] should be addressed before we merge this
> down to master, not sure if any of my other comments apply to v8 as
> well.  I'm happy to send fixup patches, or a patches on top of
> this series for that and my other comments, should they apply to v8,
> or wait for Paul-Sebastian to send a re-roll.  What do you prefer?

The ideal from my point of view is to see responses to your comments
in the original thread (which is about 1300 messages ago in the list
archive by now) by Paul-Sebastian, possibly responded by you and/or
others, resulting in a concensus on what the right update for the
patches should be, finally followed by v10, which hopefully would be
the final one.

> [*1*]: 
> [*2*]: <20180930174848.ge2...@hank.intra.tgummerer.com>

[*3*] 


Re: [PATCH 2/3] ls-remote: release memory instead of UNLEAK

2018-10-10 Thread Junio C Hamano
Olga Telezhnaya  writes:

> Use ref_array_clear() to release memory instead of UNLEAK macros.
>
> Signed-off-by: Olga Telezhnaia 
> ---
>  builtin/ls-remote.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

OK, this is immediately before the command exits, and we have a way
to clear and release the resource, so it is obvious we should use
it.

Good.


>
> diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
> index 1a25df7ee15b4..6a0cdec30d2d7 100644
> --- a/builtin/ls-remote.c
> +++ b/builtin/ls-remote.c
> @@ -151,6 +151,6 @@ int cmd_ls_remote(int argc, const char **argv, const char 
> *prefix)
>   }
>  
>   UNLEAK(sorting);
> - UNLEAK(ref_array);
> + ref_array_clear(_array);
>   return status;
>  }
>
> --
> https://github.com/git/git/pull/538


Re: [PATCH 3/3] ref-filter: free item->value and item->value->s

2018-10-10 Thread Junio C Hamano
Olga Telezhnaya  writes:

> Release item->value.
> Initialize item->value->s dynamically and then release its resources.
> Release some local variables.

Again, "why" is lacking.

> @@ -1373,36 +1379,31 @@ static void fill_remote_ref_details(struct used_atom 
> *atom, const char *refname,
>   }
>   } else if (atom->u.remote_ref.option == RR_TRACKSHORT) {
>   if (stat_tracking_info(branch, _ours, _theirs,
> -NULL, AHEAD_BEHIND_FULL) < 0)
> +NULL, AHEAD_BEHIND_FULL) < 0) {
> + *s = xstrdup("");
>   return;

It is a bit sad that we need to sprinkle xstrdup() all over the
place, but I do not offhand think of a better alternative to ensure
that it is safe to blindly free the .s field.

> - if (explicit)
> - *s = xstrdup(remote);
> - else
> - *s = "";
> + *s = explicit ? xstrdup(remote) : xstrdup("");

Next time, please avoid mixing this kind of unrelated changes with
the main theme (i.e. "the original had allocated and static pieces
of memory pointed by the same variable, which made it impossible to
blindly free it, so make sure everything is allocated").  It makes
it harder to let reviewers' eyes coast over the patch.

I say "Next time" because the change is already written this time,
and I already spent time to see it was an OK change.  By the way,

*s = xstrdup(explicit ? remote : "");

is probably shorter.

> @@ -1562,10 +1566,11 @@ static int populate_value(struct ref_array_item *ref, 
> struct strbuf *err)
>   if (!refname)
>   continue;
>   }
> + free((char *)v->s); // we will definitely re-init it on 
> the next line

No // comment, please.

>  static void free_array_item(struct ref_array_item *item)
>  {
>   free((char *)item->symref);
> + if (item->value) {
> + free((char *)item->value->s);
> + free(item->value);
> + }
>   free(item);
>  }

OK.


Re: [PATCH 1/3] ref-filter: free memory from used_atom

2018-10-10 Thread Junio C Hamano
Olga Telezhnaya  writes:

> Release memory from used_atom variable.

That much readers would know from a quick look of the patch text.

Without knowing what you are aiming at, it is impossible to judge if
the patch is a good change.

Seeing FREE_AND_NULL(array->items) in the same function makes me
think that the designer of ref_array_clear() function would want
this sequence of events to work correctly in an ideal future:

 * Do a ref-filter operation by calling filter_refs(), receiving the
   result into an array..

 * Do another ref-filter by calling filter_refs(), using different
   criteria, receiving the result into a different array.

 * Iterate over the resulting refs in the first array, and call
   format_ref_array_item().

 * ref_array_clear() the first array, as the caller is done with it.

 * Iterate over the resulting refs in the second array, and call
   format_ref_array_item().

 * ref_array_clear() the second array, as the caller is done with
   it.

However, I think it would make it impossible for the second call to
work correctly if this code freed used_atom without clearing, and
not re-initializing the used_atom_cnt etc.

If on the other hand, the only thing you are interested in is to
just discard pieces of memory we no longer use, and you are not
interested in helping to move us closer to the world in which we can
call filter_refs() twice, then the change this patch makes is
sufficient.

And the place to answer the "what are you aiming at?" question is in
the proposed commit log message.

In an ideal future, I _think_ the file-scope static variables in
ref-filter.c like used_atom and used_atom_cnt should become fields
of a new structure (say "struct ref_filter"), with initializer and
uninitializer ref_filter_new() and ref_filter_destroy().  When that
happens, I think FREE_AND_NULL(used_atom) + used_atom_cnt=0 should
become part of ref_filter_destroy(), not part of ref_array_clear().

But we are not there yet, and a clean-up patch like this does not
have to be a step towards that goal.

> Signed-off-by: Olga Telezhnaia 
> ---
>  ref-filter.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index e1bcb4ca8a197..1b71d08a43a84 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1996,6 +1996,9 @@ void ref_array_clear(struct ref_array *array)
>  {
>   int i;
>  
> + for (i = 0; i < used_atom_cnt; i++)
> + free((char *)used_atom[i].name);
> + free(used_atom);
>   for (i = 0; i < array->nr; i++)
>   free_array_item(array->items[i]);
>   FREE_AND_NULL(array->items);
>
> --
> https://github.com/git/git/pull/538


Re: [PATCH] doc: move git-rev-parse from porcelain to plumbing

2018-10-10 Thread Junio C Hamano
Daniels Umanovskis  writes:

> On 10/11/18 12:26 AM, Junio C Hamano wrote:
>> Among the remaining ones in the list, cherry and get-tar-commit-id
>> are probably better classified as plumbing.  I do not know why
>> cherry is marked for completion; perhaps some crazy people use that
>> on the command line?
>
> I think cherry could go either way, get-tar-commit-id is definitely
> plumbing. Would you like me to fix those two on the same patch then?

No, what you sent for rev-parse is already good.  A separate patch
that addresses other ones can be discussed as an orthogonal matter.
It even may deserve to make them two separate patches, as I
anticipate that some people would resist marking "cherry" as
plumbing, so that only one can be applied while dropping the other
as/if needed.

Thanks.



Re: [PATCH] gc: remove redundant check for gc_auto_threshold

2018-10-10 Thread Junio C Hamano
Brandon Casey  writes:

> ...  Again, I don't feel strongly about it, but I'm not
> sure this change actually improves the code.

Yeah, in the context of the current caller, this is a safe change
that does not break anybody and reduces the number of instructions
executed in this codepath.  A mistaken caller may be added in the
future that fails to check auto-threashold beforehand, but that
won't lead to anything bad like looping for a large number of times,
so as long as the API contract into this helper function is clear
that callers are responsible to check beforehand, it is still not
too bad.

So, I'd throw this into "Meh - I won't regret applying it, but it is
not the end of the world if I forget to apply it, either" pile.

I _think_ a change that actually improves the code would be to
restructure so that there is a helper that is responsible for
guestimating the number of loose objects, and another that uses the
helper to see if there are too many loose objects.  The latter is
the only one tha needs to know about auto-threashold.  But we are
not in immdiate need for such a clean-up, I guess, unless somebody
is actively looking into revamping how auto-gc works and doing a
preparatory clean-up.



Re: [PATCH] doc: fix a typo and clarify a sentence

2018-10-10 Thread Junio C Hamano
Mihir Mehta  writes:

> -Just in case if you are doing something exotic, it should be
> +Just in case you are doing something exotic, it should be

Thanks.  Somehow I didn't notice this change earlier, but it looks
good, too.

Will queue.


Re: `--rebase-merges' still failing badly

2018-10-10 Thread Junio C Hamano
Michael Witten  writes:

> On Wed, 10 Oct 2018 14:43:46 +0900, Junio wrote:
>
>> We haven't seen  much complaints and breakages  reported against the
>> two big "rewrite in C" topics  around "rebase"; perhaps it is a good
>> time to merge  them to 'next' soonish  to cook them for  a few weeks
>> before moving them to 'master'?
>
> In my opinion, the `--rebase-merges' feature has been broken since the
> beginning, and the builtin version should  be fixed before it is moved
> ahead.

I'll omit the remainder of the message not because I disagree with
your suggested improvements to "rebase-merges" (that conversation
should happen primarily with Dscho), but because I need to react to
the above three lines.

If "rebase-merges" has been broken since the beginning, as long as
the "rewrite in C" topics around "rebase" do not make it even worse,
I do not think it is a good move to block the topics moving forward.
If the feature were so broken that it is not practically useful,
then people wouldn't be using it in the versions of Git before the
rewrite, so it won't harm anybody if the same feature in the rewritten
version is equally (or even more severely) broken, as long as the
other parts of the feature works at least equally well compared to
the older version.

We are not in the business of hostage taking.

What *should* block the rewrited version is a regression,
i.e. something that used to work well no longer works or works
differently in such a way that established workflows need to be
adjusted.

In any case, suggestions to improve "rebase-merges" is a very much
welcome thing to be discussed on the list, so thanks for raising the
issue.  What I wanted to say is that I do not think that is a reason
to keep "rewrite in C" waiting in 'pu'.



Re: [PATCH] doc: move git-rev-parse from porcelain to plumbing

2018-10-10 Thread Junio C Hamano
Daniels Umanovskis  writes:

> git-rev-parse mostly seems like plumbing, and is more usd in
> scripts than in regular use. Online it's often mentioned as
> a plumbing command. Nonetheless it's listed under porcelain
> interrogators in `man git`. It seems appropriate to formally
> move git-rev-parse to plumbing interrogators.

Correct.  "ancillary" category ended up with full of Porcelain, it
seems, but there still are plumbing commands there, and this is a
prime example that should not be mixed up with Porcelain commands.

Among the remaining ones in the list, cherry and get-tar-commit-id
are probably better classified as plumbing.  I do not know why
cherry is marked for completion; perhaps some crazy people use that
on the command line?


Re: none

2018-10-10 Thread Junio C Hamano
Mihir Mehta  writes:

> Thanks, Junio. Instead of removing that part of the patch, I opted to
> expand it to make it a little clearer (in my opinion) than it was
> before. Let me know if this works.

I am mildly negative on that change.  "Omitting both would give an
empty diff" would be understandable to anybody who understands that
an omitted end of dot-dot is substituted with HEAD *and* thinks what
range HEAD..HEAD means, so it is just an additional noise to them,
and to those who do not want to waste time on thinking, it is a
statement that reads as if "it will be an error" without saying why
it is an error.  So overall, it seems, at least to me, that the
additional text adds negative value.

So, I dunno.


Re: [PATCH v5 17/23] userdiff.c: remove implicit dependency on the_index

2018-10-10 Thread Junio C Hamano
Jeff King  writes:

> I get why you're doing it: your topic here only cares about removing
> index dependencies, so you did the minimal thing to move that forward.
>
> But if you think about what this function is doing, it is quite clearly
> dependent on the whole repository, since the userdiff config we're
> looking up may come from repo config.

In the case of userdiff that is pretty much limited to read-only
operation, I fully agree, but in more general cases, we would need
to pass both the repository and an in-core index separately, I would
say.  Imagine doing a partial commit, where we construct a separate
istate that is not the "repo's index" and use that to write out a
tree object to be wrapped in a new commit, and update the current
branch ref.


Re: [PATCH 2/2] push: add an advice on unqualified push

2018-10-10 Thread Junio C Hamano
Jeff King  writes:

>> Fix both of those, now the message will look like this instead:
>> 
>> $ ./git-push avar v2.19.0^{commit}:newbranch -n
>> error: unable to push to unqualified destination: newbranch
>> hint: The destination refspec neither matches an existing
>> hint: ref on the remote nor begins with refs/, and we are
>> hint: unable to guess a prefix based on the source ref.
>> hint:
>> hint: The  part of the refspec is a commit object.
>> hint: Did you mean to create a new branch by pushing to
>> hint: 'v2.19.0^{commit}:refs/heads/newbranch'?
>> error: failed to push some refs to 'g...@github.com:avar/git.git'
>> 
>> When trying to push a tag, tree or a blob we suggest that perhaps the
>> user meant to push them to refs/tags/ instead.
>
> This is much better, and I love the customized behavior based on the
> object type.
>
> I wonder if we could reword the first paragraph to be a little less
> confusing, and spell out what we tried already. E.g., something like:
>
>   The destination you provided is not a full refname (i.e., starting
>   with "ref"). Git tried to guess what you meant by:

s|ref|refs/|; I fully agree that "unqualified destination" was a
poor way to communicate the failure to those who would likely hit
this error path, because somebody who can ell what's qualified and
what's not would not be triggering the error in the first place.

> - looking for a matching branch or tag on the remote side
>
> - looking at the refname of the local source
>
>   but neither worked.
>
>   The  part of the refspec is a commit object.
>   Did you mean...

Looks great.

> I'm not sure about saying "branch or tag" in the first bullet. It's
> friendlier to most users, but less technically correct (if you said
> "notes/foo", I believe we'd match an existing "refs/notes/foo", because
> it's really just using the normal lookup rules).

An alternative may be "looking for a ref that matches %s on the
remote side".  I am no longer a total newbie, so I cannot tell how
well that message would help one to connect notes/foo one just typed
with refs/notes/foo that potentially exists on the remote side.

> Also, as an aside, I wonder if we should allow "heads/foo" to work as
> "refs/heads/foo" (even when no such ref already exists). But that is
> totally orthogonal to changing the message.

I am neutral on this point but agree that it is better done outside
this patch.


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Junio C Hamano
Jonathan Nieder  writes:

> Perhaps this reporting could also print the message from a previous
> run, so you could write:
>
>   git gc --detached-status || exit
>   git gc --auto; # perhaps also passing --detach
>
> (Names still open for bikeshedding.)

When the command is given --detached-exit-code/status option, what
does it do?  Does it perform the "did an earlier run left gc.log?"
and report the result and nothing else?  In other words, is it a
pure replacement for "test -e .git/gc.log"?  Or does it do some of
the "auto-gc" prep logic like guestimating loose object count and
have that also in its exit status (e.g. "from the gc.log left
behind, we know that we failed to reduce loose object count down
sufficiently after finding there are more than 6700 earlier, but now
we do not have that many loose object, so there is nothing to
complain about the presence of gc.log")?

I am bad at naming myself, but worse at guessing what others meant
with a new thing that was given a new name whose name is fuzzy,
so... ;-)


Re: builtin stash/rebase, was Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Johannes Schindelin  writes:

> Speaking about the two `rebase` ones: they are simple fixup! commits,
> could I trouble you to fetch and cherry-pick them into `pu`, or would you
> prefer if I sent another iteration of `rebase-in-c-4-opts`?

If it were only about me, then the former if I can do my own pace is
easier.  If you promise that you won't complain if a few commits
lose the amlog notes by accident when such tree mangling is done,
that would be even better, but I'd be careful anyway.

I'd rather limit number of changes not seen on the list that come
into my tree, so it is likely that I'd parrot these fixup commits or
result of "commit --amend" to the list if we take that route.

Johannes Schindelin  writes:

> Hi Junio,
>
> On Wed, 10 Oct 2018, Junio C Hamano wrote:
>
>> We haven't seen much complaints and breakages reported against the
>> two big "rewrite in C" topics around "rebase"; perhaps it is a good
>> time to merge them to 'next' soonish to cook them for a few weeks
>> before moving them to 'master'?
>
> I would be in favor, as long as the fixup patches I have in Git for
> Windows made it in:
>
> https://github.com/git-for-windows/git/commit/6bc7024aecdb1aeb2760c519f7b26e6e5ef21051
> fixup! builtin rebase: support `-C` and `--whitespace=`
>
> https://github.com/git-for-windows/git/commit/1e6a1c510ffeae5bb0a4bda7f0528a8213728837
> fixup! builtin rebase: support `--gpg-sign` option
>
> https://github.com/git-for-windows/git/commit/ddb6e5ca19d5cdd318bc4bcbb7f7f3fb0892c8cc
> fixup! rebase -i: implement the main part of interactive rebase as a 
> builtin
>
> https://github.com/git-for-windows/git/commit/2af24038a95a3879aa0c29d91a43180b9465247e
> fixup! stash: convert apply to builtin
>
> It seems that Alban picked up the `rebase -i` one, but the other three
> have not made it into `pu` yet (the two `rebase` ones are really my fault,
> I did not yet find time).
>
> Speaking about the two `rebase` ones: they are simple fixup! commits,
> could I trouble you to fetch and cherry-pick them into `pu`, or would you
> prefer if I sent another iteration of `rebase-in-c-4-opts`?
>
> Ciao,
> Dscho


Re: js/mingw-wants-vista-or-above, was Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Junio,
>
> On Wed, 10 Oct 2018, Junio C Hamano wrote:
>
>> * js/mingw-wants-vista-or-above (2018-10-04) 3 commits
>>  - mingw: bump the minimum Windows version to Vista
>>  - mingw: set _WIN32_WINNT explicitly for Git for Windows
>>  - compat/poll: prepare for targeting Windows Vista
>> 
>>  The minimum version of Windows supported by Windows port fo Git is
>>  now set to Vista.
>> 
>>  Will merge to 'next'.
>
> Could I ask you to fast-track this to `master`? The code in `master`
> unfortunately no longer compiles in a current Git for Windows SDK, meaning
> that all of our Continuous Testing fails as long as these patches are not
> merged.

Absolutely.  There is no point keeping it in 'pu', as nobody would
touch it in my tree until it hits 'next' and probably 'master' and
the change would get wider exposure to folks to whom it matters in
your tree anyway.

Thanks for pinging.


Re: [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files

2018-10-10 Thread Junio C Hamano
Christian Couder  writes:

> So I think one solution to this problem is already proposed on our web site.

OK, that's a good start.  The next step is to make those who are
involved in these education programs to be more aware of it ;-)


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>  - We use this warning as a proxy for "let's not run for a day",
>otherwise we'll just grind on gc --auto trying to consolidate
>possibly many hundreds of K of loose objects only to find none of
>them can be pruned because the run into the expiry policy. With the
>warning we retry that once per day, which sucks less.
>
>  - This conflation of the user-visible warning and the policy is an
>emergent effect of how the different gc pieces interact, which as I
>note in the linked thread(s) sucks.
>
>But we can't just yank one piece away (as Jonathan's patch does)
>without throwing the baby out with the bathwater.
>
>It will mean that e.g. if you have 10k loose objects in your git.git,
>and created them just now, that every time you run anything that runs
>"gc --auto" we'll fork to the background, peg a core at 100% CPU for
>2-3 minutes or whatever it is, only do get nowhere and do the same
>thing again in ~3 minutes when you run your next command.

We probably can keep the "let's not run for a day" safety while
pretending that "git gc -auto" succeeded for callers like "git svn"
so that these callers do not hae to do "eval { ... }" to hide our
exit code, no?

I think that is what Jonathan's patch (jn/gc-auto) does.

From: Jonathan Nieder 
Date: Mon, 16 Jul 2018 23:57:40 -0700
Subject: [PATCH] gc: do not return error for prior errors in daemonized mode

diff --git a/builtin/gc.c b/builtin/gc.c
index 95c8afd07b..ce8a663a01 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -438,9 +438,15 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
return NULL;
 }
 
-static void report_last_gc_error(void)
+/*
+ * Returns 0 if there was no previous error and gc can proceed, 1 if
+ * gc should not proceed due to an error in the last run. Prints a
+ * message and returns -1 if an error occured while reading gc.log
+ */
+static int report_last_gc_error(void)
 {
struct strbuf sb = STRBUF_INIT;
+   int ret = 0;
...
if (len < 0)
+   ret = error_errno(_("cannot read '%s'"), gc_log_path);
+   else if (len > 0) {
+   /*
+* A previous gc failed.  Report the error, and don't
+* bother with an automatic gc run since it is likely
+* to fail in the same way.
+*/
+   warning(_("The last gc run reported the following. "
   "Please correct the root cause\n"
   "and remove %s.\n"
   "Automatic cleanup will not be performed "
   "until the file is removed.\n\n"
   "%s"),
gc_log_path, sb.buf);
+   ret = 1;
+   }
strbuf_release();
 done:
free(gc_log_path);
+   return ret;
 }
 
I.e. report_last_gc_error() returns 1 when finds that the previous
attempt to "gc --auto" failed.  And then

@@ -561,7 +576,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
fprintf(stderr, _("See \"git help gc\" for manual 
housekeeping.\n"));
}
if (detach_auto) {
-   report_last_gc_error(); /* dies on error */
+   int ret = report_last_gc_error();
+   if (ret < 0)
+   /* an I/O error occured, already reported */
+   exit(128);
+   if (ret == 1)
+   /* Last gc --auto failed. Skip this one. */
+   return 0;

... it exits with 0 without bothering to rerun "gc".

So it won't get stuck for 3 minutes; the repository after "gc
--auto" punts will stay to be suboptimal for a day, and the user
kill not get an "actionable" error notice (due to this hiding of
previous error), hence cannot make changes that may help like
shortening expiry period, though.



Re: [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files

2018-10-10 Thread Junio C Hamano
Johannes Schindelin  writes:

> Personally, I find the "whoever is picking it up should do the thinking"
> much too harsh for a first-time contributor who specifically came through
> the Outreachy program, i.e. expected to have a gentle introduction into
> the project, and into the ways we work.

Oh, absolutely I agree.

Any random discussion participant can say "left over bits" in any
random message with an idea that is left on the table.  Looking for
it may narrow the set messages to be examined, but the query result
will inevitably be still full of chaff.  It is not a very good match
for "gentle introduction" material for GSoC/Outreachy microprojects.

List of reasonable low-hanging fruits is hard to maintain, as the
cost of building and maintaining such a list would easily outweigh
the cost (and fun) of picking these low-hanging fruits yourself X-<.

I do not think of a good solution to help newcomers offhand.

Thanks, as always, for trying to be helpful to newcomers.



What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-09 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

We haven't seen much complaints and breakages reported against the
two big "rewrite in C" topics around "rebase"; perhaps it is a good
time to merge them to 'next' soonish to cook them for a few weeks
before moving them to 'master'?

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/fsck-skiplist (2018-09-12) 10 commits
  (merged to 'next' on 2018-09-24 at 26adeb8b8f)
 + fsck: support comments & empty lines in skipList
 + fsck: use oidset instead of oid_array for skipList
 + fsck: use strbuf_getline() to read skiplist file
 + fsck: add a performance test for skipList
 + fsck: add a performance test
 + fsck: document that skipList input must be unabbreviated
 + fsck: document and test commented & empty line skipList input
 + fsck: document and test sorted skipList input
 + fsck tests: add a test for no skipList input
 + fsck tests: setup of bogus commit object

 (Originally merged to 'next' on 2018-09-17 at dc9094ba9b)

 Update fsck.skipList implementation and documentation.


* bc/hash-independent-tests (2018-09-17) 11 commits
  (merged to 'next' on 2018-09-24 at 7c4a61fe46)
 + t5318: use test_oid for HASH_LEN
 + t1407: make hash size independent
 + t1406: make hash-size independent
 + t1405: make hash size independent
 + t1400: switch hard-coded object ID to variable
 + t1006: make hash size independent
 + t0064: make hash size independent
 + t0002: abstract away SHA-1 specific constants
 + t: update tests for SHA-256
 + t: use hash translation table
 + t: add test functions to translate hash-related values

 (Originally merged to 'next' on 2018-09-17 at 9e94794d05)

 Various tests have been updated to make it easier to swap the
 hash function used for object identification.


* ds/multi-pack-verify (2018-09-17) 11 commits
  (merged to 'next' on 2018-09-24 at f294a34aaf)
 + fsck: verify multi-pack-index
 + multi-pack-index: report progress during 'verify'
 + multi-pack-index: verify object offsets
 + multi-pack-index: fix 32-bit vs 64-bit size check
 + multi-pack-index: verify oid lookup order
 + multi-pack-index: verify oid fanout order
 + multi-pack-index: verify missing pack
 + multi-pack-index: verify packname order
 + multi-pack-index: verify corrupt chunk lookup table
 + multi-pack-index: verify bad header
 + multi-pack-index: add 'verify' verb

 (Originally merged to 'next' on 2018-09-17 at f27244f302)

 "git multi-pack-index" learned to detect corruption in the .midx
 file it uses, and this feature has been integrated into "git fsck".


* nd/config-split (2018-09-12) 11 commits
  (merged to 'next' on 2018-09-24 at 150cb40d2c)
 + config.txt: move submodule part out to a separate file
 + config.txt: move sequence.editor out of "core" part
 + config.txt: move sendemail part out to a separate file
 + config.txt: move receive part out to a separate file
 + config.txt: move push part out to a separate file
 + config.txt: move pull part out to a separate file
 + config.txt: move gui part out to a separate file
 + config.txt: move gitcvs part out to a separate file
 + config.txt: move format part out to a separate file
 + config.txt: move fetch part out to a separate file
 + config.txt: follow camelCase naming

 (Originally merged to 'next' on 2018-09-17 at 33e6cb8f48)

 Split Documentation/config.txt for easier maintenance.


* nd/test-tool (2018-09-11) 6 commits
  (merged to 'next' on 2018-09-24 at 23ad767573)
 + Makefile: add a hint about TEST_BUILTINS_OBJS
 + t/helper: merge test-dump-fsmonitor into test-tool
 + t/helper: merge test-parse-options into test-tool
 + t/helper: merge test-pkt-line into test-tool
 + t/helper: merge test-dump-untracked-cache into test-tool
 + t/helper: keep test-tool command list sorted

 (Originally merged to 'next' on 2018-09-17 at decbf86eeb)

 Test helper binaries clean-up.

--
[New Topics]

* ds/reachable-final-cleanup (2018-09-25) 1 commit
 - commit-reach: cleanups in can_all_from_reach...

 Code already in 'master' is further cleaned-up by this patch.

 Will merge to 'next'.


* dz/credential-doc-url-matching-rules (2018-09-27) 1 commit
 - doc: clarify gitcredentials path component matching

 Doc update.

 Will merge to 'next'.


* en/status-multiple-renames-to-the-same-target-fix (2018-09-27) 1 commit
 - commit: fix erroneous BUG, 'multiple renames on the same target? how?'

 The code in "git status" sometimes hit an assertion failure.  This
 was caused by a structure that was reused without cleaning the data
 used for the first run, which has been 

Re: git svn clone/fetch hits issues with gc --auto

2018-10-09 Thread Junio C Hamano
Forwarding to Jonathan, as I think this is an interesting supporting
vote for the topic that we were stuck on.

Eric Wong  writes:

> Martin Langhoff  wrote:
>> Hi folks,
>> 
>> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
>> I hit the gc error:
>> 
>> warning: There are too many unreachable loose objects; run 'git prune'
>> to remove them.
>> gc --auto: command returned error: 255
>
> GC can be annoying when that happens... For git-svn, perhaps
> this can be appropriate to at least allow the import to continue:
>
> diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
> index 76b2965905..9b0caa3d47 100644
> --- a/perl/Git/SVN.pm
> +++ b/perl/Git/SVN.pm
> @@ -999,7 +999,7 @@ sub restore_commit_header_env {
>  }
>  
>  sub gc {
> - command_noisy('gc', '--auto');
> + eval { command_noisy('gc', '--auto') };
>  };
>  
>  sub do_git_commit {
>
>
> But yeah, somebody else who works on git regularly could
> probably stop repack from writing thousands of loose
> objects (and instead write a self-contained pack with
> those objects, instead).  I haven't followed git closely
> lately, myself.


Re: [PATCH 2/2] doc/git-branch: Document the --current option

2018-10-09 Thread Junio C Hamano
Daniels Umanovskis  writes:

> +--current::
> + Print the name of the current branch. In detached HEAD state,
> + or if otherwise impossible to resolve the branch name, print
> + "HEAD".

Where does "if otherwise impossible to resolve" come from?  In the
code in [PATCH 1/2], we see this bit

+   const char *refname = resolve_ref_unsafe("HEAD", 0, NULL, NULL);
+   char *shortname = shorten_unambiguous_ref(refname, 0);

and the output phase would become puts(shortname).

 * Under what condition resolve_ref_unsafe(HEAD) fail to resolve,
   and when that happens what does it return?  "HEAD"?  Can the
   caller tell the case in which .git/HEAD is a symref that points
   at refs/heads/HEAD (i.e. we are on a branch whose name is "HEAD")
   and the case in which .git/HEAD fails to resolve and you get
   "HEAD" back?

 * Or does the function return NULL in "otherwise impossible" case?
   Does shorten_unambiguous_ref() deal with refname==NULL
   gracefully?

 * Under what condition shorten_unambiguous_ref() fail to compute
   the branch name discovered by resolve_ref_unsafe()?

Also, I do not think the implementation is correct.  When you are on
the 'frotz' branch, and if you happen to have a tag whose name also
is 'frotz', then

 - Your .git/HEAD points at refs/heads/frotz, and refs/heads/frotz
   is what resolve_ref_unsafe() gives you.

 - You have refs/heads/frotz and refs/tags/frotz in the repository.
   Asking to shorten the ref refs/heads/frotz unambiguously will
   *not* yield 'frotz'.  It will give you something like 'heads/frotz'
   to avoid getting it confused with tags/frotz

 - Still "git branch --list" would show 'frotz' in such a case, and
   your "--current" would definitely want to match the behaviour.

I think the correct implementation should be more like:

 - Ask resolve-ref-unsafe about HEAD; if it is not a symbolic ref,
   then we are on a detached HEAD.  Silently exit with status 0.

 - If it is a symbolic ref, see if the target of the symblic ref
   (i.e. returned refname) begins with "refs/heads/".  Otherwise, we
   have a repository corruption.  Diagnose it as an error and die().

 - Otherwise, strip that leading "refs/heads/"; the remainder is the
   name of the "current branch".

I already said "current" by itself is an unacceptable name for this
option, so I won't be repeating myself.



Re: [PATCH 1/1] subtree: add build targets 'man' and 'html'

2018-10-09 Thread Junio C Hamano
Christian Hesse  writes:

> From: Christian Hesse 
>
> We have targets 'install-man' and 'install-html', let's add build
> targets as well.
>   ...
> +man: $(GIT_SUBTREE_DOC)
> +
> +html: $(GIT_SUBTREE_HTML)
> +

As 'contrib' material without real maintenance, I do not care too
deeply, but shouldn't this change be more like this to avoid
duplicating the list of targets?


diff --git a/contrib/subtree/Makefile b/contrib/subtree/Makefile
index 5c6cc4ab2c..4a10a020a0 100644
--- a/contrib/subtree/Makefile
+++ b/contrib/subtree/Makefile
@@ -59,17 +59,21 @@ $(GIT_SUBTREE): $(GIT_SUBTREE_SH)
 
 doc: $(GIT_SUBTREE_DOC) $(GIT_SUBTREE_HTML)
 
+man: $(GIT_SUBTREE_DOC)
+
+html: $(GIT_SUBTREE_HTML)
+
 install: $(GIT_SUBTREE)
$(INSTALL) -d -m 755 $(DESTDIR)$(gitexecdir)
$(INSTALL) -m 755 $(GIT_SUBTREE) $(DESTDIR)$(gitexecdir)
 
 install-doc: install-man install-html
 
-install-man: $(GIT_SUBTREE_DOC)
+install-man: man
$(INSTALL) -d -m 755 $(DESTDIR)$(man1dir)
$(INSTALL) -m 644 $^ $(DESTDIR)$(man1dir)
 
-install-html: $(GIT_SUBTREE_HTML)
+install-html: html
$(INSTALL) -d -m 755 $(DESTDIR)$(htmldir)
$(INSTALL) -m 644 $^ $(DESTDIR)$(htmldir)
 
@@ -94,4 +98,4 @@ clean:
$(RM) $(GIT_SUBTREE)
$(RM) *.xml *.html *.1
 
-.PHONY: FORCE
+.PHONY: FORCE man html install-man install-html



Re: [RFC PATCH 2/2] fuzz: Add fuzz testing for packfile indices.

2018-10-09 Thread Junio C Hamano
Josh Steadmon  writes:

>  ### Fuzz testing
>  #
> -.PHONY: fuzz-clean fuzz-objs fuzz-compile
> +.PHONY: fuzz-clean fuzz-objs fuzz-compile fuzz-all
> ...
>  FUZZ_CFLAGS = $(CFLAGS) -fsanitize-coverage=trace-pc-guard -fsanitize=address
> ...
> +
> +fuzz-all: $(FUZZ_PROGRAMS)

I guess I read your mind ;-) Please do this in 1/2 instead of adding
it at 2/2 as "oops, we'd need it more and more as we add these".





Re: [RFC PATCH 1/2] fuzz: Add basic fuzz testing target.

2018-10-09 Thread Junio C Hamano
Josh Steadmon  writes:

> +FUZZ_OBJS += fuzz-pack-headers.o
> +
> +FUZZ_PROGRAMS += $(patsubst %.o,%,$(FUZZ_OBJS))
> +
> ...
> +### Fuzz testing
> +#
> +.PHONY: fuzz-clean fuzz-objs fuzz-compile

I take it that you anticipate the fuzz programs in the future all
be named fuzz-$(blah), whose source is fuzz-$(blah).o (even though
we may grow some common code that may be linked with them, which can
be done by tweaking the rule for the $(FUZZ_PROGRAMS) target).  Am I
reading you correctly?  Would fuzz-{clean,objs,compile} risk squatting
on nicer names we may want to use for $(blah) down the line?

> + ...
> +$(FUZZ_PROGRAMS): fuzz-compile
> + clang++ $(FUZZ_LDFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) $(XDIFF_OBJS) \
> + $(EXTLIBS) git.o $@.o /usr/lib/llvm-4.0/lib/libFuzzer.a -o $@

Is the expected usage pattern to know a single fuzz-* program the
builder wants to build, to run "make fuzz-pack-headers"?  If not, it
also would be a good idea to have something like

fuzz-build-all:: $(FUZZ_PROGRAMS)
.PHONY: fuzz-build-all

perhaps?

Also, in the final version we unleash to general developer audience,
we'd want to support "make V=1" (and "make" that is "$(QUIET)").


Re: [PATCH v6] log: fix coloring of certain octupus merge shapes

2018-10-09 Thread Junio C Hamano
Noam Postavsky  writes:

> For octopus merges where the first parent edge immediately merges into
> the next column to the left:
>
> | *-.
> | |\ \
> |/ / /
>
> then the number of columns should be one less than the usual case:
>
> | *-.
> | |\ \
> | | | *

I had a bit hard time parsing the above, especially with "then",
which probably would make it easier to read if it is not there.

> Also refactor the code to iterate over columns rather than dashes,
> building from an initial patch suggestion by Jeff King.

s/suggestion/suggested/ perhaps?

>
> Signed-off-by: Noam Postavsky 
> Reviewed-by: Jeff King 
> ---

Thanks, both.

>  /*
> + * Draw the horizontal dashes of an octopus merge and return the number of
> + * characters written.
>   */
>  static int graph_draw_octopus_merge(struct git_graph *graph,
>   struct strbuf *sb)
>  {
>   /*
> +  * Here dashless_parents represents the number of parents which don't
> +  * need to have dashes (the edges labeled "0" and "1").  And
> +  * dashful_parents are the remaining ones.

Here "dash" refers to that horizontal line on the same line as the
resulting merge.  A very clearly explained definition.  OK.

> +  * | *---.
> +  * | |\ \ \
> +  * | | | | |
> +  * x 0 1 2 3
> +  *
> +  */
> + const int dashless_parents = 2;

That counts parent #0 (the first parent) and parent #1.

> + int dashful_parents = graph->num_parents - dashless_parents;

When a mistaken caller calls this function on a commit that is not
an octopus, this can underflow.  dashful_parents would be -1 for a
non-merge, dashful_parents would be 0 for a normal merge, and then
dashful_parents would be 1 for a merge of three histories.  OK.

> + /*
> +  * Usually, each parent gets its own column, like the diagram above, but
> +  * sometimes the first parent goes into an existing column, like this:
> +  *
> +  * | *---.
> +  * | |\ \ \
> +  * |/ / / /
> +  * x 0 1 2
> +  *
> +  * In which case there will be more parents than the delta of columns.
> +  */

It is unclear to me what "delta of columns" means here.  Is this
because I am unfamiliar with the internal of graph.[ch] API (and
'delta of columns' is used elsewhere in the API internals already)?

> + int delta_cols = (graph->num_new_columns - graph->num_columns);

So in the second picture above, new-columns (which is the columns
used after showing the current line) is narrower (because 'x' reuses
an already allocated column without getting a new one) than columns
(which is the columns for the octopus merge we are showing)?

I am not sure I follow what is going on around here, sorry.

> + int parent_in_old_cols = graph->num_parents - delta_cols;
> + /*
> +  * In both cases, commit_index corresponds to the edge labeled "0".
> +  */
> + int first_col = graph->commit_index + dashless_parents
> + - parent_in_old_cols;
> +
> + int i;
> + for (i = 0; i < dashful_parents; i++) {
> + strbuf_write_column(sb, >new_columns[i+first_col], '-');
> + strbuf_write_column(sb, >new_columns[i+first_col],
> + i == dashful_parents-1 ? '.' : '-');

Draw a dash-dash for each, except we show dash-dot only for the last
one.  OK.  It is interesting that dashful_parents does not have to
change between the two examples you gave above, and it is
understandable because it only depends on the shape of the graph
near the octopus merge itself (in other words, the placement of the
parent commits does not contribute to it at all).  Makes sense.

>   }
> - col_num = (i / 2) + dashless_commits + graph->commit_index;
> - strbuf_write_column(sb, >new_columns[col_num], '.');
> - return num_dashes + 1;
> + return 2 * dashful_parents;

This is natural, as we showed either dash-dash or dash-dot only for
dashful_parents after the merge itself. OK.

Thanks, will queue.



Re: [PATCH v2] cache-tree: skip some blob checks in partial clone

2018-10-09 Thread Junio C Hamano
Jonathan Tan  writes:

> After feedback, I restricted this to partial clone. Once restricted, I
> agree with Ben that this can be done for all users of
> cache_tree_update(), not just unpack-trees, so I have removed the
> ability to control the behavior using a flag.

Makes sense.  Great.

> I also took the opportunity to simplify the missing check by using a
> variable.
>  
> + ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
> + (repository_format_partial_clone &&
> +  ce_skip_worktree(ce));
>   if (is_null_oid(oid) ||
> - (mode != S_IFGITLINK && !missing_ok && 
> !has_object_file(oid))) {
> + (!ce_missing_ok && !has_object_file(oid))) {

OK.  "An attempt to check out null object is bad, and otherwise,
unless we determined that it is OK to lack the object recorded in
ce, it is bad too.  By the way, the way we determine if it is OK to
be missing the object is given above".  Easier to read than the
original.


Re: What's so special about objects/17/ ?

2018-10-09 Thread Junio C Hamano
Stefan Beller  writes:
>> Oh, I think I misled you by saying "more important".
>> ...
> I do challenge the decision to take a hardcoded value, though, ...

I do not find any reason why you need to say "though" here.  If you
understood the message you are responding to that use of hardcoded
value was chosen not to help the end-user experience, it should have
been clear that we are in agreement.

I also sometimes find certain people here are unnecessarily
combative in their discussion.  It this just some language issue?




Re: [PoC -- do not apply 3/3] test-tree-bitmap: replace ewah with custom rle encoding

2018-10-09 Thread Junio C Hamano
Jeff King  writes:

> +static void strbuf_add_varint(struct strbuf *out, uintmax_t val)
> +{
> + size_t len;
> + strbuf_grow(out, 16); /* enough for any varint */
> + len = encode_varint(val, (unsigned char *)out->buf + out->len);
> + strbuf_setlen(out, out->len + len);
> +}
> +
> +static void bitmap_to_rle(struct strbuf *out, struct bitmap *bitmap)
> +{
> + int curval = 0; /* count zeroes, then ones, then zeroes, etc */
> + size_t run = 0;
> + size_t word;
> + size_t orig_len = out->len;
> +
> + for (word = 0; word < bitmap->word_alloc; word++) {
> + int bit;
> +
> + for (bit = 0; bit < BITS_IN_EWORD; bit++) {
> + int val = !!(bitmap->words[word] & (((eword_t)1) << 
> bit));
> + if (val == curval)
> + run++;
> + else {
> + strbuf_add_varint(out, run);
> + curval = 1 - curval; /* flip 0/1 */
> + run = 1;
> + }
> + }

OK.  I find it a bit disturbing to see that the loop knows a bit too
much about how "struct bitmap" is implemented, but that is a complaint
against the bitmap API, not this new user of the API.

We do not try to handle the case where bitmap has bits that is not
multiple of BITS_IN_EWORD and instead pretend that size of such a
bitmap can be rounded up, because we ignore trailing 0-bit anyway,
and we know the "struct bitmap" would pad with 0-bit at the tail?

> + }
> +
> + /*
> +  * complete the run, but do not bother with trailing zeroes, unless we
> +  * failed to write even an initial run of 0's.
> +  */
> + if (curval && run)
> + strbuf_add_varint(out, run);
> + else if (orig_len == out->len)
> + strbuf_add_varint(out, 0);
> +
> + /* signal end-of-input with an empty run */
> + strbuf_add_varint(out, 0);
> +}

OK.

> +static size_t rle_each_bit(const unsigned char *in, size_t len,
> +void (*fn)(size_t, void *), void *data)
> +{
> + int curval = 0; /* look for zeroes first, then ones, etc */
> + const unsigned char *cur = in;
> + const unsigned char *end = in + len;
> + size_t pos;
> +
> + /* we always have a first run, even if it's 0 zeroes */
> + pos = decode_varint();
> +
> + /*
> +  * ugh, varint does not seem to have a way to prevent reading past
> +  * the end of the buffer. We'll do a length check after each one,
> +  * so the worst case is bounded.
> +  */

Sorry about that :-).

> + if (cur > end) {
> + error("input underflow in rle");
> + return len;
> + }
> +
> + while (1) {
> + size_t run = decode_varint();
> +
> + if (cur > end) {
> + error("input underflow in rle");
> + return len;
> + }
> +
> + if (!run)
> + break; /* empty run signals end */
> +
> + curval = 1 - curval; /* flip 0/1 */
> + if (curval) {
> + /* we have a run of 1's; deliver them */
> + size_t i;
> + for (i = 0; i < run; i++)
> + fn(pos + i, data);
> + }
> + pos += run;
> + }
> +
> + return cur - in;
> +}

Makes sense.



Re: [PoC -- do not apply 2/3] test-tree-bitmap: add "dump" mode

2018-10-09 Thread Junio C Hamano
Jeff King  writes:

> The one difference is the sort order: git's diff output is
> in tree-sort order, so a subtree "foo" sorts like "foo/",
> which is after "foo.bar". Whereas the bitmap path list has a
> true byte sort, which puts "foo.bar" after "foo".

If we truly cared, it is easy enough to fix by having a custom
comparison function in 1/3 used in collect_paths() phase.

> + /* dump it while we have the sorted order in memory */
> + for (i = 0; i < n; i++) {
> + printf("%s", sorted[i]->path);
> + putchar('\0');
> + }

With printf("%s%c", sorted[i]->path, '\0'); you can lose the braces.

> + putchar('\0');
> +
>   free(sorted);
>  }
>  
> @@ -142,6 +150,8 @@ static void generate_bitmap(struct diff_queue_struct *q,
>  
>   ewah = bitmap_to_ewah(bitmap);
>   ewah_serialize_strbuf(ewah, );
> +
> + fwrite(data->commit->object.oid.hash, 1, GIT_SHA1_RAWSZ, stdout);
>   fwrite(out.buf, 1, out.len, stdout);

OK, so per commit, we have ewah bitmap that records the "changed
paths" after the commit object name.  Makes sense.

And the list of paths are based on the "one" side of the filepair.
When we do an equivalent of "git show X", we see "diff-tree X~1 X"
and by collecting the "one" side (i.e. subset of paths in the tree
of X~1 that were modified when going to X) we say "commit X changed
these paths".  Makes sense, too.

> -int cmd_main(int argc, const char **argv)
> +static void do_gen(void)
>  {
>   struct hashmap paths;
> -

Let's not lose this blank line.

>   setup_git_directory();
>   collect_paths();
>  
>   walk_paths(generate_bitmap, );
> +}
> +
> +static void do_dump(void)
> +{
> + struct strbuf in = STRBUF_INIT;
> + const char *cur;
> + size_t remain;
> +
> + const char **paths = NULL;
> + size_t alloc_paths = 0, nr_paths = 0;
> +
> + /* slurp stdin; in the real world we'd mmap all this */
> + strbuf_read(, 0, 0);
> + cur = in.buf;
> + remain = in.len;
> +
> + /* read path for each bit; in the real world this would be separate */
> + while (remain) {
> + const char *end = memchr(cur, '\0', remain);
> + if (!end) {
> + error("truncated input while reading path");
> + goto out;
> + }
> + if (end == cur) {
> + /* empty field signals end of paths */
> + cur++;
> + remain--;
> + break;
> + }
> +
> + ALLOC_GROW(paths, nr_paths + 1, alloc_paths);
> + paths[nr_paths++] = cur;
> +
> + remain -= end - cur + 1;
> + cur = end + 1;
> + }
> +

OK.

> + while (remain) {
> + struct object_id oid;
> + struct ewah_bitmap *ewah;
> + ssize_t len;
> +
> + if (remain < GIT_SHA1_RAWSZ) {
> + error("truncated input reading oid");
> + goto out;
> + }
> + hashcpy(oid.hash, (const unsigned char *)cur);
> + cur += GIT_SHA1_RAWSZ;
> + remain -= GIT_SHA1_RAWSZ;
> +
> + ewah = ewah_new();
> + len = ewah_read_mmap(ewah, cur, remain);
> + if (len < 0) {
> + ewah_free(ewah);
> + goto out;
> + }
> +
> + printf("%s\n", oid_to_hex());
> + ewah_each_bit(ewah, show_path, paths);
> +
> + ewah_free(ewah);
> + cur += len;
> + remain -= len;
> + }

Makes perfect sense.

> +out:
> + free(paths);
> + strbuf_release();
> +}
> +
> +int cmd_main(int argc, const char **argv)
> +{
> + const char *usage_msg = "test-tree-bitmap ";
> +
> + if (!argv[1])
> + usage(usage_msg);
> + else if (!strcmp(argv[1], "gen"))
> + do_gen();
> + else if (!strcmp(argv[1], "dump"))
> + do_dump();
> + else
> + usage(usage_msg);
>  
>   return 0;
>  }


Re: [PATCH 0/2] branch: introduce --current display option

2018-10-09 Thread Junio C Hamano
Daniels Umanovskis  writes:

> I often find myself needing the current branch name, for which
> currently there's git rev-parse --abrev-ref HEAD. I would expect
> `git branch` to have an option to output the branch name instead.

[jc:  wrapped an overlong line]

If "git branch" had many operations that work on multiple branches
by default, and we were adding an option to work on a single branch
that is currently checked out, then I would find "--current" is a
very good name for an option that turns all these operations to work
only on the one that is currently checked out.

But I do not think that is what is going on.  There is "--list" that
lists branches whose name match given patterns, and at the end-user
level (I haven't seen the implementation) this is another mode of
that operation that limits itself to the one that is currently
checked out, and you do not even allowed to give the "--list" option
explicitly so that in the future when "git branch" learns to perform
an operation other than "list" (let's call it 'distim') to bunch of
branches by default, you cannot say "git --distim --current" to
limit the distimming to the branch that you are currently on.

I do not offhand know if we want "show the current one only" option
that is "command mode" sitting next to "list", "delete", "rename"
etc., or "limit the operation to the one that is currently cheked
out".  If we want the former, the name of the option must *NOT* be
just "current".  Have a verb in its name to avoid it from getting
mistaken as a botched attempt to do the latter.  Somethng like
"--show-current", "--list-current", "--display-current", etc.

Even if we were doing the latter (i.e. focused "this is only for
listing/showing"), if we do not want to close the door to later
extend the concept of "current" to the former (i.e. "--show-current"
becomes a convenience synonym for "--list --current-only") we also
need to think about what to do with the detached HEAD state.  When
the concept of "current" is extended to become "usually an operation
can work on multiple branches but we are limiting it to the current
one", detached HEAD state is conceptually "not having any current
branch".  We could fail the operation (i.e. you told me to distim
the branch but there is no such branch) or make it a silent no-op
(i.e. you told me to distim no branch, so nothing happened and there
is no error).

My inclination is to recommend to:

 (1) name the "show the current one" not "--current" but with some
 verb

 (2) display nothing when there is no current branch (i.e. detached
 HEAD) and without any error.







Re: [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files

2018-10-09 Thread Junio C Hamano
Derrick Stolee  writes:

> On 10/8/2018 1:05 PM, Ananya Krishna Maram wrote:
>> Hi All,
> Hello, Ananya! Welcome.
>
>> I was searching through #leftovers and found this.
>> https://public-inbox.org/git/cabpp-bgvvxcbzx44er6to-pusfen_6gnyj1u5cuon9deaa4...@mail.gmail.com/
>>
>> This patch address the task discussed in the above link.
> The discussion above seems to not be intended for your commit message,
> but it does show up when I run `git am` and provide your email as
> input. The typical way to avoid this is to place all commentary below
> the "---" 
> that signifies the commit message is over.

>> From: Ananya Krishan Maram 
>>
>> skip the #include of git-compat-util.h since all .c files include it.
>>
>> Signed-off-by: Ananya Krishna Maram 
>> ---
>>   advice.h | 1 -
>>   commit-graph.h   | 1 -
>>   hash.h   | 1 -
>>   pkt-line.h   | 1 -
>>   t/helper/test-tool.h | 1 -
>>   5 files changed, 5 deletions(-)
>>
>> diff --git a/advice.h b/advice.h
>> index ab24df0fd..09148baa6 100644
>> --- a/advice.h
>> +++ b/advice.h
>> @@ -1,7 +1,6 @@
>>   #ifndef ADVICE_H
>>   #define ADVICE_H
>>   -#include "git-compat-util.h"
>> extern int advice_push_update_rejected;
>>   extern int advice_push_non_ff_current;

The way I read the original discussion is "C source that includes
compat-util.h shouldn't if it already includes cache.h"; advice.h is
not C and does not (should not) include cache.h.

The "left over bits" should not be blindly trusted, and besides,
Elijah punted to examine and think about each case and left it to
others, so whoever is picking it up should do the thinking, not a
blind conversion.  I am not getting a feeling that this patch was
done with careful thinking after checking only this one.



Re: [PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs

2018-10-09 Thread Junio C Hamano
Jonathan Tan  writes:

> @@ -1635,6 +1635,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
> struct unpack_trees_options
>   o->result.cache_tree = cache_tree();
>   if (!cache_tree_fully_valid(o->result.cache_tree))
>   cache_tree_update(>result,
> +   
> WRITE_TREE_SKIP_WORKTREE_MISSING_OK |
> WRITE_TREE_SILENT |
> WRITE_TREE_REPAIR);
>   }

H.  Should this be passing the bit unconditionally?  Shouldn't
it be set only when we are doing lazy clone?  A non-lazy clone that
merely uses sparse checkout has nowhere else to turn to if it loses
a blob object that currently is not necessary to complete a checkout,
unlike a repository with promisor.


Re: [PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs

2018-10-09 Thread Junio C Hamano
Jonathan Tan  writes:

> Because cache_tree_update() is used from multiple places, this new
> behavior is guarded by a new flag WRITE_TREE_SKIP_WORKTREE_MISSING_OK.

The name of the new flag is mouthful, but we know we do not need to
materialize these blobs (exactly because the skip-worktree bit is
set, so we do not need to know what's in these blobs) and it is OK
for these to be missing (to put it differently, we do not care if
they exist or not---hence we short-circuit the otherwise required
call to has_object_file()), iow, the name of the mode is "A missing
object with skip-worktree bit set is OK", which makes sense to me.

>   if (is_null_oid(oid) ||
> - (mode != S_IFGITLINK && !missing_ok && 
> !has_object_file(oid))) {
> + (mode != S_IFGITLINK && !missing_ok &&
> +  !(skip_worktree_missing_ok && ce_skip_worktree(ce)) &&
> +  !has_object_file(oid))) {

For a non-gitlink entry, if the caller does not say "any missing
object is OK", we normally check has_object_file().  But now
has_object_file() call happens only when ...

Hmph, isn't this new condition somewhat wrong?  We do not want to
skip it for entries without skip-worktree bit set.  We only do not
care if we are operating in skip-worktree-missing-ok mode and the
bit is set on ce.  IOW:

if ((skip_worktree_missing_ok && ce_skip_worktree(ce)) ||
has_object_file(oid))
/* then we are happy */

but the whole thing above tries to catch problematic case, so I'd
need to negate that, i.e.

if ( ... &&
!((skip_worktree_missing_ok && ce_skip_worktree(ce)) ||
 has_object_file(oid)))
/* we are in trouble */

and pushing the negation down, we get

if ( ... &&
(!(skip_worktree_missing_ok && ce_skip_worktree(ce)) &&
 !has_object_file(oid)))
/* we are in trouble */

OK.  The logic in the patch is correct.  I however feel that it is a
bit too dense to make sense of.

Thanks, will queue.


Re: [PATCH 2/3] midx: close multi-pack-index on repack

2018-10-09 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> diff --git a/builtin/repack.c b/builtin/repack.c
> index c6a7943d5c..7925bb976e 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -432,6 +432,10 @@ int cmd_repack(int argc, const char **argv, const char 
> *prefix)
>  
>   if (!midx_cleared) {
>   /* if we move a packfile, it will invalidated 
> the midx */
> + if (the_repository->objects) {
> + 
> close_midx(the_repository->objects->multi_pack_index);
> + 
> the_repository->objects->multi_pack_index = NULL;
> + }
>   clear_midx_file(get_object_directory());
>   midx_cleared = 1;

It somehow looks like a bit of layering violation, doesn't it?  When
we are clearing a midx file, don't we always want to do this as well?


Re: [PATCH 1/3] midx: fix broken free() in close_midx()

2018-10-09 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> From: Derrick Stolee 
>
> When closing a multi-pack-index, we intend to close each pack-file
> and free the struct packed_git that represents it. However, this
> line was previously freeing the array of pointers, not the
> pointer itself. This leads to a double-free issue.
>
> Signed-off-by: Derrick Stolee 
> ---
>  midx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/midx.c b/midx.c
> index f3e8dbc108..999717b96f 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -190,7 +190,7 @@ static void close_midx(struct multi_pack_index *m)
>   for (i = 0; i < m->num_packs; i++) {
>   if (m->packs[i]) {
>   close_pack(m->packs[i]);
> - free(m->packs);
> + free(m->packs[i]);
>   }
>   }
>   FREE_AND_NULL(m->packs);

Yup, kinda obvious when we view it with the post context.


Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Junio C Hamano
Derrick Stolee  writes:

> I see these failures, too, but I believe they are due to
> ds/commit-graph-with-grafts not being merged to 'next' yet. The
> purpose of that branch is to fix these test breaks. The environment
> variable got merged a lot faster.

A separate "ping" would have helped me.  Will merge it down to
'next'.


Re: How to handle patch series conflicts

2018-10-08 Thread Junio C Hamano
Stephen & Linda Smith  writes:

> Junio - I've been working this but would like your opinion on 7500, 7501 and 
> now 7510. 
>
> I note that the the commit tests have intermixed functionality.  An example 
> is 
> signoff tests that are in the three tests I mentioned. 
>
> I've been tempted multiple times over the last week to just merge the tests 
> into a single script, but that doesn't seem right either.
>
> So would you prefer a single script?   Would you prefer me to move tests 
> around?

The scripts themselves having the same name that is no more specific
tha just "commit" does not bother _me_ personally too much.  If I
were doing it, unless you are an obsessive type that wants to see
spanking cleanness everywhere, I'd limit the changes to the minimum.

If something tested in script X is tested in another script Y and it
is trivial to see they are testing exactly the same thing, removing
one copy from script Y would be good, and if the remaining changes
in script Y becomes more focused with only such removals, that would
even be better, as at that point we can rename "tY-commit.sh" to
something more specific like "tY-commit-signature.sh".


Re: [PATCH 1/1] rebase -i: introduce the 'break' command

2018-10-08 Thread Junio C Hamano
Junio C Hamano  writes:

>> There is one crucial difference: the exit code.
>
> OK, and it was good that you explicitly said "with exit code 0" in
> the log message.  Together with the idea to update the doc I floated
> earlier, this probably is worth documenting, too.

Heh, I am becoming sloppy in reviewing.  The patch does not even
update any doc.

It is not a reason to reject the change (the change looks reasonably
simple and reviewers and those who have to look at the code to build
upon it would understand it in the current shape), but it is a
blocker for the change to be merged to 'next' and down.



Re: git log -S or -G

2018-10-08 Thread Junio C Hamano
Jeff King  writes:

> I think that is the best we could do for "-S", though, which is
> inherently about counting hits.
>
> For "-G", we are literally grepping the diff. It does not seem
> unreasonable to add the ability to grep only "-" or "+" lines, and the
> interface for that should be pretty straightforward (a tri-state flag to
> look in remove, added, or both lines).

Yeah, here is a lunchtime hack that hasn't even been compile tested.

 diff.c |  4 
 diff.h |  2 ++
 diffcore-pickaxe.c | 22 --
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index f0c7557b40..d1f2780844 100644
--- a/diff.c
+++ b/diff.c
@@ -5068,6 +5068,10 @@ int diff_opt_parse(struct diff_options *options,
}
else if (!strcmp(arg, "--pickaxe-all"))
options->pickaxe_opts |= DIFF_PICKAXE_ALL;
+   else if (!strcmp(arg, "--pickaxe-ignore-add"))
+   options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_ADD;
+   else if (!strcmp(arg, "--pickaxe-ignore-del"))
+   options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_DEL;
else if (!strcmp(arg, "--pickaxe-regex"))
options->pickaxe_opts |= DIFF_PICKAXE_REGEX;
else if ((argcount = short_opt('O', av, ))) {
diff --git a/diff.h b/diff.h
index a30cc35ec3..147c47ace7 100644
--- a/diff.h
+++ b/diff.h
@@ -358,6 +358,8 @@ int git_config_rename(const char *var, const char *value);
 DIFF_PICKAXE_KIND_OBJFIND)
 
 #define DIFF_PICKAXE_IGNORE_CASE   32
+#define DIFF_PICKAXE_IGNORE_ADD64
+#define DIFF_PICKAXE_IGNORE_DEL 128
 
 void diffcore_std(struct diff_options *);
 void diffcore_fix_diff_index(struct diff_options *);
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 800a899c86..826dde6bd4 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -16,6 +16,7 @@ typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two,
 
 struct diffgrep_cb {
regex_t *regexp;
+   struct diff_options *diff_options;
int hit;
 };
 
@@ -23,9 +24,14 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 {
struct diffgrep_cb *data = priv;
regmatch_t regmatch;
+   unsigned pickaxe_opts = data->diff_options->pickaxe_opts;
 
if (line[0] != '+' && line[0] != '-')
return;
+   if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) && line[0] == '+')
+   return;
+   if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL) && line[0] == '-')
+   return;
if (data->hit)
/*
 * NEEDSWORK: we should have a way to terminate the
@@ -45,13 +51,20 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
xpparam_t xpp;
xdemitconf_t xecfg;
 
-   if (!one)
+   if (!one) {
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD)
+   return 0;
return !regexec_buf(regexp, two->ptr, two->size,
1, , 0);
-   if (!two)
+   }
+   if (!two) {
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL)
+   return 0;
return !regexec_buf(regexp, one->ptr, one->size,
1, , 0);
+   }
 
+   ecbdata.diff_options = o;
/*
 * We have both sides; need to run textual diff and see if
 * the pattern appears on added/deleted lines.
@@ -113,6 +126,11 @@ static int has_changes(mmfile_t *one, mmfile_t *two,
 {
unsigned int one_contains = one ? contains(one, regexp, kws) : 0;
unsigned int two_contains = two ? contains(two, regexp, kws) : 0;
+
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD)
+   return one_contains > two_contains;
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL)
+   return one_contains < two_contains;
return one_contains != two_contains;
 }
 


Re: [PATCH 1/1] rebase -i: introduce the 'break' command

2018-10-08 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Junio,
>
> On Fri, 5 Oct 2018, Junio C Hamano wrote:
>
>> "Johannes Schindelin via GitGitGadget" 
>> writes:
>> 
>> > From: Johannes Schindelin 
>> >
>> > The 'edit' command can be used to cherry-pick a commit and then
>> > immediately drop out of the interactive rebase, with exit code 0, to let
>> > the user amend the commit, or test it, or look around.
>>  ...
>> If one wants to emulate this with the versions of Git that are
>> currently deployed, would it be sufficient to insert "exec false"
>> instead of "break"?
>
> There is one crucial difference: the exit code.

OK, and it was good that you explicitly said "with exit code 0" in
the log message.  Together with the idea to update the doc I floated
earlier, this probably is worth documenting, too.

>> I think the earlier request asked for 'pause' (I didn't dig the list
>> archive very carefully, though),
>
> No need to: I mentioned it in the cover letter. Here is the link again,
> for your convenience:
> https://public-inbox.org/git/20180118183618.39853-3-sbel...@google.com/

No, you misunderstood.  I knew what message in the immediate past
triggered this patch and that wasn't what I "could have dug for but
didn't".  "The earlier request" I meant was another one I recall
that was made long time ago---that was what I "could have dug for
but didn't".

> Good initial version. I would like it to be a bit more precise about who
> exits with what status. How about this:

Sounds good.  It is likely that I'll either forget or will continue
to be too busy to pick textual pieces and assemble together myself,
so I'll leave it as a left over bit for somebody reading from the
sideline to send in a finished version if they care deeply enough
;-)

Thanks.



Re: [PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-08 Thread Junio C Hamano
Junio C Hamano  writes:

> Antonio Ospite  writes:
>
>> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
>> from .gitmodules succeeds and that writing to it fails when the file is
>> not checked out.
>> ...
>>  t/t7416-submodule-sparse-gitmodules.sh | 78 ++
>
> This now triggers test-lint errors as the most recent maintenance
> release took t/t7416 for something else.  I'll do s/t7416-/t7418-/g
> on the mailbox before running "git am -s" on this series.

This is an unrelated tangent to the topic, but running "range-diff"
on what has been queued on 'pu' since mid September and this
replacement after doing the renaming was a surprisingly pleasant
experience.  In its comparison between 09/10 of the two iterations,
it showed that 7416's name has been changed to 7418 but otherwise
there is no change in the contents of that test script.

FWIW, tbdiff also gets this right, so the pleasant experience was
inherited without getting broken.  Kudos should go to both Thomas
and Dscho ;-).


Re: [PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-08 Thread Junio C Hamano
Antonio Ospite  writes:

> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
> from .gitmodules succeeds and that writing to it fails when the file is
> not checked out.
> ...
>  t/t7416-submodule-sparse-gitmodules.sh | 78 ++

This now triggers test-lint errors as the most recent maintenance
release took t/t7416 for something else.  I'll do s/t7416-/t7418-/g
on the mailbox before running "git am -s" on this series.



Re: What's so special about objects/17/ ?

2018-10-08 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> Depending on how we're counting there's at least two.

I thought you were asking "why the special sentinel is not 0{40}?"
You counted the number of reasons why 0{40} is used to stand in for
a real value, but that was the number I didn't find interesting in
the scope of this discussion, i.e. "why the special sample is 17?"

I vaguely recall we also used 0{39}1 for something else long time
ago; I offhand do not recall if we still do, or we got rid of it.


Re: What's so special about objects/17/ ?

2018-10-08 Thread Junio C Hamano
Stefan Beller  writes:

> On Sun, Oct 7, 2018 at 1:07 PM Junio C Hamano  wrote:
>>
>> Junio C Hamano  writes:
>
>> > ...
>> > by general public and I do not have to explain the choice to the
>> > general public ;-)
>>
>> One thing that is more important than "why not 00 but 17?" to answer
>> is why a hardcoded number rather than a runtime random.  It is for
>> repeatability.
>
> Let's talk about repeatability vs statistics for a second. ;-)

Oh, I think I misled you by saying "more important".  

I didn't mean that it is more important to stick to the "use
hardcoded value" design decision than sticking to "use 17".  I've
made sure that everybody would understnd choosing any arbitrary byte
value other than "17" does not make the resulting Git any better nor
worse.  But discussing the design decision to use hardcoded value is
"more important", as that affects the balance between the end-user
experience and debuggability, and I tried to help those who do not
know the history by giving the fact that choice was made for the
latter and not for other hidden reasons, that those who would
propose to change the system may have to keep in mind.

Sorry if you mistook it as if I were saying that it is important to
keep the design to use a hardcoded byte value.  That wasn't what the
message was about.


Re: git log -S or -G

2018-10-08 Thread Junio C Hamano
Julia Lawall  writes:

>> Doing the same for -S is much harder at the machinery level, as it
>> performs its thing without internally running "diff" twice, but just
>> counts the number of occurrences of 'foo'---that is sufficient for
>> its intended use, and more efficient.
>
> There is still the question of whether the number of occurrences of foo
> decreases or increases.

Hmph, taking the changes that makes the number of hits decrease
would catch a subset of "changes that removes 'foo' only---I am not
interested in the ones that adds 'foo'".  It will avoid getting
confused by a change that moves an existing 'foo' to another place
in the same file (as the number of hits does not change), but at the
same time, it will miss a change that genuinely removes an existing
'foo' and happens to add a 'foo' at a different place in the same
file that is unrelated to the original 'foo'.  Depending on the
definition of "I am only interested in removed ones", that may or
may not be acceptable.





Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Junio C Hamano
SZEDER Gábor  writes:

> There is certainly potential there.  With a (very) rough PoC
> experiment, a 8MB bloom filter, and a carefully choosen path I can
> achieve a nice, almost 25x speedup:
>
>   $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh
>   6
>
>   real0m1.563s
>   user0m1.519s
>   sys 0m0.045s
>
>   $ time GIT_USE_POC_BLOOM_FILTER=y ~/src/git/git rev-list --count HEAD -- 
> t/valgrind/valgrind.sh
>   6
>
>   real0m0.063s
>   user0m0.043s
>   sys 0m0.020s

Even though I somehow sense a sign of exaggeration in [v] in the
pathname, it still is quite respectable.

>   bloom filter total queries: 16269 definitely not: 16195 maybe: 74 false 
> positives: 64 fp ratio: 0.003934
>
> But I'm afraid it will take a while until I get around to turn it into
> something presentable...

That's OK.  This is an encouraging result.

Just from curiousity, how are you keying the filter?  tree object
name of the top-level and full path concatenated or something like
that?


Re: What's so special about objects/17/ ?

2018-10-07 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 07.10.18 um 21:06 schrieb Ævar Arnfjörð Bjarmason:
>> Picking any one number is explained in the comment. I'm asking why 17 in
>> particular not for correctness reasons but as a bit of historical lore,
>> and because my ulterior is to improve the GC docs.
>>
>> The number in that comic is 4 (and no datestamp on when it was
>> published). Are you saying Junio's patch is somehow a reference to that
>> xkcd in particular, or that it's just a funny reference in this context?
>
> No lore, AFAIR. It's just a random number, determined by a fair dice
> roll or something ;)

As I already said, I did not pick the number randomly, but rather
arbitrarily, and it is not 00 because the chosen number (unlike the
0{40} magic we use elsewhere) does not have to be memorable, and the
choice does not have to be explainable.

So people will not get any further explanation as to the reason
behind that arbitrary choice, but it was not random.


Re: What's so special about objects/17/ ?

2018-10-07 Thread Junio C Hamano
Junio C Hamano  writes:

> Ævar Arnfjörð Bjarmason  writes:
>
>> 1. We still have this check of objects/17/ in builtin/gc.c today. Why
>>objects/17/ and not e.g. objects/00/ to go with other 000* magic such
>>as the  SHA-1?d  Statistically
>>it doesn't matter, but 17 seems like an odd thing to pick at random
>>out of 00..ff, does it have any significance?
>
> ...
> by general public and I do not have to explain the choice to the
> general public ;-)

One thing that is more important than "why not 00 but 17?" to answer
is why a hardcoded number rather than a runtime random.  It is for
repeatability.



Re: What's so special about objects/17/ ?

2018-10-07 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> 1. We still have this check of objects/17/ in builtin/gc.c today. Why
>objects/17/ and not e.g. objects/00/ to go with other 000* magic such
>as the  SHA-1?d  Statistically
>it doesn't matter, but 17 seems like an odd thing to pick at random
>out of 00..ff, does it have any significance?

There is no "other 000* magic such as ...". There is only one 0{40}
magic and that one must be memorable and explainable.

The 1/256 sample can be any one among 256.  Just like the date
string on the first line of the output to be used as the /etc/magic
signature by format-patch, it was an arbitrary choice, rather than a
random choice, and unlike 0{40} this does not have to be memorable
by general public and I do not have to explain the choice to the
general public ;-)

> 2. It seems overly paranoid to be checking that the files in
>   .git/objects/17/ look like a SHA-1.

There is no other reason than futureproofing.  We were paying cost
to open and scan the directory anyway, and checking that we only
count the loose object files was (and still is) a sensible thing to
do to allow us not even worry about the other kind of things we
might end up creating there.


Re: [PATCH v2 3/8] refs: new ref types to make per-worktree refs visible to all worktrees

2018-10-06 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 2dd77f9485..9ca2a3706c 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> ...
>   case REF_TYPE_PSEUDOREF:
>   strbuf_addf(sb, "%s/logs/%s", refs->gitdir, refname);
>   break;
> + case REF_TYPE_OTHER_PSEUDOREF:
> + return files_reflog_path_other_worktrees(refs, sb, refname);
> + case REF_TYPE_MAIN_PSEUDOREF:
> + if (!skip_prefix(refname, "main-worktree/", ))
> + BUG("ref %s is not a main pseudoref", refname);
> + /* passthru */

Correct spelling of the colloquial phrase is fallthru, but see
1cf01a34 ("consistently use "fallthrough" comments in switches",
2017-09-21) that encourages use of "fallthrough" fully spelled out.

Otherwise you'd see something like this.

CC refs/files-backend.o
refs/files-backend.c: In function 'files_ref_path':
refs/files-backend.c:203:6: error: this statement may fall through 
[-Werror=implicit-fallthrough=]
   if (!skip_prefix(refname, "main-worktree/", ))
  ^
refs/files-backend.c:206:2: note: here
  case REF_TYPE_OTHER_PSEUDOREF:
  ^~~~
refs/files-backend.c: In function 'files_reflog_path':
refs/files-backend.c:181:6: error: this statement may fall through 
[-Werror=implicit-fallthrough=]
   if (!skip_prefix(refname, "main-worktree/", ))
  ^
refs/files-backend.c:184:2: note: here
  case REF_TYPE_NORMAL:
  ^~~~
cc1: all warnings being treated as errors
Makefile:2289: recipe for target 'refs/files-backend.o' failed
make: *** [refs/files-backend.o] Error 1


Re: [PATCH] git-completion.bash: Add completion for stash list

2018-10-06 Thread Junio C Hamano
Steve  writes:

> Since stash list accepts git-log options, add the following useful
> options that make sense in the context of the `git stash list` command:
>
>   --name-status --oneline --patch-with-stat
>
> Signed-off-by: Steven Fernandez 
> ---
>
> This is my first patch to the project so please be excuse any process
> errors.
> Although, I've tried my best to follow the guidelines in
> Documentation/SubmittingPatches but I'm not sure if I missed anything.

Thanks.  Will queue with manual fix-ups, but since you asked, here
are the things I'll be fixing up manually, which you may want to
avoid next time.

 (1) We strongly prefer to see contributor's identity recorded as
 the "author" of a commit to match the sign-off from the
 contributor.  Your MSA sent your message with only the more
 casual version of your first name on the "From: " header, which
 does not match your sign-off.  It would have been more correct
 if you added two lines at the beginning of the body of the
 message, i.e. before "Since stash list accepts...".  The first
 line would be

From: Steven Fernandez 

 and then you would have a blank line immediately after that.
 Your "Since stash list accepts..." would become the third line
 of the body.  That will tell the receiving end that the author
 identity of the resulting commit should not be "Steve" but
 should be "Steven Fernandez".

 (2) "git shortlog --no-merges" would show that we tend not to
 capitalize the first word after ":" on the subject line.

 (3) Your patch is line-wrapped (see below that has with-stat on its
 own line after the line you intended it to go).

 (4) You somehow generated your patch with "-p0".  It is OK to do
 whatever for your own use, but patch submission is 
 communication with others, so don't be original by doing
 unusual things.

None of the above is something I cannot fix on the receiving end
myself, but at the same time, it is not something people should be
wasting maintainer's time on, and small wastes add up.

>  contrib/completion/git-completion.bash | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git contrib/completion/git-completion.bash
> contrib/completion/git-completion.bash
> index d63d2dffd..06ec6ca11 100644
> --- contrib/completion/git-completion.bash
> +++ contrib/completion/git-completion.bash
> @@ -2567,6 +2567,9 @@ _git_stash ()
>   drop,--*)
>   __gitcomp "--quiet"
>   ;;
> + list,--*)
> + __gitcomp "--name-status --oneline --patch-
> with-stat"
> + ;;
>   show,--*|branch,--*)
>   ;;
>   branch,*)


Re: [RFC PATCH v2 2/4] transport: do not list refs if possible

2018-10-06 Thread Junio C Hamano
Stefan Beller  writes:

> On Thu, Sep 27, 2018 at 12:24 PM Jonathan Tan  
> wrote:
>
>>
>> +test_expect_success 'when dynamically fetching missing object, do not list 
>> refs' '
>> +   cat trace &&
>
> leftover debug cat?

It does look that way.


Re: git log -S or -G

2018-10-06 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Sat, Oct 6, 2018 at 5:16 PM Julia Lawall  wrote:
>> Git log -S or -G make it possible to find commits that have particular
>> words in the changed lines.  Sometimes it would be helpful to search for
>> words in the removed lines or in the added lines specifically.  From the
>> implementation, I had the impression that this would be easy to implement.
>> The main question would be how to allow the user to specify what is
>> wanted.
>
> As far as I know this isn't possible. The --diff-filter option is
> similar in spirit, but e.g. adding "foo" and then removing it from an
> existing file will both be covered under --diff-filter=M, so that
> isn't what you're looking for.

I agree with Julia that UI to the feature is harder than the
machinery to implement the feature to add "I am interested in seeing
a patch that contains a hunk that adds 'foo' but am not interested
in removal" (or vice versa) for -G.  You tweak
diffcore-pickaxe.c::diffgrep_consume() and you'are done.

Doing the same for -S is much harder at the machinery level, as it
performs its thing without internally running "diff" twice, but just
counts the number of occurrences of 'foo'---that is sufficient for
its intended use, and more efficient.



Re: [PATCH] builtin/grep.c: remote superflous submodule code

2018-10-06 Thread Junio C Hamano
Stefan Beller  writes:

> After ff6f1f564c4 (submodule-config: lazy-load a repository's .gitmodules
> file, 2017-08-03) this is no longer necessary, but that commit did not
> cleanup the whole tree, but just show cased the new way how to deal with
> submodules in ls-files.

The log message of the above one singles out "grep" as a special
case and explalins why it did not touch, by the way.  You probably
need to explain the reason why "this is no longer necessary" a bit
better than the above---as it stands, it is "ff6f1f564c4 said it
still is necessary, I say it is not".


Re: [PATCH] builtin/grep.c: remote superflous submodule code

2018-10-06 Thread Junio C Hamano
Stefan Beller  writes:

> In f9ee2fcdfa (grep: recurse in-process using 'struct repository',
> 2017-08-02), we introduced a call to repo_read_gitmodules in builtin/grep
> to simplify the submodule handling.
>
> After ff6f1f564c4 (submodule-config: lazy-load a repository's .gitmodules
> file, 2017-08-03) this is no longer necessary, but that commit did not
> cleanup the whole tree, but just show cased the new way how to deal with
> submodules in ls-files.
>
> Cleanup the only remaining caller to repo_read_gitmodules outside of
> submodule.c

Well, submodule-config.c has its implementation and another caller,
which technically is outside submodule.c ;-)  repo_read_gitmodules
has two more callers in unpack-trees.c these days, so perhaps we can
do without this last paragraph.



Re: [PATCH v11 8/8] list-objects-filter: implement filter tree:0

2018-10-06 Thread Junio C Hamano
Matthew DeVore  writes:

> The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> would filter out all but the root tree and blobs. In order to avoid
> confusion between 0 and capital O, the documentation was worded in a
> somewhat round-about way that also hints at this future improvement to
> the feature.
>
> Signed-off-by: Matthew DeVore 

Thanks.

> diff --git a/Documentation/rev-list-options.txt 
> b/Documentation/rev-list-options.txt
> index 7b273635d..5f1672913 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -731,6 +731,11 @@ the requested refs.
>  +
>  The form '--filter=sparse:path=' similarly uses a sparse-checkout
>  specification contained in .
> ++
> +The form '--filter=tree:' omits all blobs and trees whose depth
> +from the root tree is >=  (minimum depth if an object is located
> +at multiple depths in the commits traversed). Currently, only =0
> +is supported, which omits all blobs and trees.

OK.

> diff --git a/t/t5317-pack-objects-filter-objects.sh 
> b/t/t5317-pack-objects-filter-objects.sh
> index 9839b48c1..510d3537f 100755
> --- a/t/t5317-pack-objects-filter-objects.sh
> +++ b/t/t5317-pack-objects-filter-objects.sh
> @@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' 
> '
>   grep -q "bad tree object" bad_tree
>  '

As output made inside test_expect_{succcess,failure} are discarded
by default and shown while debugging tests, there is no strong
reason to use "grep -q" in our tests.  I saw a few instances of
"grep -q" added in this series including this one

test_must_fail grep -q "$file_4" observed

that should probably be

! grep "$file_4" observed

> + printf "blob\ncommit\ntree\n" >unique_types.expected &&
> ...
> + printf "blob\ntree\n" >expected &&

Using test_write_lines is probably easier to read.

Other than these two minor classes of nits, the series look quite
well cooked to me.

Thanks.


Re: [PATCH v5 0/7] subject: Clean up tests for test_cmp arg ordering and pipe placement

2018-10-06 Thread Junio C Hamano
Matthew DeVore  writes:

> This version of the patchset fixes some wording and formatting issues
> pointed out by Junio. The commit message in the first patch has also
> been reworded.


Thanks.

If no further major issues are raised, let's merge it to 'next'.




Re: [PATCH v3 0/1] Teach the builtin rebase about the builtin interactive rebase

2018-10-06 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> Note: while this patch targets pk/rebase-in-c-6-final, it will not work
> correctly without ag/rebase-i-in-c. So my suggestion is to rewrite the 
> pk/rebas-in-c-6-final branch by first merging ag/rebase-i-in-c, then
> applying this here patch, and only then cherry-pick "rebase: default to
> using the builtin rebase".

Is this a stale comment in the context of v3, where we pretty much
know how the resulting topic should intertwine with other topics?  I
am trying to see if I have do to anything differently this time, or
just replacing the single commit while keeping the structure around
that commit the same is fine.

Also, I see there is a new iteration v8 of ag/rebase-i-in-c on the
list, sent on Sep 27th (you were CC'ed but I suspect it was before
you came back).  Are you happy with that update?  Otherwise, we
should make sure that topic is solid enough before extending
(meaning: I'd replace this one while keeping ag/rebase-i-in-c that
has been cooking in 'pu', without updating the latter).

> Changes since v2:
>
>  * Prepare for the break command, by skipping the call to finish_rebase() 
>for interactive rebases altogether (the built-in interactive rebase
>already takes care of that).

Thanks.


Re: [PATCH v6 08/10] submodule: add a helper to check if it is safe to write to .gitmodules

2018-10-06 Thread Junio C Hamano
Stefan Beller  writes:

>>  static int module_config(int argc, const char **argv, const char *prefix)
>>  {
>> +   enum {
>> +   CHECK_WRITEABLE = 1
>> +   } command = 0;
>
> Can we have the default named? Then we would only use states
> from within the enum?

Why?  Do we use a half-intelligent "switch () { case ...: ... }"
checker that would otherwise complain if we handled "case 0" in such
a switch statement, or something like that?

Are we going to gain a lot more enum members, by the way?  At this
point, this looks more like a

unsigned check_writable = 0; /* default is not to check */


to me.


Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error

2018-10-06 Thread Junio C Hamano
Ananya Krishna Maram  writes:

>> But it does not need to be escaped, when you specify the regular
>> expression the way we do. And the way we specified it is really the
>> standard when specifying regular expressions in C code, i.e. *without* the
>> suggested backslash.
>
> Aha!. this makes total sense. I was thinking from a general regular expression
> point of view. But I should be thinking from C point of view and how C
> might interpret this newly submitted string.

If you were thinking from a general regex point of view, you would
never have treated '/' as anything special, though.

Historically, some languages (like sed and perl) had an regexp match
operator, which is denoted by enclosing a regexp inside a pair of
slashes.  In these languages, if you use // operator, and if
you want to match slash with the pattern, you somehow need a way to
write an regexp that has slash in it.  E.g. if you want a pattern
that would match 'a' followed by '/' followed by 'b', your regexp
would look like "a/b", but the regexp match operation you would
write in these languages would be like /a\/b/, so that '//'
parser can tell that the second slash is not a slash that signals
the end of the match operator.

And then there is an unnamed misdesigned language that has a
regmatch() function, which takes a string that contains a regular
expression, but somehow requires that string to begin and end with a
slash for no justifiable reason ;-).  If you were thinking about
regexp from that brain-dead languages' point of view, yes, you
should unlearn it and what Dscho gave you would make sense.

C's regexp(3) library does not share such a misdesign and just takes
a regular expression as a string.  You would still need to follow
the quoting rules of C string literals (e.g. write a literal
double-quote or backslash after an escaping backslash), but of
course slash is not special there.



Re: [PATCH 3/8] refs: new ref types to make per-worktree refs visible to all worktrees

2018-10-06 Thread Junio C Hamano
Duy Nguyen  writes:

>> > The main worktree has to be treated specially because well.. it's
>> > special from the beginning. So HEAD from the main worktree is
>> > acccessible via the name "main/HEAD" (we can't use
>> > "worktrees/main/HEAD" because "main" under "worktrees" is not
>> > reserved).
>>
>> I do not quite follow.  So with this, both refs/heads/master and
>> main/refs/heads/master are good names for the master branch (even
>> though the local branch names are not per worktree), because
>> in the main worktree, refs/bisect/bad and main/refs/bisect/bad ought
>> to mean the same thing.
>
> True. I think the ambiguation here is about the main worktree versus a
> secondary worktree that is accidentally named "main". Then suddenly we
> have to worktrees of the same name, and accessing them both via
> worktrees//HEAD will not work, and there is no other way to
> disambiguate them.

So those who have happily been referring 'refs/heads/main/foo' as
'main/foo' now suddenly have to say 'refs/heads/main/foo' instead?

> The rules are not touched. But it looks like everything still works as
> expected (I'm adding tests to verify this)

What I am worried about is that _your_ expectation may not coincide
with the expectations of users, especially with ones with existing
refs that overlap with the namespaces this series suddenly starts
carving out and squatting on.  As long as that won't be a problem, I
think it is OK, even with 'main' not renamed to 'worktree-main' or
somesuch.




[Announce] Git 2.14.5, 2.15.3, 2.16.5, 2.17.2, 2.18.1, and 2.19.1

2018-10-05 Thread Junio C Hamano
These releases fix a security flaw (CVE-2018-17456), which allowed an
attacker to execute arbitrary code by crafting a malicious .gitmodules
file in a project cloned with --recurse-submodules.

When running "git clone --recurse-submodules", Git parses the supplied
.gitmodules file for a URL field and blindly passes it as an argument
to a "git clone" subprocess.  If the URL field is set to a string that
begins with a dash, this "git clone" subprocess interprets the URL as
an option.  This can lead to executing an arbitrary script shipped in
the superproject as the user who ran "git clone".

In addition to fixing the security issue for the user running "clone",
the 2.17.2, 2.18.1 and 2.19.1 releases have an "fsck" check which can
be used to detect such malicious repository content when fetching or
accepting a push. See "transfer.fsckObjects" in git-config(1).

Credit for finding and fixing this vulnerability goes to joernchen
and Jeff King, respectively.

P.S. Folks at Microsoft tried to follow the known exploit recipe on
Git for Windows (but not Cygwin or other Git implementations on
Windows) and found that the recipe (or its variants they can think
of) would not make their system vulnerable.  This is due to the fact
that the type of submodule path require by the known exploit recipe
cannot be created on Windows.  Nonetheless, it is possible we have
missed some exploitation path and users are encouraged to upgrade.


Re: [PATCH v4 2/7] Documentation: add shell guidelines

2018-10-05 Thread Junio C Hamano
Matthew DeVore  writes:

> Add the following guideline to Documentation/CodingGuidelines:
>
>   &&, ||, and | should appear at the end of lines, not the
>   beginning, and the \ line continuation character should be
>   omitted

"should be omitted" sounds as if it is the norm to have such a
character, but it is not.  The text in the actual patch body does a
much better job than this.

Perhaps

Break overlong lines after "&&", "||", and "|", not before
them; that way the command can continue to subsequent lines
without backslash at the end.

> And the following to t/README (since it is specific to writing tests):
>
>   pipes and $(git ...) should be avoided when they swallow exit
>   codes of Git processes

Good.

> Signed-off-by: Matthew DeVore 
> ---
>  Documentation/CodingGuidelines | 18 ++
>  t/README   | 28 
>  2 files changed, 46 insertions(+)
>
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 48aa4edfb..72967deb7 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -118,6 +118,24 @@ For shell scripts specifically (not exhaustive):
>   do this
>   fi
>  
> + - If a command sequence joined with && or || or | spans multiple
> +   lines, put each command on a separate line and put && and || and |
> +   operators at the end of each line, rather than the start. This
> +   means you don't need to use \ to join lines, since the above
> +   operators imply the sequence isn't finished.

Correct.  Even though I wonder if we need to say the last sentence,
which is rather obvious, patch is already written and I do not see
much point editing this further.

> + (incorrect)
> + grep blob verify_pack_result \
> + | awk -f print_1.awk \
> + | sort >actual &&
> + ...
> +
> + (correct)
> + grep blob verify_pack_result |
> + awk -f print_1.awk |
> + sort >actual &&
> + ...
> +
>   - We prefer "test" over "[ ... ]".
>  
>   - We do not write the noiseword "function" in front of shell
> diff --git a/t/README b/t/README
> index 85024aba6..9a71d5732 100644
> --- a/t/README
> +++ b/t/README
> @@ -466,6 +466,34 @@ And here are the "don'ts:"
> platform commands; just use '! cmd'.  We are not in the business
> of verifying that the world given to us sanely works.
>  
> + - Don't use Git upstream in the non-final position in a piped chain, as
> +   in:

"upstream in the non-final position" is a bit redundant, isn't it?

  - Don't feed the output of 'git' to a pipe, as in:

> +
> + git -C repo ls-files |
> + xargs -n 1 basename |
> + grep foo
> +
> +   which will discard git's exit code and may mask a crash. In the
> +   above example, all exit codes are ignored except grep's.

Good.

> +   Instead, write the output of that command to a temporary
> +   file with ">" or assign it to a variable with "x=$(git ...)" rather
> +   than pipe it.
> +
> + - Don't use command substitution in a way that discards git's exit
> +   code. When assigning to a variable, the exit code is not discarded,
> +   e.g.:
> +
> + x=$(git cat-file -p $sha) &&
> + ...
> +
> +   is OK because a crash in "git cat-file" will cause the "&&" chain
> +   to fail, but:
> +
> + test "refs/heads/foo" = "$(git symbolic-ref HEAD)"
> +
> +   is not OK and a crash in git could go undetected.

Good.

>   - Don't use perl without spelling it as "$PERL_PATH". This is to help
> our friends on Windows where the platform Perl often adds CR before
> the end of line, and they bundle Git with a version of Perl that


Re: [PATCH v4 5/7] tests: don't swallow Git errors upstream of pipes

2018-10-05 Thread Junio C Hamano
Matthew DeVore  writes:

> Some pipes in tests lose the exit code of git processes, which can mask
> unexpected behavior like crashes. Split these pipes up so that git
> commands are only at the end of pipes rather than the beginning or
> middle.
>
> The violations fixed in this patch were found in the process of fixing
> pipe placement in a prior patch.

Hopefully this is not a blind mechanical patch, as introduction of
unexpected temporary files in the working tree could interfere with
later tests (e.g. they may expect exact set of untracked files, and
these new temporary files would b unexpected additions).

Thanks.


Re: Is there some script to find un-delta-able objects?

2018-10-05 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I.e. something to generate the .gitattributes file using this format:
>> 
>> https://git-scm.com/docs/gitattributes#_packing_objects
>> 
>> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
>> there's some repo scanner utility to spew this out for a given repo.
>
> I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
> where we're not likely to find a delta? Or ones where Git will not try
> to look for a delta?
>
> If the latter, I think the only rules are the "-delta" attribute and the
> object size. You should be able to use git-check-attr and "git-cat-file"
> to get that info.
>
> If the former, I don't know how you would know. We can only report on
> what isn't a delta _yet_.

I am reasonably sure that the question is about solving the former
so that "-delta" attribute is set appropriately.

Iniitially, I thought that it is likely an undeltifiable object has
higher randomness than deltifiable ones and that can be exploited,
but if you have such a highly random blob A (and no other object
like it) in the repository and then later acquire another blob B
that happens to share most of the data with A, then A and B by
themselves will pass the "highly random" test but still yet each can
be expressed as a delta derived from the other.  So your "what isn't
a delta yet" is a reasonable assessment of what mechanically can be
known.  

Knowledge/heuristic like "No two '*.gpg' files are expected to be
alike" needs something more than the randomness of individual files,
I guess.



Re: [PATCH v3 1/3] help: redirect to aliased commands for "git cmd --help"

2018-10-05 Thread Junio C Hamano
Rasmus Villemoes  writes:

>> It also is strange to count from (0); if the patchset is rerolled
>> again, I'd prefer to see these start counting from (1), in which
>> case this item will become (3).
>
> If you prefer, I can send a v4.

Sure, if you prefer, you can send a v4 for me to look at and queue.

Thanks.


Re: [PATCH] branch: trivial style fix

2018-10-05 Thread Junio C Hamano
Tao Qingyun  writes:

> ---
>  builtin/branch.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/branch.c b/builtin/branch.c
> index c396c41533..52aad0f602 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -222,10 +222,11 @@ static int delete_branches(int argc, const char **argv, 
> int force, int kinds,
>   if (!head_rev)
>   die(_("Couldn't look up commit object for HEAD"));
>   }
> - for (i = 0; i < argc; i++, strbuf_reset()) {
> + for (i = 0; i < argc; i++) {
>   char *target = NULL;
>   int flags = 0;
>  
> + strbuf_reset();

This is not "trivial" nor "style fix", though.

It is not "trivial" because it also changes the behaviour.  Instead
of resetting the strbuf each time after the loop body runs, the new
code resets it before the loop body, even for the 0-th iteration
where the strbuf hasn't even been used at all.  It is not a "style
fix" because we do not have a particular reason to avoid using the
comma operator to perform two simple expressions with side effects
inside a single expression.

>   strbuf_branchname(, argv[i], allowed_interpret);
>   free(name);
>   name = mkpathdup(fmt, bname.buf);
> @@ -716,8 +717,7 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   print_columns(, colopts, NULL);
>   string_list_clear(, 0);
>   return 0;
> - }
> - else if (edit_description) {
> + } else if (edit_description) {

This one is a style fix that is trivial.  It does not change the
semantics a bit, and we do prefer "else if" to be on the same line
as the closing "}" of the earlier "if" that opened the matching "{".

>   const char *branch_name;
>   struct strbuf branch_ref = STRBUF_INIT;


Re: [PATCH v3 1/3] help: redirect to aliased commands for "git cmd --help"

2018-10-05 Thread Junio C Hamano
Rasmus Villemoes  writes:

> As discussed in the thread for v1 of this patch [1] [2], this changes the
> rules for "git foo --help" when foo is an alias.
>
> (0) When invoked as "git help foo", we continue to print the "foo is
> aliased to bar" message and nothing else.
>
> (1) If foo is an alias for a shell command, print "foo is aliased to
> !bar" as usual.
>
> (2) Otherwise, break the alias string into words, and pretend that "git
> word0 --help" was called.
>
> At least for me, getting the man page for git-cherry-pick directly with
> "git cp --help" is more useful (and how I expect an alias to behave)
> than the short "is aliased to" notice. It is also consistent with
> "--help" generally providing more comprehensive help than "-h".
>
> I believe that printing the "is aliased to" message also in case (2) has
> value: Depending on pager setup, or if the user has help.format=web, the
> message is still present immediately above the prompt when the user
> quits the pager/returns to the terminal. That serves as an explanation
> for why one was redirected to "man git-cherry-pick" from "git cp
> --help", and if cp is actually 'cherry-pick -n', it reminds the user
> that using cp has some flag implicitly set before firing off the next
> command.
>
> It also provides some useful info in case we end up erroring out, either
> in the "bad alias string" check, or in the "No manual entry for gitbar"
> case.

These two paragraphs were misleading, because they sounded as if you
were lamenting that you were somehow forbidden from doing so even
though you believe doing it is the right thing.

But that is not what is happening.  I think we should update the (2)
above to mention what you actually do in the code, perhaps like so:

(2) Otherwise, show "foo is aliased to bar" to the standard
error stream, and then break the alias string into words and
pretend as if "git word[0] --help" were called.  The former
is necessary to help users when 'foo' is aliased to a
command with an option (e.g. "[alias] cp = cherry-pick -n"),
and hopefully remain visible when help.format=web is used,
"git bar --help" errors out, or the manpage of "git bar" is
short enough. It may not help if the help shows manpage on
the terminal as usual, though.

As we explain why we show the alias information before going to the
manpage in the item itself and a brief discussion of pros-and-cons,
we can safely lose the "I believe..."  paragraph, which looks
somewhat out of place in a log message.

It also is strange to count from (0); if the patchset is rerolled
again, I'd prefer to see these start counting from (1), in which
case this item will become (3).

> [1] https://public-inbox.org/git/20180926102636.30691-1...@rasmusvillemoes.dk/
> [2] https://public-inbox.org/git/20180926184914.gc30...@sigill.intra.peff.net/
>
> Signed-off-by: Rasmus Villemoes 
> ---
>  builtin/help.c | 34 +++---
>  1 file changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/help.c b/builtin/help.c
> index 8d4f6dd301..e0e3fe62e9 100644
> --- a/builtin/help.c
> +++ b/builtin/help.c
> @@ -415,9 +415,37 @@ static const char *check_git_cmd(const char* cmd)
>  
>   alias = alias_lookup(cmd);
>   if (alias) {
> - printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> - free(alias);
> - exit(0);
> + const char **argv;
> + int count;
> +
> + /*
> +  * handle_builtin() in git.c rewrites "git cmd --help"
> +  * to "git help --exclude-guides cmd", so we can use
> +  * exclude_guides to distinguish "git cmd --help" from
> +  * "git help cmd". In the latter case, or if cmd is an
> +  * alias for a shell command, just print the alias
> +  * definition.
> +  */
> + if (!exclude_guides || alias[0] == '!') {
> + printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> + free(alias);
> + exit(0);
> + }
> + /*
> +  * Otherwise, we pretend that the command was "git
> +  * word0 --help". We use split_cmdline() to get the
> +  * first word of the alias, to ensure that we use the
> +  * same rules as when the alias is actually
> +  * used. split_cmdline() modifies alias in-place.
> +  */
> + fprintf_ln(stderr, _("'%s' is aliased to '%s'"), cmd, alias);
> + count = split_cmdline(alias, );
> + if (count < 0)
> + die(_("bad alias.%s string: %s"), cmd,
> + split_cmdline_strerror(count));
> + free(argv);
> + UNLEAK(alias);
> + return alias;
>   }
>  
>   if (exclude_guides)


Re: [PATCH] grep: provide a noop --recursive option

2018-10-05 Thread Junio C Hamano
Stefan Beller  writes:

> git-grep is always file/tree recursive, but there is --recurse-submodules
> which is off by default. Instead of providing a short alias to a noop,
> we could use -r for submodules. (And if you happen to have no
> submodules, this is a noop for you)

I am not sure if it is an overall win for those who do have and use
submodules to easily be able to go recursive with a short-and-sweet
'r', or even they want to work inside one project at a time most of
the time.  If the latter, then using 'r' for recurse-submodules is
going to be a mistake (besides, other commands that have 'recursive'
typically use 'r' for its shorthand,and 'r' does not stand for
'recurse-submodules' for them).


Re: [PATCH] grep: provide a noop --recursive option

2018-10-05 Thread Junio C Hamano
René Scharfe  writes:

>
> Recognize -r and --recursive as synonyms for --max-depth=-1 for
> compatibility with GNU grep; it's still the default for git grep.
>
> This also adds --no-recursive as synonym for --max-depth=0 for free,
> which is welcome for completeness and consistency.
>
> Fix the description for --max-depth, while we're at it -- negative
> values other than -1 actually disable recursion, i.e. they are
> equivalent to --max-depth=0.
> ...
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 601f801158..f6e127f0bc 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -811,6 +811,8 @@ int cmd_grep(int argc, const char **argv, const char 
> *prefix)
>   GREP_BINARY_NOMATCH),
>   OPT_BOOL(0, "textconv", _textconv,
>N_("process binary files with textconv filters")),
> + OPT_SET_INT('r', "recursive", _depth,
> + N_("search in subdirectories (default)"), -1),

Wow.  

I didn't think of this trick to let OPT_SET_INT() to grok --no-* and
set the variable to 0.  Being able to do this without a custom
callback is certainly very nice.

The patch looks good.

Thanks.


Re: [PATCH v4 7/7] tests: order arguments to git-rev-list properly

2018-10-05 Thread Junio C Hamano
Matthew DeVore  writes:

> I screwed up by putting the positional argument *after* the
> redirection. Sorry for the mix-up. This is interestingly syntactically
> valid, though bad stylistically. Here is an inter-diff:

Thanks for being careful.  Except for a rather idiomatic 

echo >&2 message ...

which has redirection at the beginning to emphasize that the output
goes to the standard error stream, I do agree with your "stylistic"
choice of keeping the redirection at the end.

> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index eeedd1623..6ff614692 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -35,7 +35,7 @@ test_expect_success 'setup bare clone for server' '
>  test_expect_success 'do partial clone 1' '
>  git clone --no-checkout --filter=blob:none
> "file://$(pwd)/srv.bare" pc1 &&
>
> -git -C pc1 rev-list --quiet --objects --missing=print >revs HEAD &&
> +git -C pc1 rev-list --quiet --objects --missing=print HEAD >revs &&
>  awk -f print_1.awk revs |
>  sed "s/?//" |
>  sort >observed.oids &&
> @@ -93,8 +93,8 @@ test_expect_success 'verify diff causes dynamic
> object fetch' '
>  test_expect_success 'verify blame causes dynamic object fetch' '
>  git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
>  test_cmp expect.blame observed.blame &&
> -git -C pc1 rev-list --quiet --objects --missing=print >observed \
> -master..origin/master &&
> +git -C pc1 rev-list --quiet --objects --missing=print \
> +master..origin/master >observed &&
>  test_line_count = 0 observed
>  '


Re: [PATCH 1/1] rebase -i: introduce the 'break' command

2018-10-05 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> The 'edit' command can be used to cherry-pick a commit and then
> immediately drop out of the interactive rebase, with exit code 0, to let
> the user amend the commit, or test it, or look around.
>
> Sometimes this functionality would come in handy *without*
> cherry-picking a commit, e.g. to interrupt the interactive rebase even
> before cherry-picking a commit, or immediately after an 'exec' or a
> 'merge'.
>
> This commit introduces that functionality, as the spanking new 'break' 
> command.
>
> Suggested-by: Stefan Beller 
> Signed-off-by: Johannes Schindelin 
> ---

If one wants to emulate this with the versions of Git that are
currently deployed, would it be sufficient to insert "exec false"
instead of "break"?

The reason I am asking is *not* to imply that we do not need this
new feature.  It is because I vaguely recall seeing a request to add
'pause' to the insn set and "exec false" was mentioned as a more
general alternative long time ago.  I am trying to see if this is a
recurring request/wish, because it would reinforce that this new
feature would be a good addition if that is the case.

I suspect that "exec false" would give a message that looks like a
complaint ("'false' failed so we are giving you control back to fix
things" or something like that), and having a dedicated way to pause
the execution without alarming the user is a good idea.

I think the earlier request asked for 'pause' (I didn't dig the list
archive very carefully, though), and 'stop' may also be a possible
verb, but I tend to agree with this patch that 'break' is probably
the best choice, simply because it begins with 'b' in the
abbreviated form, a letter that is not yet used by others (unlike
'pause' or 'stop' that would want 'p' and 's' that are already
taken)..

Here is a tangent, but I think the description of "-x " in "git
rebase --continue" should mention that a failing command would
interrupt the sequencer.  That fact about "exec" command is given
much later in the last part of the "interactive mode" section of the
manual, so technically our docs are not being incomplete, but the
current description is not helpful to those who are looking for
substring "exec" from the beginning of the documentation to find out
if the exit status of the command affects the way commits are
replayed (which is what I was doing when imagining how users would
emulate this feature with deployed versions of Git).

Perhaps something as simple as this...

 Documentation/git-rebase.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 0e20a66e73..0fc5a851b5 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -420,7 +420,8 @@ idea unless you know what you are doing (see BUGS below).
 --exec ::
Append "exec " after each line creating a commit in the
final history.  will be interpreted as one or more shell
-   commands.
+   commands, and interrupts the rebase session when it exits with
+   non-zero status.
 +
 You may execute several commands by either using one instance of `--exec`
 with several commands:


Also, it seems that this has some interaction with the topics in
flight; the added test does pass when queued on top of 'master', but
fails when merged to 'pu'.  I didn't look into the details as I am
not fully online yet.

Thanks.


Re: [PATCH] doc: fix a typo and clarify a sentence

2018-10-05 Thread Junio C Hamano
Mihir Mehta  writes:

> diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
> index b180f1fa5..6173f569e 100644
> --- a/Documentation/git-diff.txt
> +++ b/Documentation/git-diff.txt
> @@ -72,8 +72,9 @@ two blob objects, or changes between two files on disk.
>   This form is to view the changes on the branch containing
>   and up to the second , starting at a common ancestor
>   of both .  "git diff A\...B" is equivalent to
> - "git diff $(git-merge-base A B) B".  You can omit any one
> - of , which has the same effect as using HEAD instead.
> + "git diff $(git merge-base A B) B".  You can omit any one

"git merge-base" is a more modern way to spell "git-merge-base" and
we have been trying to update the mention of the latter in the docs
to the former.  Thanks for doing this.

> + of the two instances of , which has the same effect as

The paragraph is about ... three-dot notation.  I
suspect that you wanted to say ... and ... are
allowed, implying that a bare ... is not allowed and does not mean
the same thing as what HEAD...HEAD means.  But that is not the case.
Asking "git diff HEAD...HEAD" by omitting both may not give very
interesting output (it always becomes a no-op), but nevertheless it
is a valid thing to ask (iow "git diff $commit1...$commit2" is what
you can safely write without having to worry about one or both going
empty string).  So I'd rather not to see this change in this form.
It is an incomplete attempt to discourage use of ...
but without giving enough justification.

Side note.  I am not recommending to do so, but
"discouragement with enough justification" would look like
this.

You can omit  on any side of the three dots, which
has the same effect as using HEAD instead.  Omitting both
and leaving only three dots is not an error but that merely
specifies a set of commits that are and are not reachable
from HEAD at the same time, which by definition is an empty
set, hence it is not very useful.

> + using HEAD in its place.

> +++ b/Documentation/howto/update-hook-example.txt
> @@ -80,7 +80,7 @@ case "$1" in
>info "The branch '$1' is new..."
>  else
># updating -- make sure it is a fast-forward
> -  mb=$(git-merge-base "$2" "$3")
> +  mb=$(git merge-base "$2" "$3")

I strongly suspect that inside update-hook, the original still
should work (iow, $GIT_EXEC_PATH should already have been prepended
to $PATH before a hoook is called).  But the updated form should
also work, and it is the form we humans need to type, so let's take
this change.

Thanks.

>case "$mb,$2" in
>  "$2,$mb") info "Update is fast-forward" ;;
>   *)noff=y; info "This is not a fast-forward update.";;


Re: [PATCH v3 0/6] Fix the racy split index problem

2018-10-05 Thread Junio C Hamano
SZEDER Gábor  writes:

> On Fri, Sep 28, 2018 at 06:24:53PM +0200, SZEDER Gábor wrote:
>> Interdiff:
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index c8acbc50d0..f053bf83eb 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -6,6 +6,9 @@ test_description='split index mode tests'
>>  
>>  # We need total control of index splitting here
>>  sane_unset GIT_TEST_SPLIT_INDEX
>
> The conflict resolution around here in 3f725b07d3 (Merge branch
> 'sg/split-index-racefix' into jch, 2018-09-29) accidentally removed
> the above line, ...

Yeah, I see it in "git show --cc".  Will fix the rerere database
entry to prevent the mismerge from recurring..

Sorry, and thanks for helping me correcting the mismerge.



Re: [PATCH v4 1/7] t/README: reformat Do, Don't, Keep in mind lists

2018-10-05 Thread Junio C Hamano
Matthew DeVore  writes:

> diff --git a/t/README b/t/README
> index 9028b47d9..85024aba6 100644
> --- a/t/README
> +++ b/t/README
> @@ -393,13 +393,13 @@ This test harness library does the following things:
> consistently when command line arguments --verbose (or -v),
> --debug (or -d), and --immediate (or -i) is given.
>  
> -Do's, don'ts & things to keep in mind
> +Do's & don'ts
>  -

We may not format this with AsciiDoc, but please shorten the
underline so that it aligns with the line above that it applies to.

>  Here are a few examples of things you probably should and shouldn't do
>  when writing tests.


Re: [PATCH v2] gpg-interface.c: detect and reject multiple signatures on commits

2018-10-05 Thread Junio C Hamano
Michał Górny  writes:

> On Fri, 2018-08-17 at 09:34 +0200, Michał Górny wrote:
>> GnuPG supports creating signatures consisting of multiple signature
>> packets.  If such a signature is verified, it outputs all the status
>> messages for each signature separately.  However, git currently does not
>> account for such scenario and gets terribly confused over getting
>> multiple *SIG statuses.
>> 
>> For example, if a malicious party alters a signed commit and appends
>> a new untrusted signature, git is going to ignore the original bad
>> signature and report untrusted commit instead.  However, %GK and %GS
>> format strings may still expand to the data corresponding
>> to the original signature, potentially tricking the scripts into
>> trusting the malicious commit.
>> 
>> Given that the use of multiple signatures is quite rare, git does not
>> support creating them without jumping through a few hoops, and finally
>> supporting them properly would require extensive API improvement, it
>> seems reasonable to just reject them at the moment.
>> 
>
> Gentle ping.

I think among the three issues raised in the review of v1 [*1*], one
of them remain unaddressed.  Other than that the addition relative
to v2 looks reasonable (but I only skimmed the patch).

[Reference] *1* 
https://public-inbox.org/git/xmqq1saxc5gu@gitster-ct.c.googlers.com/ 

Relevant part reproduced here.

>>> /* Iterate over all search strings */
>>> for (i = 0; i < ARRAY_SIZE(sigcheck_gpg_status); i++) {
>>> @@ -50,6 +52,10 @@ static void parse_gpg_output(struct signature_check 
>>> *sigc)
>>> continue;
>>> found += strlen(sigcheck_gpg_status[i].check);
>>> ...
>>> +   if (had_status > 1) {
>>> +   sigc->result = 'E';
>>> +   /* Clear partial data to avoid confusion */
>>> +   if (sigc->signer)
>>> +   FREE_AND_NULL(sigc->signer);
>>> +   if (sigc->key)
>>> +   FREE_AND_NULL(sigc->key);
>>> +   }
>>
>> Makes sense to me.
>
> I was wondering if we have to revamp the loop altogether.  The
> current code runs through the list of all the possible "status"
> lines, and find the first occurrence for each type in the buffer
> that has GPG output.  Second and subsequent occurrence of the same
> type, if existed, will not be noticed by the original loop
> structure, and this patch does not change it, even though the topic
> of the patch is about rejecting the signature block with elements
> taken from multiple signatures.

Which still smells to me that it points out a grave (made grave by
what the patch claims to address) issue in the implementation of v1;
did v2 get substantially updated to address the concern?

> One way to fix it may be to keep
> the current loop structure to go over the sigcheck_gpg_status[],
> but make the logic inside the loop into an inner loop that finds all
> occurrences of the same type, instead of stopping after finding the
> first instance.  But once we go to that length, I suspect that it
> may be cleaner to iterate over the lines in the buffer, checking
> each line if it matches one of the recognized "[GNUPG:] FOOSIG"
> lines and acting on it (while ignoring unrecognized lines).


P.S. I'd be either offline or otherwise occupied until the next
week, so there is no need to hastily prepare an updated patch
series.

Thanks.



Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-05 Thread Junio C Hamano
SZEDER Gábor  writes:

>> git-gc - Cleanup unnecessary files and optimize the local repository
>> 
>> Creating these indexes like the commit-graph falls under "optimize the
>> local repository",
>
> But it doesn't fall under "cleanup unnecessary files", which the
> commit-graph file is, since, strictly speaking, it's purely
> optimization.

I won't be actively engaged in this discussion soon, but I must say
that "git gc" doing "garbage collection" is merely an implementation
detail of optimizing the repository for further use.  And from that
point of view, what needs to be updated is the synopsis of the
git-gc doc.  It states "X and Y" above, but it actually is "Y by
doing X and other things".

I understand your "by definition there is no garbage immediately
after clone" position, and also I would understand if you find it
(perhaps philosophically) disturbing that "git clone" may give users
a suboptimal repository that immediately needs optimizing [*1*].

But that bridge was crossed long time ago ever since pack transfer
was invented.  The data source sends only the pack data stream, and
the receiving end is responsible for spending cycles to build .idx
file.  Theoretically, .pack should be all that is needed---you
should be able to locate any necessary object by parsing the .pack
file every time you open it, and .idx is mere optimization.  You can
think of the .midx and graph files the same way.

I'd consider it a growing pain that these two recent inventions were
and are still built as a totally optional and separate features,
requiring completely separate full enumeration of objects in the
repository that needs to happen anyway when we build .idx out of the
received .pack.

I would not be surprised by a future in which the initial index-pack
that is responsible for receiving the incoming pack stream and
storing that in .pack file(s) while creating corresponding .idx
file(s) becomes also responsible for building .midx and graph files
in the same pass, or at least smaller number of passes.  Once we
gain experience and confidence with these new auxiliary files, that
ought to happen naturally.  And at that point, we won't be having
this discussion---we'd all happily run index-pack to receive the
pack data, because that is pretty much the fundamental requirement
to make use of the data.

[Footnote]

*1* Even without considering these recent invention of auxiliary
files, cloning from a sloppily packed server whose primary focus
is to avoid spending cycles by not computing better deltas will
give the cloner a suboptimal repository.  If we truly want to
have an optimized repository ready to be used after cloning, we
should run an equivalent of "repack -a -d -f" immediately after
"git clone".



Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-29 Thread Junio C Hamano
Jeff King  writes:

> And I think this has to be stderr. We're polluting the output of the
> aliased command with our extra message, so we have two choices:
>
>   1. Pollute stderr, and risk copious stdout (or a pager) scrolling it
>  off the screen.
>
>   2. Pollute stdout, at which point our message may be confused as part
>  of the actual output of the command (and that may not even be
>  immediately noticed if it is passed through a shell pipeline or
>  into a file).
>
> Choice (2) seems like a regression to me. Choice (1) is unfortunate in
> some cases, but is no worse than today's behavior.

I think the output of "git foo -h" changing (i.e. has "aliased
to..."  message in front) is about the same degree of regression as
"git foo --help" no longer giving "aliased to..." information
anywhere, though.

>> Even the first two "better" cases share the same glitch if the "foo
>> ...
>> thing to do is to
>> 
>>  $ git unknown-command -h >&2 | less
>> 
>> And at that point, it does not matter which between the standard
>> output and the standard error streams we write "unknown-command is
>> aliased to ...".
>
> Yeah. I think if "git foo -h" produces a bunch of output you didn't
> expect, then "git help foo" or "git foo --help" may be the next thing
> you reach for. That's not so different than running the command even
> without any aliases involved.

Hmmm.  With the "teach 'git foo -h' to output 'foo is aliased to
bar' to the standard error before running 'git bar -h'", plus "'git
foo --help' now goes straight to 'git bar --help'", "git foo --help"
no longer tells us that foo is aliased to bar.  Presumably "git help
foo" will still give "foo is bar" and stop?


Re: [PATCH] fetch: replace string-list used as a look-up table with a hashmap

2018-09-29 Thread Junio C Hamano
Jeff King  writes:

> So yet another alternative here is to just define a single hashmap that
> stores void pointers. That also throws away some type safety, but is
> maybe conceptually simpler. I dunno.

I was tempted to try that route, which would essentially give us "if
you are abusing string-list as a look-up table, and if either you do
not need to iterate over all the elements, or you do not care about
the order in which such an interation is made, then use this
instead" API that can more easily substitute string-list without
boilerplate.

But I am not sure if that is worth it.  Perhaps we may end up doing
so when we find the second thing to move from string-list to hashmap,
but not with this patch (and yes, there is another string-list user
in the same source file, which I think can reuse what I added with
this patch).

In any case, I expect to be offline more than half of the next week,
so before I forget, here is an updated patch.  Two changes since the
version posted earlier are:

 - remote_refs_list (not remote_refs which is a hashmap) is passed
   to string_list_clear() at the end.  This is a fix for an outright
   bug Ramsay noticed.

 - The callback function now takes (const void *) and casts its
   parameters to correct type before they are used, instead of
   casting the pointer to a function whose signature is wrong in the
   call to hashmap_init().  This was noticed by Stefan and I agree
   that doing it this way makes more sense.

-- >8 --
Subject: [PATCH v2] fetch: replace string-list used as a look-up table with a 
hashmap

In find_non_local_tags() helper function (used to implement the
"follow tags"), we use string_list_has_string() on two string lists
as a way to see if a refname has already been processed, etc.

All this code predates more modern in-core lookup API like hashmap;
replace them with two hashmaps and one string list---the hashmaps
are used for look-ups and the string list is to keep the order of
items in the returned result stable (which is the only single thing
hashmap does worse than lookups on string-list).

Signed-off-by: Junio C Hamano 
---
 builtin/fetch.c | 121 +---
 1 file changed, 94 insertions(+), 27 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0696abfc2a..489531ec93 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -246,16 +246,76 @@ static int will_fetch(struct ref **head, const unsigned 
char *sha1)
return 0;
 }
 
+struct refname_hash_entry {
+   struct hashmap_entry ent; /* must be the first member */
+   struct object_id oid;
+   char refname[FLEX_ARRAY];
+};
+
+static int refname_hash_entry_cmp(const void *hashmap_cmp_fn_data,
+ const void *e1_,
+ const void *e2_,
+ const void *keydata)
+{
+   const struct refname_hash_entry *e1 = e1_;
+   const struct refname_hash_entry *e2 = e2_;
+
+   return strcmp(e1->refname, keydata ? keydata : e2->refname);
+}
+
+static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
+  const char *refname,
+  const struct object_id *oid)
+{
+   struct refname_hash_entry *ent;
+   size_t len = strlen(refname);
+
+   FLEX_ALLOC_MEM(ent, refname, refname, len);
+   hashmap_entry_init(ent, memhash(refname, len));
+   oidcpy(>oid, oid);
+   hashmap_add(map, ent);
+   return ent;
+}
+
+static int add_one_refname(const char *refname,
+  const struct object_id *oid,
+  int flag, void *cbdata)
+{
+   struct hashmap *refname_map = cbdata;
+
+   (void) refname_hash_add(refname_map, refname, oid);
+   return 0;
+}
+
+static void refname_hash_init(struct hashmap *map)
+{
+   hashmap_init(map, refname_hash_entry_cmp, NULL, 0);
+}
+
+static int refname_hash_exists(struct hashmap *map, const char *refname)
+{
+   struct hashmap_entry key;
+   size_t len = strlen(refname);
+   hashmap_entry_init(, memhash(refname, len));
+
+   return !!hashmap_get(map, , refname);
+}
+
 static void find_non_local_tags(const struct ref *refs,
struct ref **head,
struct ref ***tail)
 {
-   struct string_list existing_refs = STRING_LIST_INIT_DUP;
-   struct string_list remote_refs = STRING_LIST_INIT_NODUP;
+   struct hashmap existing_refs;
+   struct hashmap remote_refs;
+   struct string_list remote_refs_list = STRING_LIST_INIT_NODUP;
+   struct string_list_item *remote_ref_item;
const struct ref *ref;
-   struct string_list_item *item = NULL;
+   struct refname_hash_entry *item = NULL;
 
-   for_each_ref(add_existing, _refs);
+   refname_hash_init(_refs);
+   refname_hash_init(_refs);
+
+ 

Re: Git Evolve

2018-09-29 Thread Junio C Hamano
Stefan Xenos  writes:

> What is the evolve command?
> ...
> - Systems like gerrit would no longer need to rely on "change-id" tags
> in commit comments to associate commits with the change that they
> edit, since git itself would have that information.
> ...
> Is anyone else interested in this? Please email me directly or on this
> list. Let's chat: I want to make sure that whatever we come up with is
> at least as good as any similar technology that has come before.

As you listed in the related technologies section, I think the
underlying machinery that supports "rebase -i", especially with the
recent addition of redoing the existing merges (i.e. "rebase -i
-r"), may be enough to rewrite the histories that were built on top
of a commit that has been obsoleted by amending.

I would imagine that the main design effort you would need to make
is to figure out a good way to

 (1) keep track of which commits are obsoleted by which other ones
 [*1*], and

 (2) to figure out what histories are still to be rebuilt in what
 order on top of what commit efficiently.

Once these are done, you should be able to write out the sequence of
instructions to feed the same sequencer machinery used by the
"rebase -i" command.

[Side note]

*1* It is very desirable to keep track of the evolution of a change
without polluting the commit object with things like Change-Id:
and other cruft, either in the body or in the header.  If we
lose the Change-Id: footer without adding any new cruft in the
commit object header, that would be a great success.  It would
be a failure if we end up touching the object header.






Re: [PATCH] fetch-pack: approximate no_dependents with filter

2018-09-29 Thread Junio C Hamano
Jonathan Tan  writes:

> This was prompted by a user at $DAY_JOB who had a partial clone
> excluding trees, and had a workflow that only required tree objects (and
> not blobs).
>
> This will hopefully make partial clones excluding trees (with the
> "tree:0" filter) a bit better, in that if an operation requires only
> trees to be inspected, the required download is much smaller.

This seems to break 5520 and 5616 when merged to 'pu'.  

It seems that merging master to md/filter-trees and then applying
this is sufficient to break 5616.


Re: [PATCH] Improvement to only call Git Credential Helper once

2018-09-29 Thread Junio C Hamano
Jeff King  writes:

> Wow, what's old is new again. Here's more or less the same patch from
> 2012:
>
>   https://public-inbox.org/git/20120407033417.ga13...@sigill.intra.peff.net/
>
> Unfortunately, some people seem to rely on this multi-helper behavior. I
> recommend reading the whole thread, as there are some interesting bits
> in it (that I had always meant to return to, but somehow 6 years went
> by).

After reading that thread again, my version of summary is that

 - storing the credential obtained from a helper back to the same
   helper may be pointless at best and may incur end-user
   interaction (e.g. asking for symmetric encryption key before the
   data hits the disk), but it can be used as a "we used what you
   gave us successfully" ping-back signal.  We may not have designed
   approve() to do "store" back to the same helper by default and
   instead to do so only when asked by the helper, if we knew
   better.  But an unconditional change of behaviour will break
   existing users and helpers.

 - storing the credential obtained from a helper to a different
   helper may have security implications, and we may not have
   designed approve() to do "store" by default to all helpers if we
   knew better.  But an unconditional change of behaviour will break
   existing users and helpers.

Assuming that the above summary is accurate, I think the former is
easier to solve.  In the ideal end game state, we would also have
"accepted" in the protocol and call the helper that gave us the
accepted credential material with an earlier "get" (if the helper is
updated to understand "accepted").  The "accepted" may not even need
to receive the credential material as the payload.

The latter is trickier, as there are design considerations.

 - We may want to allow the helper that gives the credential back
   from the outside world upon "get" request to say "you can 'store'
   this to other helpers of this kind but not of that kind".  If we
   want to do so, we'd need to extend what is returned from the
   helper upon "get" (e.g. "get2" will give more than what "get"
   does back).

 - We may want to allow the helper that receives the credential
   material from others to behave differently depending on where it
   came from, and what other helpers done to it (e.g. even a new
   credential the end user just typed may not want to go to all
   helpers; an on-disk helper may feel "I'd take it if the only
   other helpers that responded to 'store' are the transient
   'in-core' kind, but otherwise I'd ignore").  I am not offhand
   sure what kind of flexibility and protocol extension is
   necessary.

 - We may want to be able to override the preference the helper
   makes in the above two.  The user may want to say "Only use this
   on-disk helper as a read-only source and never call 'store' on it
   to add new entries (or update an existing one)."



Re: [PATCH] fetch: fix compilation warning

2018-09-29 Thread Junio C Hamano
Ramsay Jones  writes:

> You probably already know, but I had to add this on top of the 'pu'
> branch to get a clean compile tonight (your 'jc/war-on-string-list'
> branch).

It was not just about squelching a warning but simply broken code
that deserved to be warned/errored.  I think what we have in 'pu'
now is already fixed.

Thanks.

>   }
>   hashmap_free(_refs, 1);
> - string_list_clear(_refs, 0);
> + string_list_clear(_refs_list, 0);
>  }
>  
>  static struct ref *get_ref_map(struct remote *remote,


Re: [PATCH] fetch: replace string-list used as a look-up table with a hashmap

2018-09-29 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Sep 26, 2018 at 03:59:08PM -0700, Stefan Beller wrote:
>
>> > +struct refname_hash_entry {
>> > +   struct hashmap_entry ent; /* must be the first member */
>> 
>> $ git grep "struct hashmap_entry" reveals that we have another
>> convention that we follow but not document, which is to stress
>> the importance of putting the hashmap_entry first. ;-)
>
> One thing I've liked about the list.h implementation is that you can
> store the list pointer anywhere in the struct, or even have the same
> struct in multiple lists.
>
> The only funny thing is that you have to "dereference" the iterator like
> this:
>
>   struct list_head *pos;
>   struct actual_thing *item;
>   ...
>   item = list_entry(pos, struct actual_thing, list_member);
>
> which is a minor pain, but it's reasonably hard to get it wrong.
>
> I wonder if we could do the same here. The hashmap would only ever see
> the "struct hashmap_entry", and then the caller would convert that back
> to the actual type.

Hmph, how would hashmap_cmp_fn look like with that scheme?  It would
get one entry, another entry (or just the skeleton of it) and
optionally a separate keydata (if the second one is skeleton), and
the first two points at the embedded hashmap struct, not the
surrounding one.  The callback function is now responsible for
calling a hashmap_entry() macro that adjusts for the negative offset
like list_entry() does?

> I think we could even get away with not converting
> existing callers; if the hashmap _is_ at the front, then that
> list_entry() really just devolves to a cast.

Yes.

> So as long as the struct
> definition and the users of the struct agree, it would just work.

Yes, too.

Was it ever a consideration, when allowing struct list-head anywhere
in the enclosing struct, that it would allow an element to be on
more than one list?  Would it benefit us to be able to place an
element in multiple hashmaps because we do not have to have the
embedded hashmap_entry always at the beginning if we did this
change?


Re: [PATCH v6 3/7] eoie: add End of Index Entry (EOIE) extension

2018-09-29 Thread Junio C Hamano
Duy Nguyen  writes:

> On Wed, Sep 26, 2018 at 03:54:38PM -0400, Ben Peart wrote:
>> +
>> +#define EOIE_SIZE (4 + GIT_SHA1_RAWSZ) /* <4-byte offset> + <20-byte hash> 
>> */
>> +#define EOIE_SIZE_WITH_HEADER (4 + 4 + EOIE_SIZE) /* <4-byte signature> + 
>> <4-byte length> + EOIE_SIZE */
>
> If you make these variables instead of macros, you can use
> the_hash_algo, which makes this code sha256-friendlier and probably
> can explain less, e.g. ...
>
>> +
>> +static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
>> +{
>> +/*
>> + * The end of index entries (EOIE) extension is guaranteed to be last
>> + * so that it can be found by scanning backwards from the EOF.
>> + *
>> + * "EOIE"
>> + * <4-byte length>
>> + * <4-byte offset>
>> + * <20-byte hash>

20? ;-)

>> + */
>
>   uint32_t EOIE_SIZE = 4 + the_hash_algo->rawsz;
>   uint32_t EOIE_SIZE_WITH_HEADER = 4 + 4 + EOIE_SIZE;
>
>> +const char *index, *eoie;
>> +uint32_t extsize;
>> +size_t offset, src_offset;
>> +unsigned char hash[GIT_MAX_RAWSZ];
>> +git_hash_ctx c;
> --
> Duy


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-29 Thread Junio C Hamano
Jeff King  writes:

> Right, I'm proposing only to add the extra message and then continue as
> usual.
>
> It is a little funny, I guess, if you have a script which doesn't
> respond to "-h", because you'd get our "foo is aliased to git-bar"
> message to stderr, followed by who-knows-what. But as long as it's to
> stderr (and not stdout), I think it's not likely to _break_ anything.
>
>> >   - "git cp --help" opens the manpage for cherry-pick. We don't bother
>> > with the alias definition, as it's available through other means
>> > (and thus we skip the obliteration/timing thing totally).
>> 
>> It sounds like you suggest doing this unconditionally, and without any
>> opt-in via config option or a short wait? That would certainly work for
>> me. It is, in fact, how I expect 'git cp --help' to work, until I get
>> reminded that it does not... Also, as Junio noted, is consistent with
>> --help generally providing more information than -h - except that one
>> loses the 'is an alias for' part for --help.
>
> Yes, I'd suggest doing it always. No config, no wait.

While I do think your suggestion is the best among various ones
floated in the thread, I just realized there is one potential glitch
even with that approach.  

Suppose "git foo" is aliased to a command "git bar".

The best case is when "git bar -h" knows that it is asked to give us
a short usage.  We get "foo is aliased to bar" followed by the short
usage for "bar" and everything is visible above the shell prompt
after all that happens.

The second best case is when "git bar" simply does not support "-h"
but actively notices an unknown option on the command line to give
the usage message.  We see "foo is aliased to bar" followed by "-h
is an unknown option; supported options are ..." and everything is
visible above the shell prompt after all that happens.

The worst case is when "git bar" supports or ignores "-h" and
produces reams of output.  Sending the "aliased to" message to the
standard error means that it is scrolled out when the output is
done, or lost even when "git foo -h | less" attempts to let the
reader read before the early part of the output scrolls away.

Even the first two "better" cases share the same glitch if the "foo
is aliased to bar" goes to the standard error output.  Parse-options
enabled commands tend to show a long "-h" output that you would need
to say "git grep -h | less", losing the "aliased to" message.

At least it seems to me an improvement to use standard output,
instead of standard error, for the alias information.

In practice, however, what the command that "git foo" is aliased to
does when given "-h" is probably unknown (because the user is asking
what "git foo" is in the first place), so perhaps I am worried too
much.  When the user does not know if the usage text comes to the
standard output or to the standard error, and if the usage text is
very long or not, they probably would learn quickly that the safest
thing to do is to

$ git unknown-command -h >&2 | less

And at that point, it does not matter which between the standard
output and the standard error streams we write "unknown-command is
aliased to ...".

So I dunno.


Re: [PATCH] grep: provide a noop --recursive option

2018-09-29 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> This --recursive (-r) option does nothing, and is purely here to
> appease people who have "grep -r ..." burned into their muscle memory.
>
> Requested-by: Christoph Berg 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---

I personally am not all that sympathetic to the "'git grep' and
'grep' sound similar, so even though it won't do anything useful
just add a synonym to noop" line of reasoning.  That will lead to
sloppy noop, and will invite unbound amount of busywork to deal with
future complaints like "oh, but 'grep GNU COPYING' does not give
useless filename in front like the same command line with 'git'
prefixed; please fix 'git grep'" (which we'd have to say "no",
wasting our time).

I however do not mind if we added "--recursive" with matching
"--no-recursive", and

 - made "--recursive" the default (obviously)

 - made "--no-recursive" a synonym to setting the recursion limit
   to "never recurse"

 - and made "--recursive" a synonym to setting the recursion limit
   to "infinity".

That would be more work than this patch.  But if I see "--recursive"
advertised as a feature, and the command by default goes recursive,
I do expect to be able to tell it not to recurse.

I also expect folks who are used to "git grep --re" to summon
the only option of the command that begins with that prefix to start
complaining that they now have to type "--recurs" instead.  I
am not solving that with the above suggestion to improve the
suggested "noop".


Re: [PATCH] git doc: direct bug reporters to mailing list archive (Re: [PATCH v2] git.txt: mention mailing list archive)

2018-09-28 Thread Junio C Hamano
Jonathan Nieder  writes:

> My experience is that bug reporters are very sensitive to hints the
> project gives about what kind of bugs they want to receive.  I'd
> rather make use of that lesson now instead of waiting to relearn it in
> the wild.  Here goes.

OK.  This unfortunately came a bit too late for today's integration
cycle, so I'll leave this in my inbox and replace what is queued 
with it later.

Unless there is another one to supersede this proposal comes before
that happens, that is.

Thanks.



Re: [PATCH] submodule: Alllow staged changes for get_superproject_working_tree

2018-09-28 Thread Junio C Hamano
Sam McKelvie  writes:

>> Or perhaps
>> 
>> rev-parse: --show-superproject-working-tree should work during a merge
>> 
>> may be more to the point.  It does not hint the root cause of the
>> bug like the other one, but is more direct how the breakage would
>> have been observed by the end users.
>> 
> Haha, that is closer to my original title that Stefan wanted to change:
>
> submodule.c: Make get_superproject_working_tree() work when superproject has 
> unmerged changes of the submodule reference
>
> Though I could see why mine was too long.
>
> I’m really happy with both your suggestions

I've pushed out with the "rev-parse: ..." as the title.

Thanks for the fix.



Re: [PATCH v6 4/7] config: add new index.threads config setting

2018-09-28 Thread Junio C Hamano
Ramsay Jones  writes:

>>     if (!nr) {
>>     ieot_blocks = istate->cache_nr / THREAD_COST;
>> -   if (ieot_blocks < 1)
>> -   ieot_blocks = 1;
>>     cpus = online_cpus();
>>     if (ieot_blocks > cpus - 1)
>>     ieot_blocks = cpus - 1;
>
> So, am I reading this correctly - you need cpus > 2 before an
> IEOT extension block is written out?
>
> OK.

Why should we be even calling online_cpus() in this codepath to
write the index in a single thread to begin with?

The number of cpus that readers would use to read this index file
has nothing to do with the number of cpus available to this
particular writer process.  



Re: [PATCH] git-rebase.sh: fix typos in error messages

2018-09-28 Thread Junio C Hamano
Junio C Hamano  writes:

> ...  However, because the same mistakes are inherited to
> builtin/rebase.c by these topics, we'd need a matching fix to
> correct 07664161 ("builtin rebase: error out on incompatible
> option/mode combinations", 2018-08-08) and either squash the fix
> into that commit, or queue it on top of pk/rebase-in-c-5-test topic.
>
> Will queue; thanks.

Here is what I'd queue, too.

-- >8 --
Subject: [PATCH] rebase: fix typos in error messages

The separator between words in a multi-word option name is a dash,
not an underscore.

Inspired by a matching change by Ralf Thielow for the scripted
version of "git rebase".

Signed-off-by: Junio C Hamano 
---
 builtin/rebase.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 1a697d70c9..0f9a40aae5 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -1135,15 +1135,15 @@ int cmd_rebase(int argc, const char **argv, const char 
*prefix)
 *   git-rebase.txt caveats with "unless you know what you 
are doing"
 */
if (options.rebase_merges)
-   die(_("error: cannot combine '--preserve_merges' with "
+   die(_("error: cannot combine '--preserve-merges' with "
  "'--rebase-merges'"));
 
if (options.rebase_merges) {
if (strategy_options.nr)
-   die(_("error: cannot combine '--rebase_merges' with "
+   die(_("error: cannot combine '--rebase-merges' with "
  "'--strategy-option'"));
if (options.strategy)
-   die(_("error: cannot combine '--rebase_merges' with "
+   die(_("error: cannot combine '--rebase-merges' with "
  "'--strategy'"));
}
 
-- 
2.19.0-271-gfe8321ec05



Re: [PATCH] git-rebase.sh: fix typos in error messages

2018-09-28 Thread Junio C Hamano
Ralf Thielow  writes:

> Signed-off-by: Ralf Thielow 
> ---
>  git-rebase.sh | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

This patch itself will soon stop mattering once the group of
rebase-in-c topics graduate, which hopefully will happen during this
cycle.  However, because the same mistakes are inherited to
builtin/rebase.c by these topics, we'd need a matching fix to
correct 07664161 ("builtin rebase: error out on incompatible
option/mode combinations", 2018-08-08) and either squash the fix
into that commit, or queue it on top of pk/rebase-in-c-5-test topic.

Will queue; thanks.

>
> diff --git a/git-rebase.sh b/git-rebase.sh
> index 7973447645..45b6ee9c0e 100755
> --- a/git-rebase.sh
> +++ b/git-rebase.sh
> @@ -553,15 +553,15 @@ then
>   # Note: incompatibility with --interactive is just a strong warning;
>   #   git-rebase.txt caveats with "unless you know what you are doing"
>   test -n "$rebase_merges" &&
> - die "$(gettext "error: cannot combine '--preserve_merges' with 
> '--rebase-merges'")"
> + die "$(gettext "error: cannot combine '--preserve-merges' with 
> '--rebase-merges'")"
>  fi
>  
>  if test -n "$rebase_merges"
>  then
>   test -n "$strategy_opts" &&
> - die "$(gettext "error: cannot combine '--rebase_merges' with 
> '--strategy-option'")"
> + die "$(gettext "error: cannot combine '--rebase-merges' with 
> '--strategy-option'")"
>   test -n "$strategy" &&
> - die "$(gettext "error: cannot combine '--rebase_merges' with 
> '--strategy'")"
> + die "$(gettext "error: cannot combine '--rebase-merges' with 
> '--strategy'")"
>  fi
>  
>  if test -z "$rebase_root"


Re: [PATCH] strbuf.h: format according to coding guidelines

2018-09-28 Thread Junio C Hamano
Stefan Beller  writes:

> The previous patch suggested the strbuf header to be the leading example
> of how we would want our APIs to be documented. This may lead to some
> scrutiny of that code and the coding style (which is different from the
> API documentation style) and hence might be taken as an example on how
> to format code as well.
>
> So let's format strbuf.h in a way that we'd like to see:
> * omit the extern keyword from function declarations
> * name all parameters (usually the parameters are obvious from its type,
>   but consider exceptions like
>   `int strbuf_getwholeline_fd(struct strbuf *, int, int);`
> * break overly long lines
>
> Signed-off-by: Stefan Beller 
> ---
>
> Maybe this on top of Junios guideline patch?

If we were to do this, I'd rather see us do a very good job on this
file first, with "We are going to use this file as the best current
practice model for an API header file" to begin its log message,
and then recommend its use as the model on top.


Re: [PATCH] strbuf.h: format according to coding guidelines

2018-09-28 Thread Junio C Hamano
Junio C Hamano  writes:

> I actually do not mind the rule to be more like
>
>  * Use the same parameter names used in the function declaration when
>the description in the API documentation refers the parameter.

Assuming that we adopt the above guideline, let's extending it to
the original patch's review.

The following is a good example.  FIRST and SECOND would have been
upcased if this followed my earlier illustration to make them stand
out as references to the parameters, but it is already readable
without upcasing _and_ naming parameters is helping here.

 /**
  * Compare two buffers. Returns an integer less than, equal to, or greater
  * than zero if the first buffer is found, respectively, to be less than,
  * to match, or be greater than the second buffer.
  */
-extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
+int strbuf_cmp(const struct strbuf *first, const struct strbuf *second);



The next one could be improved and say something like: "Remove LEN
bytes in the strbuf SB starting at offset POS", as it already had
'pos' and 'len' that are readily usable.  Notice that "Remove LEN
bytes starting at offset POS" is a sufficiently clear description
and that is why I do not think we should require all parameters to
be named.

 /**
  * Remove given amount of data from a given position of the buffer.
  */
-extern void strbuf_remove(struct strbuf *, size_t pos, size_t len);
+void strbuf_remove(struct strbuf *sb, size_t pos, size_t len);
 

The last example is a job half-done.  The original had pos and len
parameters and referred to them in the text, but just said "with the
given data".  Now we have data and data_len, "the given data" can
and should be clarified by referring to them.

 /**
  * Remove the bytes between `pos..pos+len` and replace it with the given
  * data.
  */
-extern void strbuf_splice(struct strbuf *, size_t pos, size_t len,
- const void *, size_t);
+void strbuf_splice(struct strbuf *sb, size_t pos, size_t len,
+  const void *data, size_t data_len);



Re: [PATCH] strbuf.h: format according to coding guidelines

2018-09-28 Thread Junio C Hamano


Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> So let's format strbuf.h in a way that we'd like to see:
>> * omit the extern keyword from function declarations
>
> OK
>
>> * name all parameters (usually the parameters are obvious from its type,
>>   but consider exceptions like
>>   `int strbuf_getwholeline_fd(struct strbuf *, int, int);`
>
> I am mildly against giving names to obvious ones.

Just to make sure nobody listening from sidelines do not
misunderstand, ones like getwholeline_fd() that takes more than one
parameter with the same time is a prime example of a function that
take non-obvious parameters that MUST be named.  

What I am mildly against is the rule that says "name ALL
parameters".  I tend to think these names make headers harder to
read, and prefer to keep them to the minimum.

I actually do not mind the rule to be more like

 * Use the same parameter names used in the function declaration when
   the description in the API documentation refers the parameter.

That _forces_ people to write

/**
 * Read a whole line up to the terminating character 
 * TERM (typically LF or NUL) from the file descriptor FD
 * and replace the contents of the strbuf SB with the
 * result ...
 */
int strbuf_getwholeline_fd(struct strbuf *sb, int fd, int term);

which is mostly equivalent to having a rule to always name all
parameters, while still allowing "sb" to be omitted by rephrasing
"the contents of the given strbuf with the result ..." and I
consider that a good flexibility to have.



Re: [PATCH] strbuf.h: format according to coding guidelines

2018-09-28 Thread Junio C Hamano
Stefan Beller  writes:

> So let's format strbuf.h in a way that we'd like to see:
> * omit the extern keyword from function declarations

OK

> * name all parameters (usually the parameters are obvious from its type,
>   but consider exceptions like
>   `int strbuf_getwholeline_fd(struct strbuf *, int, int);`

I am mildly against giving names to obvious ones.


Re: [PATCH] Improvement to only call Git Credential Helper once

2018-09-28 Thread Junio C Hamano
Kyle Hubert  writes:

> Subject: Re: [PATCH] Improvement to only call Git Credential Helper once

Nobody will send in a patch to worsen things, so phrases like
"Improvement to" that convey no useful information has no place on
the title.

There probably are multiple ways that credential helpers are not
called just once and many of them probably are legit (e.g. "get" is
not the only request one helper can receive).  It is unclear why
"only call helper once" is an improvement unless the reader reads
more, which means the title could be a lot more improved.

Side note: this matters because in "git shortlog --no-merges"
the title is the only thing you can tell your readers what
your contribution was about.

> When calling the Git Credential Helper that is set in the git config,
> the get command can return a credential. Git immediately turns around
> and calls the store command, even though that credential was just
> retrieved by the Helper. 

Good summary of the current behaviour.  A paragraph break here would
make the result easier to read.

> This creates two side effects. First of all,
> if the Helper requires a passphrase, the user has to type it in
> twice.

Hmph, because...?  If I am reading the flow correctly, an
application would

 - call credential_fill(), which returns when both username and
   password are obtained by a "get" request to one of the helpers.

 - use the credential to authenticate with a service and find that
   it was accepted.

 - call credential_approve(), which does "store" to all the helpers.

Where does the "twice" come from?

Ah, that is not between the application and the service, but between
the helper and the user you are required to "unlock" the helper?

OK, that wish makes sense.

It does not make much sense to ask helper A for credential and then
tell it to write it back the same thing.

HOWEVER.  Let me play a devil's advocate.

The "store" does not have to necessarily mean "write it back", no?

Imagine a helper that is connected to an OTP password device that
gives a different passcode every time button A is pressed, and there
are two other buttons B to tell the device that the password was
accepted and C to tell the device that the password was rejected
(i.e. we are out of sync).  "get" would press button A and read the
output, "store" would press button B and "erase" would press button
C, I would imagine.  With the current credential.c framework, you
can construct such a helper.  The proposed patch that stops calling
"store" unconditionally makes it impossible to build.

> Secondly, if the user has a number of helpers, this retrieves
> the credential from one service and writes it to all services.

It is unclear why you think it is a bad thing.  You need to
elaborate.

On the other hand, I can think of a case to illustrate why it is a
bad idea to unconditionally stop calling "store" to other helpers.
If one helper is a read-only "encrypted on disk" one, you may want
to require passphrase to "decrypt" to implement the "get" request to
the helper.  You would then overlay a "stay only in-core for a short
time" helper and give higher priority to it.

By doing so, on the first "get" request will ask the in-core one,
which says "I dunno", then the encrypted-on-disk one interacts with
the end-user and gives the credential.  The current code "store"s to
the in-core one as well as the encrypted-on-disk one, and second and
subseqhent "get" request can be served from that in-core helper.

Side note: and the "store" to encrypted-on-disk one may not
even need passphrase, even if "get" from it may need one.

"We got the credential from some helper, so we won't call store"
makes it impossible to build such an arrangement.

The above is a devil's advocate response in that I do not mean to
say that your proposed workaround does not solve *your* immediate
need, but to point out that you are closing many doors for needs
other people would have, or needs they already satisfy by taking
advantage of the current behaviour the proposed patch is breaking.

So, I dunno.  I certainly do not think it is a bad idea to stop
feeding _other_ helpers.  I also do not think it is a good idea to
unconditionally stop calling "store" to the same helper, but I can
see the benefit for having an option to skip "store" to the same
helper.  I am not sure if there should be an option to stop feeding
other helpers.

Thanks.


Re: [PATCH v3 3/5] fsmonitor: update GIT_TEST_FSMONITOR support

2018-09-28 Thread Junio C Hamano
Ben Peart  writes:

> Junio, can you squash in the following patch or would you prefer I
> reroll the entire series?

Squash it to f8cd77d5 ("fsmonitor: update GIT_TEST_FSMONITOR
support", 2018-09-18) and use the two new lines in the log message?

I can do that.

Thanks.


<    3   4   5   6   7   8   9   10   11   12   >