Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Junio C Hamano
Derrick Stolee  writes:

> I see these failures, too, but I believe they are due to
> ds/commit-graph-with-grafts not being merged to 'next' yet. The
> purpose of that branch is to fix these test breaks. The environment
> variable got merged a lot faster.

A separate "ping" would have helped me.  Will merge it down to
'next'.


Re: How to handle patch series conflicts

2018-10-08 Thread Junio C Hamano
Stephen & Linda Smith  writes:

> Junio - I've been working this but would like your opinion on 7500, 7501 and 
> now 7510. 
>
> I note that the the commit tests have intermixed functionality.  An example 
> is 
> signoff tests that are in the three tests I mentioned. 
>
> I've been tempted multiple times over the last week to just merge the tests 
> into a single script, but that doesn't seem right either.
>
> So would you prefer a single script?   Would you prefer me to move tests 
> around?

The scripts themselves having the same name that is no more specific
tha just "commit" does not bother _me_ personally too much.  If I
were doing it, unless you are an obsessive type that wants to see
spanking cleanness everywhere, I'd limit the changes to the minimum.

If something tested in script X is tested in another script Y and it
is trivial to see they are testing exactly the same thing, removing
one copy from script Y would be good, and if the remaining changes
in script Y becomes more focused with only such removals, that would
even be better, as at that point we can rename "tY-commit.sh" to
something more specific like "tY-commit-signature.sh".


Re: [PATCH 1/1] rebase -i: introduce the 'break' command

2018-10-08 Thread Junio C Hamano
Junio C Hamano  writes:

>> There is one crucial difference: the exit code.
>
> OK, and it was good that you explicitly said "with exit code 0" in
> the log message.  Together with the idea to update the doc I floated
> earlier, this probably is worth documenting, too.

Heh, I am becoming sloppy in reviewing.  The patch does not even
update any doc.

It is not a reason to reject the change (the change looks reasonably
simple and reviewers and those who have to look at the code to build
upon it would understand it in the current shape), but it is a
blocker for the change to be merged to 'next' and down.



Re: git log -S or -G

2018-10-08 Thread Junio C Hamano
Jeff King  writes:

> I think that is the best we could do for "-S", though, which is
> inherently about counting hits.
>
> For "-G", we are literally grepping the diff. It does not seem
> unreasonable to add the ability to grep only "-" or "+" lines, and the
> interface for that should be pretty straightforward (a tri-state flag to
> look in remove, added, or both lines).

Yeah, here is a lunchtime hack that hasn't even been compile tested.

 diff.c |  4 
 diff.h |  2 ++
 diffcore-pickaxe.c | 22 --
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index f0c7557b40..d1f2780844 100644
--- a/diff.c
+++ b/diff.c
@@ -5068,6 +5068,10 @@ int diff_opt_parse(struct diff_options *options,
}
else if (!strcmp(arg, "--pickaxe-all"))
options->pickaxe_opts |= DIFF_PICKAXE_ALL;
+   else if (!strcmp(arg, "--pickaxe-ignore-add"))
+   options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_ADD;
+   else if (!strcmp(arg, "--pickaxe-ignore-del"))
+   options->pickaxe_opts |= DIFF_PICKAXE_IGNORE_DEL;
else if (!strcmp(arg, "--pickaxe-regex"))
options->pickaxe_opts |= DIFF_PICKAXE_REGEX;
else if ((argcount = short_opt('O', av, ))) {
diff --git a/diff.h b/diff.h
index a30cc35ec3..147c47ace7 100644
--- a/diff.h
+++ b/diff.h
@@ -358,6 +358,8 @@ int git_config_rename(const char *var, const char *value);
 DIFF_PICKAXE_KIND_OBJFIND)
 
 #define DIFF_PICKAXE_IGNORE_CASE   32
+#define DIFF_PICKAXE_IGNORE_ADD64
+#define DIFF_PICKAXE_IGNORE_DEL 128
 
 void diffcore_std(struct diff_options *);
 void diffcore_fix_diff_index(struct diff_options *);
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 800a899c86..826dde6bd4 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -16,6 +16,7 @@ typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two,
 
 struct diffgrep_cb {
regex_t *regexp;
+   struct diff_options *diff_options;
int hit;
 };
 
@@ -23,9 +24,14 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 {
struct diffgrep_cb *data = priv;
regmatch_t regmatch;
+   unsigned pickaxe_opts = data->diff_options->pickaxe_opts;
 
if (line[0] != '+' && line[0] != '-')
return;
+   if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD) && line[0] == '+')
+   return;
+   if ((pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL) && line[0] == '-')
+   return;
if (data->hit)
/*
 * NEEDSWORK: we should have a way to terminate the
@@ -45,13 +51,20 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
xpparam_t xpp;
xdemitconf_t xecfg;
 
-   if (!one)
+   if (!one) {
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD)
+   return 0;
return !regexec_buf(regexp, two->ptr, two->size,
1, , 0);
-   if (!two)
+   }
+   if (!two) {
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL)
+   return 0;
return !regexec_buf(regexp, one->ptr, one->size,
1, , 0);
+   }
 
+   ecbdata.diff_options = o;
/*
 * We have both sides; need to run textual diff and see if
 * the pattern appears on added/deleted lines.
@@ -113,6 +126,11 @@ static int has_changes(mmfile_t *one, mmfile_t *two,
 {
unsigned int one_contains = one ? contains(one, regexp, kws) : 0;
unsigned int two_contains = two ? contains(two, regexp, kws) : 0;
+
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_ADD)
+   return one_contains > two_contains;
+   if (o->pickaxe_opts & DIFF_PICKAXE_IGNORE_DEL)
+   return one_contains < two_contains;
return one_contains != two_contains;
 }
 


Re: [BUG] A part of an edge from an octopus merge gets colored, even with --color=never

2018-10-08 Thread Jeff King
On Wed, Oct 03, 2018 at 06:32:06PM -0400, Noam Postavsky wrote:

> > which is admittedly pretty horrible, too, but at least resembles a
> > graph. I dunno.
> 
> Yeah, but it's lossy, so it doesn't seem usable for the test. Maybe
> doubling up some characters?
> 
> **  left
> R|  **B-B-M-M.  octopus-merge
> R|  R|Y\  B\  M\
> R|R/  Y/  B/  M/
> R|  Y|  B|  **  4
> R|  Y|  **  M|  3
> R|  Y|  M|M/
> R|  **  M|  2
> R|  M|M/
> **  M|  1
> M|M/
> **  initial

Yeah, I tried something similar, but it's hard to read as a graph since
the alignment is lost between lines. I agree the single-char version is
lossy, but I think in combination with checking the literal, uncolored
version, we'd be OK.

However, it may be best to just leave the original verbose version you
had. It's hard to read and to modify, but we don't plan for people to do
that very often. And it's at least simple.

> > I'm also not thrilled that we depend on the exact sequence of default
> > colors, but I suspect it's not the first time. And it wouldn't be too
> > hard to update it if that default changes.
> 
> Well, it's easy enough to set the colors explicitly. After doing this
> I noticed that green seems to be skipped. Not sure if that's a bug or
> not.

Hmm, yeah, that is weird. I think it's an artifact of the way we
increment the color selector, though, and not related to your patch (the
same thing happens before your fix, as well).

> > I think it's OK to have a dedicated script for even these two tests, if
> > it makes things easier to read. However, would we also want to test the
> > octopus without the problematic graph here? I think if we just omit
> > "left" we get that, don't we?
> 
> t4202-log.sh already does test a "normal" octopus merge (starting
> around line 615, search for "octopus-a"). But that is only a 3-parent
> merge. And adding another test is easy enough.
> [...]

Thanks, what you have here looks good.

> From cd9415b524357c2c8b9b20a63032c94e01d46a15 Mon Sep 17 00:00:00 2001
> From: Noam Postavsky 
> Date: Sat, 1 Sep 2018 20:07:16 -0400
> Subject: [PATCH v5] log: Fix coloring of certain octupus merge shapes

This whole version looks good to me. "git am" is supposed to understand
attachments, but it seems to want to apply our whole conversation as the
commit message.

You may want to repost one more time with this subject in the email
subject line to fix that and to get the maintainer's attention. Feel
free to add my:

  Reviewed-by: Jeff King 

after your signoff. Thanks for sticking with this topic!

-Peff


Re: [PATCH 1/1] rebase -i: introduce the 'break' command

2018-10-08 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Junio,
>
> On Fri, 5 Oct 2018, Junio C Hamano wrote:
>
>> "Johannes Schindelin via GitGitGadget" 
>> writes:
>> 
>> > From: Johannes Schindelin 
>> >
>> > The 'edit' command can be used to cherry-pick a commit and then
>> > immediately drop out of the interactive rebase, with exit code 0, to let
>> > the user amend the commit, or test it, or look around.
>>  ...
>> If one wants to emulate this with the versions of Git that are
>> currently deployed, would it be sufficient to insert "exec false"
>> instead of "break"?
>
> There is one crucial difference: the exit code.

OK, and it was good that you explicitly said "with exit code 0" in
the log message.  Together with the idea to update the doc I floated
earlier, this probably is worth documenting, too.

>> I think the earlier request asked for 'pause' (I didn't dig the list
>> archive very carefully, though),
>
> No need to: I mentioned it in the cover letter. Here is the link again,
> for your convenience:
> https://public-inbox.org/git/20180118183618.39853-3-sbel...@google.com/

No, you misunderstood.  I knew what message in the immediate past
triggered this patch and that wasn't what I "could have dug for but
didn't".  "The earlier request" I meant was another one I recall
that was made long time ago---that was what I "could have dug for
but didn't".

> Good initial version. I would like it to be a bit more precise about who
> exits with what status. How about this:

Sounds good.  It is likely that I'll either forget or will continue
to be too busy to pick textual pieces and assemble together myself,
so I'll leave it as a left over bit for somebody reading from the
sideline to send in a finished version if they care deeply enough
;-)

Thanks.



Re: git log -S or -G

2018-10-08 Thread Jacob Keller
On Mon, Oct 8, 2018 at 8:22 PM Jeff King  wrote:
>
> On Tue, Oct 09, 2018 at 08:09:32AM +0900, Junio C Hamano wrote:
>
> > Julia Lawall  writes:
> >
> > >> Doing the same for -S is much harder at the machinery level, as it
> > >> performs its thing without internally running "diff" twice, but just
> > >> counts the number of occurrences of 'foo'---that is sufficient for
> > >> its intended use, and more efficient.
> > >
> > > There is still the question of whether the number of occurrences of foo
> > > decreases or increases.
> >
> > Hmph, taking the changes that makes the number of hits decrease
> > would catch a subset of "changes that removes 'foo' only---I am not
> > interested in the ones that adds 'foo'".  It will avoid getting
> > confused by a change that moves an existing 'foo' to another place
> > in the same file (as the number of hits does not change), but at the
> > same time, it will miss a change that genuinely removes an existing
> > 'foo' and happens to add a 'foo' at a different place in the same
> > file that is unrelated to the original 'foo'.  Depending on the
> > definition of "I am only interested in removed ones", that may or
> > may not be acceptable.
>
> I think that is the best we could do for "-S", though, which is
> inherently about counting hits.
>
> For "-G", we are literally grepping the diff. It does not seem
> unreasonable to add the ability to grep only "-" or "+" lines, and the
> interface for that should be pretty straightforward (a tri-state flag to
> look in remove, added, or both lines).
>
> -Peff

Yea. I know I've wanted something like this in the past.

Thanks,
Jake


Re: [PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-08 Thread Junio C Hamano
Junio C Hamano  writes:

> Antonio Ospite  writes:
>
>> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
>> from .gitmodules succeeds and that writing to it fails when the file is
>> not checked out.
>> ...
>>  t/t7416-submodule-sparse-gitmodules.sh | 78 ++
>
> This now triggers test-lint errors as the most recent maintenance
> release took t/t7416 for something else.  I'll do s/t7416-/t7418-/g
> on the mailbox before running "git am -s" on this series.

This is an unrelated tangent to the topic, but running "range-diff"
on what has been queued on 'pu' since mid September and this
replacement after doing the renaming was a surprisingly pleasant
experience.  In its comparison between 09/10 of the two iterations,
it showed that 7416's name has been changed to 7418 but otherwise
there is no change in the contents of that test script.

FWIW, tbdiff also gets this right, so the pleasant experience was
inherited without getting broken.  Kudos should go to both Thomas
and Dscho ;-).


Re: [PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-08 Thread Junio C Hamano
Antonio Ospite  writes:

> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
> from .gitmodules succeeds and that writing to it fails when the file is
> not checked out.
> ...
>  t/t7416-submodule-sparse-gitmodules.sh | 78 ++

This now triggers test-lint errors as the most recent maintenance
release took t/t7416 for something else.  I'll do s/t7416-/t7418-/g
on the mailbox before running "git am -s" on this series.



Re: git log -S or -G

2018-10-08 Thread Jeff King
On Tue, Oct 09, 2018 at 08:09:32AM +0900, Junio C Hamano wrote:

> Julia Lawall  writes:
> 
> >> Doing the same for -S is much harder at the machinery level, as it
> >> performs its thing without internally running "diff" twice, but just
> >> counts the number of occurrences of 'foo'---that is sufficient for
> >> its intended use, and more efficient.
> >
> > There is still the question of whether the number of occurrences of foo
> > decreases or increases.
> 
> Hmph, taking the changes that makes the number of hits decrease
> would catch a subset of "changes that removes 'foo' only---I am not
> interested in the ones that adds 'foo'".  It will avoid getting
> confused by a change that moves an existing 'foo' to another place
> in the same file (as the number of hits does not change), but at the
> same time, it will miss a change that genuinely removes an existing
> 'foo' and happens to add a 'foo' at a different place in the same
> file that is unrelated to the original 'foo'.  Depending on the
> definition of "I am only interested in removed ones", that may or
> may not be acceptable.

I think that is the best we could do for "-S", though, which is
inherently about counting hits.

For "-G", we are literally grepping the diff. It does not seem
unreasonable to add the ability to grep only "-" or "+" lines, and the
interface for that should be pretty straightforward (a tri-state flag to
look in remove, added, or both lines).

-Peff


Re: [PATCH v3] coccicheck: process every source file at once

2018-10-08 Thread Jeff King
On Fri, Oct 05, 2018 at 09:54:13PM +0200, SZEDER Gábor wrote:

> Runtimes tend to fluctuate quite a bit more on Travis CI compared to
> my machine, but not this much, and it seems to be consistent so far.
> 
> After scripting/querying the Travis CI API a bit, I found that from
> the last 100 static analysis build jobs 78 did actully run 'make
> coccicheck' [1], avaraging 470s for the whole build job, with only 4
> build job exceeding the 10min mark.
> 
> I had maybe 6-8 build jobs running this patch over the last 2-3 days,
> I think all of them were over 15min.  (I restarted some of them, so I
> don't have separate logs for all of them, hence the uncertainty.)

So that's really weird and counter-intuitive, since we should be doing
strictly less work. I know that spatch tries to parallelize itself,
though from my tests, 1.0.4 does not. I wonder if the version in Travis
differs in that respect and starts too many threads, and the extra time
is going to contention and context switches.

Have you tried passing "-j1" to spatch? My 1.0.4 does not even recognize
it.

That seems like a pretty unlikely explanation to me, but I am having
trouble coming up with another one.

I guess the other plausible thing is that the extra memory is forcing us
into some slower path. E.g., a hypervisor may even be swapping,
unbeknownst to the child OS, and it gets accounted in the child OS as
"boy, that memory load was really slow", which becomes used CPU.

That actually sounds more credible to me.

-Peff


Re: [PATCH v3] coccicheck: process every source file at once

2018-10-08 Thread Jeff King
On Sat, Oct 06, 2018 at 10:42:57AM +0200, René Scharfe wrote:

> > That's OK, too, assuming people would actually want to use it. I'm also
> > OK shipping this (with the "make -j" fix you suggested) and seeing if
> > anybody actually complains. I assume there are only a handful of people
> > running coccicheck in the first place.
> 
> FWIW, my development environment is a virtual machine with 1200MB RAM
> and 900MB swap space.  coccicheck takes almost eight minutes
> sequentially, and four and a half minutes with -j4.
> 
> Unsurprisingly, it fails after almost three minutes with the patch,
> reporting that it ran out of memory.  With 2900MB it fails after almost
> two minutes, with 3000MB it succeeds after a good two minutes.
> 
> time(1) says (for -j1):
> 
> 433.30user 36.17system 7:49.84elapsed 99%CPU (0avgtext+0avgdata 
> 108212maxresident)k
> 192inputs+1512outputs (0major+16409056minor)pagefaults 0swaps
> 
> 129.74user 2.06system 2:13.27elapsed 98%CPU (0avgtext+0avgdata 
> 1884568maxresident)k
> 236896inputs+1096outputs (795major+462129minor)pagefaults 0swaps
> 
> So with the patch it's more than three times faster, but needs more
> than seventeen times more memory.  And I need a bigger VM. :-/

Yuck. :) So if we want to take this as a complaint, then I guess we can
jump straight to implementing the fallback to the existing behavior
(though it may be worth it for you to expand your VM to get the
decreased CPU time).

I'm still puzzled by Gábor's counter-intuitive CI numbers, though.

-Peff


Re: [PATCH v5 0/4] Filter alternate references

2018-10-08 Thread Jeff King
On Mon, Oct 08, 2018 at 11:09:20AM -0700, Taylor Blau wrote:

> Attached is (what I anticipate to be) the final re-roll of my series to
> introduce 'core.alternateRefsCommand' and 'core.alternateRefsPrefixes'
> in order to limit the ".have" advertisement when pushing over protocol
> v1 to a repository with configured alternates.

Thanks, this looks good to me!

-Peff


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Jeff King
On Mon, Oct 08, 2018 at 02:29:47PM -0400, Derrick Stolee wrote:

> > > > But I'm afraid it will take a while until I get around to turn it into
> > > > something presentable...
> > > Do you have the code pushed somewhere public where one could take a look? 
> > > I
> > > Do you have the code pushed somewhere public where one could take a
> > > look? I could provide some early feedback.
> > Nah, definitely not...  I know full well how embarassingly broken this
> > implementation is, I don't need others to tell me that ;)
> There are two questions that I was hoping to answer by looking at your code:
> 
> 1. How do you store your Bloom filter? Is it connected to the commit-graph
> and split on a commit-by-commit basis (storing "$path" as a key), or is it
> one huge Bloom filter (storing "$commitid:$path" as key)?

I guess you've probably thought all of this through for your
implementation, but let me pontificate.

I'd have done it as one fixed-size filter per commit. Then you should be
able to hash the path keys once, and apply the result as a bitwise query
to each individual commit (I'm assuming that it's constant-time to
access the filter for each, as an index into an mmap'd array, with the
offset coming from a commit-graph entry we'd be able to look up anyway).

I think it would also be easier to deal with maintenance, since each
filter is independent (IIRC, you cannot delete from a bloom filter
without re-adding all of the other keys).

The obvious downside is that it's O(commits) storage instead of O(1).

Now let me ponder a bit further afield. A bloom filter lets us answer
the question "did this commit (probably) touch these paths?". But it
does not let us answer "which paths did this commit touch?".

That second one is less useful than you might think, because we almost
always care about not just the names of the paths, but their actual
object ids. Think about a --raw diff, or even traversing for
reachability (where if we knew the tree-diff cheaply, we could avoid
asking "have we seen this yet?" on most of the tree entries). The names
alone can make that a bit faster, but in the worst case you still have
to walk the whole tree to find their entries.

But there's also a related question: how do we match pathspec patterns?
For a changed path like "foo/bar/baz", I imagine a bloom filter would
mark all of "foo", "foo/bar", and "foo/bar/baz". But what about "*.c"? I
don't think a bloom filter can answer that.

At least not by itself. If we imagine that the commit-graph also had an
alphabetized list of every path in every tree, then it's easy: apply the
glob to that list once to get a set of concrete paths, and then query
the bloom filters for those. And that list actually isn't too big. The
complete set of paths in linux.git is only about 300k gzipped (I think
that's the most relevant measure, since it's an obvious win to avoid
repeating shared prefixes of long paths).

Imagine we have that list. Is a bloom filter still the best data
structure for each commit? At the point that we have the complete
universe of paths, we could give each commit a bitmap of changed paths.
That lets us ask "did this commit touch these paths" (collect the bits
from the list of paths, then check for 1's), as well as "what paths did
we touch" (collect the 1 bits, and then index the path list).  Those
bitmaps should compress very well via EWAH or similar (most of them
would be huge stretches of 0's punctuated by short runs of 1's).

So that seems promising to me (or at least not an obvious dead-end). I
do think maintenance gets to be a headache, though. Adding new paths
potentially means reordering the bitmaps, which means O(commits) work to
"incrementally" update the structure. (Unless you always add the new
paths at the end, but then you lose fast lookups in the list; that might
be an acceptable tradeoff).

And finally, there's one more radical option: could we actually store a
real per-commit tree-diff cache? I.e., imagine that each commit had the
equivalent of a --raw diff easily accessible, including object ids. That
would allow:

  - fast pathspec matches, including globs

  - fast --raw output (and faster -p output, since we can skip the tree
entirely)

  - fast reachability traversals (we only need to bother to look at the
objects for changed entries)

where "fast" is basically O(size of commit's changes), rather than
O(size of whole tree). This was one of the big ideas of packv4 that
never materialized. You can _almost_ do it with packv2, since after all,
we end up storing many trees as deltas. But those deltas are byte-wise
so it's hard for a reader to convert them directly into a pure-tree
diff (they also don't mention the "deleted" data, so it's really only
half a diff).

So let's imagine we'd store such a cache external to the regular object
data (i.e., as a commit-graph entry). The "log --raw" diff of linux.git
has 1.7M entries. The paths should easily compress to a single 32-bit
integer (e.g., as an index 

Re: [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files

2018-10-08 Thread Ananya Krishna Maram
On Mon, 8 Oct 2018 at 22:43, Derrick Stolee  wrote:
>
> On 10/8/2018 1:05 PM, Ananya Krishna Maram wrote:
> > Hi All,
> Hello, Ananya! Welcome.
>
> > I was searching through #leftovers and found this.
> > https://public-inbox.org/git/cabpp-bgvvxcbzx44er6to-pusfen_6gnyj1u5cuon9deaa4...@mail.gmail.com/
> >
> > This patch address the task discussed in the above link.
> The discussion above seems to not be intended for your commit message,
> but it does show up when I run `git am` and provide your email as input.
> The typical way to avoid this is to place all commentary below the "---"

Sorry, I didn't know that. Shall I re submit the patch with proper commentary.

> that signifies the commit message is over.
> > From: Ananya Krishan Maram 
> >
> > skip the #include of git-compat-util.h since all .c files include it.
> >
> > Signed-off-by: Ananya Krishna Maram 
> > ---
> >   advice.h | 1 -
> >   commit-graph.h   | 1 -
> >   hash.h   | 1 -
> >   pkt-line.h   | 1 -
> >   t/helper/test-tool.h | 1 -
> >   5 files changed, 5 deletions(-)
> >
> > diff --git a/advice.h b/advice.h
> > index ab24df0fd..09148baa6 100644
> > --- a/advice.h
> > +++ b/advice.h
> > @@ -1,7 +1,6 @@
> >   #ifndef ADVICE_H
> >   #define ADVICE_H
> >
> > -#include "git-compat-util.h"
> >
> >   extern int advice_push_update_rejected;
> >   extern int advice_push_non_ff_current;
> > diff --git a/commit-graph.h b/commit-graph.h
> > index b05047676..0e93c2bed 100644
> > --- a/commit-graph.h
> > +++ b/commit-graph.h
> > @@ -1,7 +1,6 @@
> >   #ifndef COMMIT_GRAPH_H
> >   #define COMMIT_GRAPH_H
> >
> > -#include "git-compat-util.h"
> >   #include "repository.h"
> >   #include "string-list.h"
> >   #include "cache.h"
> > diff --git a/hash.h b/hash.h
> > index 7c8238bc2..9a4334c5d 100644
> > --- a/hash.h
> > +++ b/hash.h
> > @@ -1,7 +1,6 @@
> >   #ifndef HASH_H
> >   #define HASH_H
> >
> > -#include "git-compat-util.h"
> >
> >   #if defined(SHA1_PPC)
> >   #include "ppc/sha1.h"
> > diff --git a/pkt-line.h b/pkt-line.h
> > index 5b28d4347..fdd316494 100644
> > --- a/pkt-line.h
> > +++ b/pkt-line.h
> > @@ -1,7 +1,6 @@
> >   #ifndef PKTLINE_H
> >   #define PKTLINE_H
> >
> > -#include "git-compat-util.h"
> >   #include "strbuf.h"
> >
> >   /*
> > diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> > index e07495727..24e0a1589 100644
> > --- a/t/helper/test-tool.h
> > +++ b/t/helper/test-tool.h
> > @@ -1,7 +1,6 @@
> >   #ifndef __TEST_TOOL_H__
> >   #define __TEST_TOOL_H__
> >
> > -#include "git-compat-util.h"
> >
> >   int cmd__chmtime(int argc, const char **argv);
> >   int cmd__config(int argc, const char **argv);
> I applied these changes locally and confirmed the code compiles, so all
> .c files including these _do_ include git-compat-util.h properly.
>
> Thanks,
> -Stolee
>


Re: What's so special about objects/17/ ?

2018-10-08 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> Depending on how we're counting there's at least two.

I thought you were asking "why the special sentinel is not 0{40}?"
You counted the number of reasons why 0{40} is used to stand in for
a real value, but that was the number I didn't find interesting in
the scope of this discussion, i.e. "why the special sample is 17?"

I vaguely recall we also used 0{39}1 for something else long time
ago; I offhand do not recall if we still do, or we got rid of it.


Re: What's so special about objects/17/ ?

2018-10-08 Thread Junio C Hamano
Stefan Beller  writes:

> On Sun, Oct 7, 2018 at 1:07 PM Junio C Hamano  wrote:
>>
>> Junio C Hamano  writes:
>
>> > ...
>> > by general public and I do not have to explain the choice to the
>> > general public ;-)
>>
>> One thing that is more important than "why not 00 but 17?" to answer
>> is why a hardcoded number rather than a runtime random.  It is for
>> repeatability.
>
> Let's talk about repeatability vs statistics for a second. ;-)

Oh, I think I misled you by saying "more important".  

I didn't mean that it is more important to stick to the "use
hardcoded value" design decision than sticking to "use 17".  I've
made sure that everybody would understnd choosing any arbitrary byte
value other than "17" does not make the resulting Git any better nor
worse.  But discussing the design decision to use hardcoded value is
"more important", as that affects the balance between the end-user
experience and debuggability, and I tried to help those who do not
know the history by giving the fact that choice was made for the
latter and not for other hidden reasons, that those who would
propose to change the system may have to keep in mind.

Sorry if you mistook it as if I were saying that it is important to
keep the design to use a hardcoded byte value.  That wasn't what the
message was about.


Re: [PATCH] builtin/grep.c: remote superflous submodule code

2018-10-08 Thread Stefan Beller
> Well, submodule-config.c has its implementation and another caller,
> which technically is outside submodule.c ;-)

i.e. there is a typo in my commit message.
I meant to say submodule-config.c

>  repo_read_gitmodules
> has two more callers in unpack-trees.c these days, so perhaps we can
> do without this last paragraph.

Gah, looking at that code, did we have any reason to rush that series?
c.f. https://public-inbox.org/git/20170811171811.gc1...@book.hvoigt.net/


On Sat, Oct 6, 2018 at 5:33 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > After ff6f1f564c4 (submodule-config: lazy-load a repository's .gitmodules
> > file, 2017-08-03) this is no longer necessary, but that commit did not
> > cleanup the whole tree, but just show cased the new way how to deal with
> > submodules in ls-files.
>
> The log message of the above one singles out "grep" as a special
> case and explalins why it did not touch, by the way.  You probably
> need to explain the reason why "this is no longer necessary" a bit
> better than the above---as it stands, it is "ff6f1f564c4 said it
> still is necessary, I say it is not".

That is true.

For grep, the reason seems to be, that we check is_submodule_active
based off the index, i.e. using
   module = submodule_from_path(repo, _oid, path);
as the deciding factor, which falls in line with lazyloading.

However the use of the specialized gitmodules_config_oid
in grep is also guarded by the same commit ff6f1f564c4.

Going back to the use case of unpack-trees.c,
I think that we need to keep it there as alternatives
seem to be more complicated.

So I guess I'll just resend with a better commit message.

Thanks,
Stefan


Re: [PATCH 14/14] rerere: convert to use the_hash_algo

2018-10-08 Thread Stefan Beller
On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
 wrote:
>
> Since this data is stored in the .git directory, it makes sense for us
> to use the same hash algorithm for it as for everything else.  Convert
> the remaining uses of SHA-1 to use the_hash_algo.  Use GIT_MAX_RAWSZ for
> allocations.  Rename various struct members, local variables, and a
> function to be named "hash" instead of "sha1".
>
> Signed-off-by: brian m. carlson 
> ---
>  rerere.c | 81 +---
>  1 file changed, 42 insertions(+), 39 deletions(-)

I have reviewed all patches and had minor questions on some of them,
all others look good to me.

Thanks,
Stefan


Re: [PATCH 13/14] submodule: make zero-oid comparison hash function agnostic

2018-10-08 Thread Stefan Beller
On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
 wrote:
>
> With SHA-256, the length of the all-zeros object ID is longer.  Add a
> function to git-submodule.sh to check if a full hex object ID is the
> all-zeros value, and use it to check the output we're parsing from git
> diff-files or diff-index.
>
> Signed-off-by: brian m. carlson 
> ---

Nice!


Re: git log -S or -G

2018-10-08 Thread Junio C Hamano
Julia Lawall  writes:

>> Doing the same for -S is much harder at the machinery level, as it
>> performs its thing without internally running "diff" twice, but just
>> counts the number of occurrences of 'foo'---that is sufficient for
>> its intended use, and more efficient.
>
> There is still the question of whether the number of occurrences of foo
> decreases or increases.

Hmph, taking the changes that makes the number of hits decrease
would catch a subset of "changes that removes 'foo' only---I am not
interested in the ones that adds 'foo'".  It will avoid getting
confused by a change that moves an existing 'foo' to another place
in the same file (as the number of hits does not change), but at the
same time, it will miss a change that genuinely removes an existing
'foo' and happens to add a 'foo' at a different place in the same
file that is unrelated to the original 'foo'.  Depending on the
definition of "I am only interested in removed ones", that may or
may not be acceptable.





Re: [PATCH 02/14] builtin/repack: replace hard-coded constant

2018-10-08 Thread Eric Sunshine
On Mon, Oct 8, 2018 at 6:27 PM Stefan Beller  wrote:
> On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
>  wrote:
> > -   if (line.len != 40)
> > -   die("repack: Expecting 40 character sha1 lines only 
> > from pack-objects.");
> > +   if (line.len != the_hash_algo->hexsz)
> > +   die("repack: Expecting full hex object ID lines 
> > only from pack-objects.");
>
> This is untranslated as it is plumbing? If so, maybe
>
> if (is_sha1(the_hash_algo)
> die("repack: Expecting 40 character sh...
> else
> die(repack: Expecting full hex object ID in %s, of length %d",
> the_hash_algo->name,
> the_hash_algo->hexsz);

Special-casing for SHA-1 seems overkill for an error message. A script
expecting this particular error condition and this particular error
message would be fragile indeed.


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Junio C Hamano
SZEDER Gábor  writes:

> There is certainly potential there.  With a (very) rough PoC
> experiment, a 8MB bloom filter, and a carefully choosen path I can
> achieve a nice, almost 25x speedup:
>
>   $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh
>   6
>
>   real0m1.563s
>   user0m1.519s
>   sys 0m0.045s
>
>   $ time GIT_USE_POC_BLOOM_FILTER=y ~/src/git/git rev-list --count HEAD -- 
> t/valgrind/valgrind.sh
>   6
>
>   real0m0.063s
>   user0m0.043s
>   sys 0m0.020s

Even though I somehow sense a sign of exaggeration in [v] in the
pathname, it still is quite respectable.

>   bloom filter total queries: 16269 definitely not: 16195 maybe: 74 false 
> positives: 64 fp ratio: 0.003934
>
> But I'm afraid it will take a while until I get around to turn it into
> something presentable...

That's OK.  This is an encouraging result.

Just from curiousity, how are you keying the filter?  tree object
name of the top-level and full path concatenated or something like
that?


Re: [PATCH 06/14] packfile: express constants in terms of the_hash_algo

2018-10-08 Thread Stefan Beller
On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
 wrote:
>
> Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
> dependence on a particular hash length.

Unlike the previous patches, this is dealing directly with packfiles,
which (I would think) carry their own hash function selector?
(i.e. packfiles up to version 4 are sha1 hardcoded and version
5 and onwards will have a hash type field. Usually that hash type would
match what is in the_repository, but you could obtain packfiles
out of band, or the translation table that we plan to have might
be part of the packfile/idx file?)


>
> Signed-off-by: brian m. carlson 
> ---
>  packfile.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/packfile.c b/packfile.c
> index 841b36182f..17f993b5bf 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1121,13 +1121,14 @@ int unpack_object_header(struct packed_git *p,
>  void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
>  {
> unsigned i;
> +   const unsigned hashsz = the_hash_algo->rawsz;
> for (i = 0; i < p->num_bad_objects; i++)
> -   if (hasheq(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
> +   if (hasheq(sha1, p->bad_object_sha1 + hashsz * i))
> return;
> p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
>   st_mult(GIT_MAX_RAWSZ,
>   st_add(p->num_bad_objects, 1)));
> -   hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, 
> sha1);
> +   hashcpy(p->bad_object_sha1 + hashsz * p->num_bad_objects, sha1);
> p->num_bad_objects++;
>  }
>


Re: [PATCH 05/14] pack-revindex: express constants in terms of the_hash_algo

2018-10-08 Thread Stefan Beller
> -   /* This knows the pack format -- the 20-byte trailer
> +   /* This knows the pack format -- the hash trailer
>  * follows immediately after the last object data.

While at it, fix the comment style?

With or without the nit addressed, this patch (and patches
1 and 4) are

  Reviewed-by: Stefan Beller 


Re: [PATCH 03/14] builtin/mktree: remove hard-coded constant

2018-10-08 Thread Stefan Beller
On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
 wrote:
>
> Instead of using a hard-coded constant for the size of a hex object ID,
> switch to use the computed pointer from parse_oid_hex that points after
> the parsed object ID.
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/mktree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 2dc4ad6ba8..94e82b8504 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -98,7 +98,7 @@ static void mktree_line(char *buf, size_t len, int 
> nul_term_line, int allow_miss
>
> *ntr++ = 0; /* now at the beginning of SHA1 */
>
> -   path = ntr + 41;  /* at the beginning of name */
> +   path = (char *)p + 1;  /* at the beginning of name */

... and we need the cast to un-const p such that path takes it.
Makes sense.


Re: [PATCH 02/14] builtin/repack: replace hard-coded constant

2018-10-08 Thread Stefan Beller
On Mon, Oct 8, 2018 at 2:57 PM brian m. carlson
 wrote:
>
> Signed-off-by: brian m. carlson 
> ---
>  builtin/repack.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index c6a7943d5c..e77859062d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -407,8 +407,8 @@ int cmd_repack(int argc, const char **argv, const char 
> *prefix)
>
> out = xfdopen(cmd.out, "r");
> while (strbuf_getline_lf(, out) != EOF) {
> -   if (line.len != 40)
> -   die("repack: Expecting 40 character sha1 lines only 
> from pack-objects.");
> +   if (line.len != the_hash_algo->hexsz)
> +   die("repack: Expecting full hex object ID lines only 
> from pack-objects.");

This is untranslated as it is plumbing?
If so, maybe

if (is_sha1(the_hash_algo)
die("repack: Expecting 40 character sh...
else
die(repack: Expecting full hex object ID in %s, of length %d",
the_hash_algo->name,
the_hash_algo->hexsz);


Re: [PATCH v3 0/2] EditorConfig file

2018-10-08 Thread Taylor Blau
On Mon, Oct 08, 2018 at 10:03:51PM +, brian m. carlson wrote:
> This series introduces an EditorConfig file to help developers using any
> editor set their editor's settings in conformance with the Git Project's
> settings.  This is helpful for developers who work on different projects
> with different indentation standards to keep their work in sync.
>
> Changes since v2:
> * Add .pl and .pm files.
>
> Changes since v1:
> * Add notes to both .editorconfig and .clang-format that they should be
>   kept in sync.
> * Add commit message line length.

Thanks for both of these. I think that v3 is ready for queueing, if
other folks find it OK to have an .editorconfig in the repository.

Therefore:

  Reviewed-by: Taylor Blau 

Thanks,
Taylor


Re: [PATCH v6 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-08 Thread Stefan Beller
> +test_expect_success 'not writing gitmodules config file when it is not 
> checked out' '
> +test_must_fail git -C super submodule--helper config 
> submodule.submodule.url newurl

This only checks the exit code, do we also want to check for

test_path_is_missing .gitmodules ?

> +test_expect_success 'initialising submodule when the gitmodules config is 
> not checked out' '
> +   git -C super submodule init
> +'
> +
> +test_expect_success 'showing submodule summary when the gitmodules config is 
> not checked out' '
> +   git -C super submodule summary
> +'

Same for these, is the exit code enough, or do we want to look at
specific things?

> +
> +test_expect_success 'updating submodule when the gitmodules config is not 
> checked out' '
> +   (cd submodule &&
> +   echo file2 >file2 &&
> +   git add file2 &&
> +   git commit -m "add file2 to submodule"
> +   ) &&
> +   git -C super submodule update

git status would want to be clean afterwards?


[PATCH v3 0/2] EditorConfig file

2018-10-08 Thread brian m. carlson
This series introduces an EditorConfig file to help developers using any
editor set their editor's settings in conformance with the Git Project's
settings.  This is helpful for developers who work on different projects
with different indentation standards to keep their work in sync.

Changes since v2:
* Add .pl and .pm files.

Changes since v1:
* Add notes to both .editorconfig and .clang-format that they should be
  kept in sync.
* Add commit message line length.

brian m. carlson (2):
  editorconfig: provide editor settings for Git developers
  editorconfig: indicate settings should be kept in sync

 .clang-format |  2 ++
 .editorconfig | 16 
 2 files changed, 18 insertions(+)
 create mode 100644 .editorconfig



[PATCH v3 1/2] editorconfig: provide editor settings for Git developers

2018-10-08 Thread brian m. carlson
Contributors to Git use a variety of editors, each with their own
configuration files.  Because C lacks the defined norms on how to indent
and style code that other languages, such as Ruby and Rust, have, it's
possible for various contributors, especially new ones, to have
configured their editor to use a style other than the style the Git
community prefers.

To make automatically configuring one's editor easier, provide an
EditorConfig file.  This is an INI-style configuration file that can be
used to specify editor settings and can be understood by a wide variety
of editors.  Some editors include this support natively; others require
a plugin.  Regardless, providing such a file allows users to
automatically configure their editor of choice with the correct settings
by default.

Provide global settings to set the character set to UTF-8 and insert a
final newline into files.  Provide language-specific settings for C,
Shell, Perl, and Python files according to what CodingGuidelines already
specifies.  Since the indentation of other files varies, especially
certain AsciiDoc files, don't provide any settings for them until a
clear consensus forward emerges.

Set the line length for commit messages to 72 characters, which is the
generally accepted line length for emails, since we send patches by
email.

Don't specify an end of line type.  While the Git community uses
Unix-style line endings in the repository, some Windows users may use
Git's auto-conversion support and forcing Unix-style line endings might
cause problems for those users.

Finally, leave out a root directive, which would prevent reading other
EditorConfig files higher up in the tree, in case someone wants to set
the end of line type for their system in such a file.

Signed-off-by: brian m. carlson 
---
 .editorconfig | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 .editorconfig

diff --git a/.editorconfig b/.editorconfig
new file mode 100644
index 00..8667740e1d
--- /dev/null
+++ b/.editorconfig
@@ -0,0 +1,14 @@
+[*]
+charset = utf-8
+insert_final_newline = true
+
+[*.{c,h,sh,perl,pl,pm}]
+indent_style = tab
+tab_width = 8
+
+[*.py]
+indent_style = space
+indent_size = 4
+
+[COMMIT_EDITMSG]
+max_line_length = 72


[PATCH v3 2/2] editorconfig: indicate settings should be kept in sync

2018-10-08 Thread brian m. carlson
Now that we have two places where we set code formatting settings,
.editorconfig and .clang-format, it's possible that they could fall out
of sync.  This is relatively unlikely, since we're not likely to change
the tab width or our preference for tabs, but just in case, add comments
to both files reminding future developers to keep them in sync.

Signed-off-by: brian m. carlson 
---
 .clang-format | 2 ++
 .editorconfig | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/.clang-format b/.clang-format
index 12a89f95f9..de1c8b5c77 100644
--- a/.clang-format
+++ b/.clang-format
@@ -6,6 +6,8 @@
 
 # Use tabs whenever we need to fill whitespace that spans at least from one tab
 # stop to the next one.
+#
+# These settings are mirrored in .editorconfig.  Keep them in sync.
 UseTab: Always
 TabWidth: 8
 IndentWidth: 8
diff --git a/.editorconfig b/.editorconfig
index 8667740e1d..42cdc4bbfb 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -2,6 +2,8 @@
 charset = utf-8
 insert_final_newline = true
 
+# The settings for C (*.c and *.h) files are mirrored in .clang-format.  Keep
+# them in sync.
 [*.{c,h,sh,perl,pl,pm}]
 indent_style = tab
 tab_width = 8


[PATCH 11/14] apply: replace hard-coded constants

2018-10-08 Thread brian m. carlson
Replace several 40-based constants with references to GIT_MAX_HEXSZ or
the_hash_algo, as appropriate.

Signed-off-by: brian m. carlson 
---
 apply.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/apply.c b/apply.c
index e485fbc6bc..792ecea36a 100644
--- a/apply.c
+++ b/apply.c
@@ -223,8 +223,8 @@ struct patch {
struct fragment *fragments;
char *result;
size_t resultsize;
-   char old_sha1_prefix[41];
-   char new_sha1_prefix[41];
+   char old_sha1_prefix[GIT_MAX_HEXSZ + 1];
+   char new_sha1_prefix[GIT_MAX_HEXSZ + 1];
struct patch *next;
 
/* three-way fallback result */
@@ -1093,9 +1093,10 @@ static int gitdiff_index(struct apply_state *state,
 */
const char *ptr, *eol;
int len;
+   const unsigned hexsz = the_hash_algo->hexsz;
 
ptr = strchr(line, '.');
-   if (!ptr || ptr[1] != '.' || 40 < ptr - line)
+   if (!ptr || ptr[1] != '.' || hexsz < ptr - line)
return 0;
len = ptr - line;
memcpy(patch->old_sha1_prefix, line, len);
@@ -1109,7 +1110,7 @@ static int gitdiff_index(struct apply_state *state,
ptr = eol;
len = ptr - line;
 
-   if (40 < len)
+   if (hexsz < len)
return 0;
memcpy(patch->new_sha1_prefix, line, len);
patch->new_sha1_prefix[len] = 0;
@@ -3142,13 +3143,14 @@ static int apply_binary(struct apply_state *state,
 {
const char *name = patch->old_name ? patch->old_name : patch->new_name;
struct object_id oid;
+   const unsigned hexsz = the_hash_algo->hexsz;
 
/*
 * For safety, we require patch index line to contain
-* full 40-byte textual SHA1 for old and new, at least for now.
+* full hex textual object ID for old and new, at least for now.
 */
-   if (strlen(patch->old_sha1_prefix) != 40 ||
-   strlen(patch->new_sha1_prefix) != 40 ||
+   if (strlen(patch->old_sha1_prefix) != hexsz ||
+   strlen(patch->new_sha1_prefix) != hexsz ||
get_oid_hex(patch->old_sha1_prefix, ) ||
get_oid_hex(patch->new_sha1_prefix, ))
return error(_("cannot apply binary patch to '%s' "
@@ -4055,7 +4057,7 @@ static int preimage_oid_in_gitlink_patch(struct patch *p, 
struct object_id *oid)
starts_with(++preimage, heading) &&
/* does it record full SHA-1? */
!get_oid_hex(preimage + sizeof(heading) - 1, oid) &&
-   preimage[sizeof(heading) + GIT_SHA1_HEXSZ - 1] == '\n' &&
+   preimage[sizeof(heading) + the_hash_algo->hexsz - 1] == '\n' &&
/* does the abbreviated name on the index line agree with it? */
starts_with(preimage + sizeof(heading) - 1, p->old_sha1_prefix))
return 0; /* it all looks fine */


[PATCH 07/14] refs/packed-backend: express constants using the_hash_algo

2018-10-08 Thread brian m. carlson
Switch uses of GIT_SHA1_HEXSZ to use the_hash_algo so that they are
appropriate for the any given hash length.

Signed-off-by: brian m. carlson 
---
 refs/packed-backend.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 74e2996e93..c01c7f5901 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -274,8 +274,8 @@ struct snapshot_record {
 static int cmp_packed_ref_records(const void *v1, const void *v2)
 {
const struct snapshot_record *e1 = v1, *e2 = v2;
-   const char *r1 = e1->start + GIT_SHA1_HEXSZ + 1;
-   const char *r2 = e2->start + GIT_SHA1_HEXSZ + 1;
+   const char *r1 = e1->start + the_hash_algo->hexsz + 1;
+   const char *r2 = e2->start + the_hash_algo->hexsz + 1;
 
while (1) {
if (*r1 == '\n')
@@ -297,7 +297,7 @@ static int cmp_packed_ref_records(const void *v1, const 
void *v2)
  */
 static int cmp_record_to_refname(const char *rec, const char *refname)
 {
-   const char *r1 = rec + GIT_SHA1_HEXSZ + 1;
+   const char *r1 = rec + the_hash_algo->hexsz + 1;
const char *r2 = refname;
 
while (1) {
@@ -344,7 +344,7 @@ static void sort_snapshot(struct snapshot *snapshot)
if (!eol)
/* The safety check should prevent this. */
BUG("unterminated line found in packed-refs");
-   if (eol - pos < GIT_SHA1_HEXSZ + 2)
+   if (eol - pos < the_hash_algo->hexsz + 2)
die_invalid_line(snapshot->refs->path,
 pos, eof - pos);
eol++;
@@ -456,7 +456,7 @@ static void verify_buffer_safe(struct snapshot *snapshot)
return;
 
last_line = find_start_of_record(start, eof - 1);
-   if (*(eof - 1) != '\n' || eof - last_line < GIT_SHA1_HEXSZ + 2)
+   if (*(eof - 1) != '\n' || eof - last_line < the_hash_algo->hexsz + 2)
die_invalid_line(snapshot->refs->path,
 last_line, eof - last_line);
 }
@@ -796,7 +796,7 @@ static int next_record(struct packed_ref_iterator *iter)
 
iter->base.flags = REF_ISPACKED;
 
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
+   if (iter->eof - p < the_hash_algo->hexsz + 2 ||
parse_oid_hex(p, >oid, ) ||
!isspace(*p++))
die_invalid_line(iter->snapshot->refs->path,
@@ -826,7 +826,7 @@ static int next_record(struct packed_ref_iterator *iter)
 
if (iter->pos < iter->eof && *iter->pos == '^') {
p = iter->pos + 1;
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
+   if (iter->eof - p < the_hash_algo->hexsz + 1 ||
parse_oid_hex(p, >peeled, ) ||
*p++ != '\n')
die_invalid_line(iter->snapshot->refs->path,


[PATCH 05/14] pack-revindex: express constants in terms of the_hash_algo

2018-10-08 Thread brian m. carlson
Express the various constants used in terms of the_hash_algo.

Signed-off-by: brian m. carlson 
---
 pack-revindex.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index bb521cf7fb..3756ec71a8 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -122,13 +122,14 @@ static void create_pack_revindex(struct packed_git *p)
unsigned num_ent = p->num_objects;
unsigned i;
const char *index = p->index_data;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
ALLOC_ARRAY(p->revindex, num_ent + 1);
index += 4 * 256;
 
if (p->index_version > 1) {
const uint32_t *off_32 =
-   (uint32_t *)(index + 8 + p->num_objects * (20 + 4));
+   (uint32_t *)(index + 8 + p->num_objects * (hashsz + 4));
const uint32_t *off_64 = off_32 + p->num_objects;
for (i = 0; i < num_ent; i++) {
uint32_t off = ntohl(*off_32++);
@@ -142,16 +143,16 @@ static void create_pack_revindex(struct packed_git *p)
}
} else {
for (i = 0; i < num_ent; i++) {
-   uint32_t hl = *((uint32_t *)(index + 24 * i));
+   uint32_t hl = *((uint32_t *)(index + (hashsz + 4) * i));
p->revindex[i].offset = ntohl(hl);
p->revindex[i].nr = i;
}
}
 
-   /* This knows the pack format -- the 20-byte trailer
+   /* This knows the pack format -- the hash trailer
 * follows immediately after the last object data.
 */
-   p->revindex[num_ent].offset = p->pack_size - 20;
+   p->revindex[num_ent].offset = p->pack_size - hashsz;
p->revindex[num_ent].nr = -1;
sort_revindex(p->revindex, num_ent, p->pack_size);
 }


[PATCH 12/14] apply: rename new_sha1_prefix and old_sha1_prefix

2018-10-08 Thread brian m. carlson
Rename these structure members to "new_oid_prefix" and "old_oid_prefix".

Signed-off-by: brian m. carlson 
---
 apply.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/apply.c b/apply.c
index 792ecea36a..b9eb02ec12 100644
--- a/apply.c
+++ b/apply.c
@@ -223,8 +223,8 @@ struct patch {
struct fragment *fragments;
char *result;
size_t resultsize;
-   char old_sha1_prefix[GIT_MAX_HEXSZ + 1];
-   char new_sha1_prefix[GIT_MAX_HEXSZ + 1];
+   char old_oid_prefix[GIT_MAX_HEXSZ + 1];
+   char new_oid_prefix[GIT_MAX_HEXSZ + 1];
struct patch *next;
 
/* three-way fallback result */
@@ -1099,8 +1099,8 @@ static int gitdiff_index(struct apply_state *state,
if (!ptr || ptr[1] != '.' || hexsz < ptr - line)
return 0;
len = ptr - line;
-   memcpy(patch->old_sha1_prefix, line, len);
-   patch->old_sha1_prefix[len] = 0;
+   memcpy(patch->old_oid_prefix, line, len);
+   patch->old_oid_prefix[len] = 0;
 
line = ptr + 2;
ptr = strchr(line, ' ');
@@ -1112,8 +1112,8 @@ static int gitdiff_index(struct apply_state *state,
 
if (hexsz < len)
return 0;
-   memcpy(patch->new_sha1_prefix, line, len);
-   patch->new_sha1_prefix[len] = 0;
+   memcpy(patch->new_oid_prefix, line, len);
+   patch->new_oid_prefix[len] = 0;
if (*ptr == ' ')
return gitdiff_oldmode(state, ptr + 1, patch);
return 0;
@@ -2205,7 +2205,7 @@ static void reverse_patches(struct patch *p)
SWAP(p->new_mode, p->old_mode);
SWAP(p->is_new, p->is_delete);
SWAP(p->lines_added, p->lines_deleted);
-   SWAP(p->old_sha1_prefix, p->new_sha1_prefix);
+   SWAP(p->old_oid_prefix, p->new_oid_prefix);
 
for (; frag; frag = frag->next) {
SWAP(frag->newpos, frag->oldpos);
@@ -3149,10 +3149,10 @@ static int apply_binary(struct apply_state *state,
 * For safety, we require patch index line to contain
 * full hex textual object ID for old and new, at least for now.
 */
-   if (strlen(patch->old_sha1_prefix) != hexsz ||
-   strlen(patch->new_sha1_prefix) != hexsz ||
-   get_oid_hex(patch->old_sha1_prefix, ) ||
-   get_oid_hex(patch->new_sha1_prefix, ))
+   if (strlen(patch->old_oid_prefix) != hexsz ||
+   strlen(patch->new_oid_prefix) != hexsz ||
+   get_oid_hex(patch->old_oid_prefix, ) ||
+   get_oid_hex(patch->new_oid_prefix, ))
return error(_("cannot apply binary patch to '%s' "
   "without full index line"), name);
 
@@ -3162,7 +3162,7 @@ static int apply_binary(struct apply_state *state,
 * applies to.
 */
hash_object_file(img->buf, img->len, blob_type, );
-   if (strcmp(oid_to_hex(), patch->old_sha1_prefix))
+   if (strcmp(oid_to_hex(), patch->old_oid_prefix))
return error(_("the patch applies to '%s' (%s), "
   "which does not match the "
   "current contents."),
@@ -3175,7 +3175,7 @@ static int apply_binary(struct apply_state *state,
   "'%s' but it is not empty"), name);
}
 
-   get_oid_hex(patch->new_sha1_prefix, );
+   get_oid_hex(patch->new_oid_prefix, );
if (is_null_oid()) {
clear_image(img);
return 0; /* deletion patch */
@@ -3191,7 +3191,7 @@ static int apply_binary(struct apply_state *state,
if (!result)
return error(_("the necessary postimage %s for "
   "'%s' cannot be read"),
-patch->new_sha1_prefix, name);
+patch->new_oid_prefix, name);
clear_image(img);
img->buf = result;
img->len = size;
@@ -3207,9 +3207,9 @@ static int apply_binary(struct apply_state *state,
 
/* verify that the result matches */
hash_object_file(img->buf, img->len, blob_type, );
-   if (strcmp(oid_to_hex(), patch->new_sha1_prefix))
+   if (strcmp(oid_to_hex(), patch->new_oid_prefix))
return error(_("binary patch to '%s' creates incorrect 
result (expecting %s, got %s)"),
-   name, patch->new_sha1_prefix, oid_to_hex());
+   name, patch->new_oid_prefix, oid_to_hex());
}
 
return 0;
@@ -3565,7 +3565,7 @@ static int try_threeway(struct apply_state *state,
/* Preimage the patch was prepared for */
if (patch->is_new)
write_object_file("", 0, blob_type, _oid);
-   else if 

[PATCH 13/14] submodule: make zero-oid comparison hash function agnostic

2018-10-08 Thread brian m. carlson
With SHA-256, the length of the all-zeros object ID is longer.  Add a
function to git-submodule.sh to check if a full hex object ID is the
all-zeros value, and use it to check the output we're parsing from git
diff-files or diff-index.

Signed-off-by: brian m. carlson 
---
 git-submodule.sh | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 1b568e29b9..c09eb3e03d 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -82,6 +82,11 @@ isnumber()
n=$(($1 + 0)) 2>/dev/null && test "$n" = "$1"
 }
 
+# Given a full hex object ID, is this the zero OID?
+is_zero_oid () {
+   echo "$1" | sane_egrep '^0+$' >/dev/null 2>&1
+}
+
 # Sanitize the local git environment for use within a submodule. We
 # can't simply use clear_local_git_env since we want to preserve some
 # of the settings from GIT_CONFIG_PARAMETERS.
@@ -780,7 +785,7 @@ cmd_summary() {
while read -r mod_src mod_dst sha1_src sha1_dst status name
do
if test -z "$cached" &&
-   test $sha1_dst = 

+   is_zero_oid $sha1_dst
then
case "$mod_dst" in
16)


[PATCH 14/14] rerere: convert to use the_hash_algo

2018-10-08 Thread brian m. carlson
Since this data is stored in the .git directory, it makes sense for us
to use the same hash algorithm for it as for everything else.  Convert
the remaining uses of SHA-1 to use the_hash_algo.  Use GIT_MAX_RAWSZ for
allocations.  Rename various struct members, local variables, and a
function to be named "hash" instead of "sha1".

Signed-off-by: brian m. carlson 
---
 rerere.c | 81 +---
 1 file changed, 42 insertions(+), 39 deletions(-)

diff --git a/rerere.c b/rerere.c
index 7aa149e849..ceb98015ff 100644
--- a/rerere.c
+++ b/rerere.c
@@ -29,7 +29,7 @@ static int rerere_dir_alloc;
 #define RR_HAS_POSTIMAGE 1
 #define RR_HAS_PREIMAGE 2
 static struct rerere_dir {
-   unsigned char sha1[20];
+   unsigned char hash[GIT_MAX_HEXSZ];
int status_alloc, status_nr;
unsigned char *status;
 } **rerere_dir;
@@ -52,7 +52,7 @@ static void free_rerere_id(struct string_list_item *item)
 
 static const char *rerere_id_hex(const struct rerere_id *id)
 {
-   return sha1_to_hex(id->collection->sha1);
+   return sha1_to_hex(id->collection->hash);
 }
 
 static void fit_variant(struct rerere_dir *rr_dir, int variant)
@@ -115,7 +115,7 @@ static int is_rr_file(const char *name, const char 
*filename, int *variant)
 static void scan_rerere_dir(struct rerere_dir *rr_dir)
 {
struct dirent *de;
-   DIR *dir = opendir(git_path("rr-cache/%s", sha1_to_hex(rr_dir->sha1)));
+   DIR *dir = opendir(git_path("rr-cache/%s", sha1_to_hex(rr_dir->hash)));
 
if (!dir)
return;
@@ -133,24 +133,24 @@ static void scan_rerere_dir(struct rerere_dir *rr_dir)
closedir(dir);
 }
 
-static const unsigned char *rerere_dir_sha1(size_t i, void *table)
+static const unsigned char *rerere_dir_hash(size_t i, void *table)
 {
struct rerere_dir **rr_dir = table;
-   return rr_dir[i]->sha1;
+   return rr_dir[i]->hash;
 }
 
 static struct rerere_dir *find_rerere_dir(const char *hex)
 {
-   unsigned char sha1[20];
+   unsigned char hash[GIT_MAX_RAWSZ];
struct rerere_dir *rr_dir;
int pos;
 
-   if (get_sha1_hex(hex, sha1))
+   if (get_sha1_hex(hex, hash))
return NULL; /* BUG */
-   pos = sha1_pos(sha1, rerere_dir, rerere_dir_nr, rerere_dir_sha1);
+   pos = sha1_pos(hash, rerere_dir, rerere_dir_nr, rerere_dir_hash);
if (pos < 0) {
rr_dir = xmalloc(sizeof(*rr_dir));
-   hashcpy(rr_dir->sha1, sha1);
+   hashcpy(rr_dir->hash, hash);
rr_dir->status = NULL;
rr_dir->status_nr = 0;
rr_dir->status_alloc = 0;
@@ -207,26 +207,27 @@ static void read_rr(struct string_list *rr)
return;
while (!strbuf_getwholeline(, in, '\0')) {
char *path;
-   unsigned char sha1[20];
+   unsigned char hash[GIT_MAX_RAWSZ];
struct rerere_id *id;
int variant;
+   const unsigned hexsz = the_hash_algo->hexsz;
 
/* There has to be the hash, tab, path and then NUL */
-   if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
+   if (buf.len < hexsz + 2 || get_sha1_hex(buf.buf, hash))
die(_("corrupt MERGE_RR"));
 
-   if (buf.buf[40] != '.') {
+   if (buf.buf[hexsz] != '.') {
variant = 0;
-   path = buf.buf + 40;
+   path = buf.buf + hexsz;
} else {
errno = 0;
-   variant = strtol(buf.buf + 41, , 10);
+   variant = strtol(buf.buf + hexsz + 1, , 10);
if (errno)
die(_("corrupt MERGE_RR"));
}
if (*(path++) != '\t')
die(_("corrupt MERGE_RR"));
-   buf.buf[40] = '\0';
+   buf.buf[hexsz] = '\0';
id = new_rerere_id_hex(buf.buf);
id->variant = variant;
string_list_insert(rr, path)->util = id;
@@ -360,7 +361,7 @@ static void rerere_strbuf_putconflict(struct strbuf *buf, 
int ch, size_t size)
 }
 
 static int handle_conflict(struct strbuf *out, struct rerere_io *io,
-  int marker_size, git_SHA_CTX *ctx)
+  int marker_size, git_hash_ctx *ctx)
 {
enum {
RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
@@ -398,10 +399,12 @@ static int handle_conflict(struct strbuf *out, struct 
rerere_io *io,
strbuf_addbuf(out, );
rerere_strbuf_putconflict(out, '>', marker_size);
if (ctx) {
-   git_SHA1_Update(ctx, one.buf ? one.buf : "",
-   one.len + 1);
-   git_SHA1_Update(ctx, two.buf ? two.buf : "",
-

[PATCH 10/14] tag: express constant in terms of the_hash_algo

2018-10-08 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 tag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tag.c b/tag.c
index 1db663d716..7445b8f6ea 100644
--- a/tag.c
+++ b/tag.c
@@ -144,7 +144,7 @@ int parse_tag_buffer(struct repository *r, struct tag 
*item, const void *data, u
return 0;
item->object.parsed = 1;
 
-   if (size < GIT_SHA1_HEXSZ + 24)
+   if (size < the_hash_algo->hexsz + 24)
return -1;
if (memcmp("object ", bufptr, 7) || parse_oid_hex(bufptr + 7, , 
) || *bufptr++ != '\n')
return -1;


[PATCH 09/14] transport: use parse_oid_hex instead of a constant

2018-10-08 Thread brian m. carlson
Use parse_oid_hex to compute a pointer instead of using GIT_SHA1_HEXSZ.
This simplifies the code and makes it independent of the hash length.

Signed-off-by: brian m. carlson 
---
 transport.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/transport.c b/transport.c
index 1c76d64aba..44b9ddf670 100644
--- a/transport.c
+++ b/transport.c
@@ -1346,15 +1346,16 @@ static void read_alternate_refs(const char *path,
fh = xfdopen(cmd.out, "r");
while (strbuf_getline_lf(, fh) != EOF) {
struct object_id oid;
+   const char *p;
 
-   if (get_oid_hex(line.buf, ) ||
-   line.buf[GIT_SHA1_HEXSZ] != ' ') {
+   if (parse_oid_hex(line.buf, , ) ||
+   *p != ' ') {
warning(_("invalid line while parsing alternate refs: 
%s"),
line.buf);
break;
}
 
-   cb(line.buf + GIT_SHA1_HEXSZ + 1, , data);
+   cb(p + 1, , data);
}
 
fclose(fh);


[PATCH 01/14] pack-bitmap-write: use GIT_MAX_RAWSZ for allocation

2018-10-08 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 pack-bitmap-write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fc82f37a02..6f0c78d6aa 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -37,7 +37,7 @@ struct bitmap_writer {
 
struct progress *progress;
int show_progress;
-   unsigned char pack_checksum[20];
+   unsigned char pack_checksum[GIT_MAX_RAWSZ];
 };
 
 static struct bitmap_writer writer;


[PATCH 04/14] builtin/fetch-pack: remove constants with parse_oid_hex

2018-10-08 Thread brian m. carlson
Instead of using GIT_SHA1_HEXSZ, use parse_oid_hex to compute a pointer
and use that in comparisons.  This is both simpler to read and works
independent of the hash length.  Update references to SHA-1 in the same
function to refer to object IDs instead.

Signed-off-by: brian m. carlson 
---
 builtin/fetch-pack.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 1a1bc63566..63e69a5801 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -16,13 +16,14 @@ static void add_sought_entry(struct ref ***sought, int *nr, 
int *alloc,
 {
struct ref *ref;
struct object_id oid;
+   const char *p;
 
-   if (!get_oid_hex(name, )) {
-   if (name[GIT_SHA1_HEXSZ] == ' ') {
-   /*  , find refname */
-   name += GIT_SHA1_HEXSZ + 1;
-   } else if (name[GIT_SHA1_HEXSZ] == '\0') {
-   ; /* , leave sha1 as name */
+   if (!parse_oid_hex(name, , )) {
+   if (*p == ' ') {
+   /*  , find refname */
+   name = p + 1;
+   } else if (*p == '\0') {
+   ; /* , leave oid as name */
} else {
/* , clear cruft from oid */
oidclr();


[PATCH 08/14] upload-pack: express constants in terms of the_hash_algo

2018-10-08 Thread brian m. carlson
Convert all uses of the GIT_SHA1_HEXSZ to use the_hash_algo so that they
are appropriate for any given hash length.

Signed-off-by: brian m. carlson 
---
 upload-pack.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 62a1000f44..1aae5dd828 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -443,6 +443,7 @@ static int do_reachable_revlist(struct child_process *cmd,
struct object *o;
char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
int i;
+   const unsigned hexsz = the_hash_algo->hexsz;
 
cmd->argv = argv;
cmd->git_cmd = 1;
@@ -461,7 +462,7 @@ static int do_reachable_revlist(struct child_process *cmd,
goto error;
 
namebuf[0] = '^';
-   namebuf[GIT_SHA1_HEXSZ + 1] = '\n';
+   namebuf[hexsz + 1] = '\n';
for (i = get_max_object_index(); 0 < i; ) {
o = get_indexed_object(--i);
if (!o)
@@ -470,11 +471,11 @@ static int do_reachable_revlist(struct child_process *cmd,
o->flags &= ~TMP_MARK;
if (!is_our_ref(o))
continue;
-   memcpy(namebuf + 1, oid_to_hex(>oid), GIT_SHA1_HEXSZ);
-   if (write_in_full(cmd->in, namebuf, GIT_SHA1_HEXSZ + 2) < 0)
+   memcpy(namebuf + 1, oid_to_hex(>oid), hexsz);
+   if (write_in_full(cmd->in, namebuf, hexsz + 2) < 0)
goto error;
}
-   namebuf[GIT_SHA1_HEXSZ] = '\n';
+   namebuf[hexsz] = '\n';
for (i = 0; i < src->nr; i++) {
o = src->objects[i].item;
if (is_our_ref(o)) {
@@ -484,8 +485,8 @@ static int do_reachable_revlist(struct child_process *cmd,
}
if (reachable && o->type == OBJ_COMMIT)
o->flags |= TMP_MARK;
-   memcpy(namebuf, oid_to_hex(>oid), GIT_SHA1_HEXSZ);
-   if (write_in_full(cmd->in, namebuf, GIT_SHA1_HEXSZ + 1) < 0)
+   memcpy(namebuf, oid_to_hex(>oid), hexsz);
+   if (write_in_full(cmd->in, namebuf, hexsz + 1) < 0)
goto error;
}
close(cmd->in);


[PATCH 06/14] packfile: express constants in terms of the_hash_algo

2018-10-08 Thread brian m. carlson
Replace uses of GIT_SHA1_RAWSZ with references to the_hash_algo to avoid
dependence on a particular hash length.

Signed-off-by: brian m. carlson 
---
 packfile.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/packfile.c b/packfile.c
index 841b36182f..17f993b5bf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1121,13 +1121,14 @@ int unpack_object_header(struct packed_git *p,
 void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
 {
unsigned i;
+   const unsigned hashsz = the_hash_algo->rawsz;
for (i = 0; i < p->num_bad_objects; i++)
-   if (hasheq(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
+   if (hasheq(sha1, p->bad_object_sha1 + hashsz * i))
return;
p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
  st_mult(GIT_MAX_RAWSZ,
  st_add(p->num_bad_objects, 1)));
-   hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
+   hashcpy(p->bad_object_sha1 + hashsz * p->num_bad_objects, sha1);
p->num_bad_objects++;
 }
 


[PATCH 00/14] Hash function transition part 15

2018-10-08 Thread brian m. carlson
This is the fifteenth series in the ongoing hash function transition.

This series includes several conversions to use the_hash_algo, combined
with some use of parse_oid_hex and GIT_MAX_RAWSZ.  None of the
transformations here are especially surprising.

brian m. carlson (14):
  pack-bitmap-write: use GIT_MAX_RAWSZ for allocation
  builtin/repack: replace hard-coded constant
  builtin/mktree: remove hard-coded constant
  builtin/fetch-pack: remove constants with parse_oid_hex
  pack-revindex: express constants in terms of the_hash_algo
  packfile: express constants in terms of the_hash_algo
  refs/packed-backend: express constants using the_hash_algo
  upload-pack: express constants in terms of the_hash_algo
  transport: use parse_oid_hex instead of a constant
  tag: express constant in terms of the_hash_algo
  apply: replace hard-coded constants
  apply: rename new_sha1_prefix and old_sha1_prefix
  submodule: make zero-oid comparison hash function agnostic
  rerere: convert to use the_hash_algo

 apply.c   | 50 +-
 builtin/fetch-pack.c  | 13 +++
 builtin/mktree.c  |  2 +-
 builtin/repack.c  |  4 +--
 git-submodule.sh  |  7 +++-
 pack-bitmap-write.c   |  2 +-
 pack-revindex.c   |  9 ++---
 packfile.c|  5 +--
 refs/packed-backend.c | 14 
 rerere.c  | 81 ++-
 tag.c |  2 +-
 transport.c   |  7 ++--
 upload-pack.c | 13 +++
 13 files changed, 112 insertions(+), 97 deletions(-)



[PATCH 03/14] builtin/mktree: remove hard-coded constant

2018-10-08 Thread brian m. carlson
Instead of using a hard-coded constant for the size of a hex object ID,
switch to use the computed pointer from parse_oid_hex that points after
the parsed object ID.

Signed-off-by: brian m. carlson 
---
 builtin/mktree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 2dc4ad6ba8..94e82b8504 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -98,7 +98,7 @@ static void mktree_line(char *buf, size_t len, int 
nul_term_line, int allow_miss
 
*ntr++ = 0; /* now at the beginning of SHA1 */
 
-   path = ntr + 41;  /* at the beginning of name */
+   path = (char *)p + 1;  /* at the beginning of name */
if (!nul_term_line && path[0] == '"') {
struct strbuf p_uq = STRBUF_INIT;
if (unquote_c_style(_uq, path, NULL))


[PATCH 02/14] builtin/repack: replace hard-coded constant

2018-10-08 Thread brian m. carlson
Signed-off-by: brian m. carlson 
---
 builtin/repack.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index c6a7943d5c..e77859062d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -407,8 +407,8 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
out = xfdopen(cmd.out, "r");
while (strbuf_getline_lf(, out) != EOF) {
-   if (line.len != 40)
-   die("repack: Expecting 40 character sha1 lines only 
from pack-objects.");
+   if (line.len != the_hash_algo->hexsz)
+   die("repack: Expecting full hex object ID lines only 
from pack-objects.");
string_list_append(, line.buf);
}
fclose(out);


Re: [PATCH v2 1/2] editorconfig: provide editor settings for Git developers

2018-10-08 Thread brian m. carlson
On Mon, Oct 08, 2018 at 10:53:52PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> It looks like we can add at least "pm" and "pl" to that pattern:

Sure.  I didn't think we had any of those, but you've just proven me
wrong.  I'll send a v3 shortly.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs

2018-10-08 Thread Jonathan Tan
Whenever a sparse checkout occurs, the existence of all blobs in the
index is verified, whether or not they are included or excluded by the
.git/info/sparse-checkout specification. This degrades performance,
significantly in the case of a partial clone, because a lazy fetch
occurs whenever the existence of a missing blob is checked.

At the point of invoking cache_tree_update() in unpack_trees(),
CE_SKIP_WORKTREE is already set on all excluded blobs
(mark_new_skip_worktree() is called twice to set CE_NEW_SKIP_WORKTREE,
then apply_sparse_checkout() is called which copies over
CE_NEW_SKIP_WORKTREE to CE_SKIP_WORKTREE). cache_tree_update() can use
this information to know which blobs are excluded, and thus skip the
checking of these.

Because cache_tree_update() is used from multiple places, this new
behavior is guarded by a new flag WRITE_TREE_SKIP_WORKTREE_MISSING_OK.
Implement this new flag, and teach unpack_trees() to invoke
cache_tree_update() with this new flag.

Signed-off-by: Jonathan Tan 
---
 cache-tree.c |  6 +-
 cache-tree.h |  4 
 t/t1090-sparse-checkout-scope.sh | 33 
 unpack-trees.c   |  1 +
 4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 5ce51468f0..340caf2d34 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -246,6 +246,8 @@ static int update_one(struct cache_tree *it,
int missing_ok = flags & WRITE_TREE_MISSING_OK;
int dryrun = flags & WRITE_TREE_DRY_RUN;
int repair = flags & WRITE_TREE_REPAIR;
+   int skip_worktree_missing_ok =
+   flags & WRITE_TREE_SKIP_WORKTREE_MISSING_OK;
int to_invalidate = 0;
int i;
 
@@ -356,7 +358,9 @@ static int update_one(struct cache_tree *it,
}
 
if (is_null_oid(oid) ||
-   (mode != S_IFGITLINK && !missing_ok && 
!has_object_file(oid))) {
+   (mode != S_IFGITLINK && !missing_ok &&
+!(skip_worktree_missing_ok && ce_skip_worktree(ce)) &&
+!has_object_file(oid))) {
strbuf_release();
if (expected_missing)
return -1;
diff --git a/cache-tree.h b/cache-tree.h
index 0ab6784ffe..655d487619 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -40,6 +40,10 @@ void cache_tree_verify(struct index_state *);
 #define WRITE_TREE_DRY_RUN 4
 #define WRITE_TREE_SILENT 8
 #define WRITE_TREE_REPAIR 16
+/*
+ * Do not check for the presence of cache entry objects with CE_SKIP_WORKTREE.
+ */
+#define WRITE_TREE_SKIP_WORKTREE_MISSING_OK 32
 
 /* error return codes */
 #define WRITE_TREE_UNREADABLE_INDEX (-1)
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 25d7c700f6..090b7fc3d3 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -63,4 +63,37 @@ test_expect_success 'return to full checkout of master' '
test "$(cat b)" = "modified"
 '
 
+test_expect_success 'in partial clone, sparse checkout only fetches needed 
blobs' '
+   test_create_repo server &&
+   git clone "file://$(pwd)/server" client &&
+
+   test_config -C server uploadpack.allowfilter 1 &&
+   test_config -C server uploadpack.allowanysha1inwant 1 &&
+   echo a >server/a &&
+   echo bb >server/b &&
+   mkdir server/c &&
+   echo ccc >server/c/c &&
+   git -C server add a b c/c &&
+   git -C server commit -m message &&
+
+   test_config -C client core.sparsecheckout 1 &&
+   test_config -C client extensions.partialclone origin &&
+   echo "!/*" >client/.git/info/sparse-checkout &&
+   echo "/a" >>client/.git/info/sparse-checkout &&
+   git -C client fetch --filter=blob:none origin &&
+   git -C client checkout FETCH_HEAD &&
+
+   git -C client rev-list HEAD \
+   --quiet --objects --missing=print >unsorted_actual &&
+   (
+   printf "?" &&
+   git hash-object server/b &&
+   printf "?" &&
+   git hash-object server/c/c
+   ) >unsorted_expect &&
+   sort unsorted_actual >actual &&
+   sort unsorted_expect >expect &&
+   test_cmp expect actual
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 51bfac6aa0..39e0e7a6c7 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1635,6 +1635,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, 
struct unpack_trees_options
o->result.cache_tree = cache_tree();
if (!cache_tree_fully_valid(o->result.cache_tree))
cache_tree_update(>result,
+ 
WRITE_TREE_SKIP_WORKTREE_MISSING_OK |
  WRITE_TREE_SILENT |
  WRITE_TREE_REPAIR);
}
-- 

Re: [PATCH v2 1/2] editorconfig: provide editor settings for Git developers

2018-10-08 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 08 2018, brian m. carlson wrote:

> Contributors to Git use a variety of editors, each with their own
> configuration files.  Because C lacks the defined norms on how to indent
> and style code that other languages, such as Ruby and Rust, have, it's
> possible for various contributors, especially new ones, to have
> configured their editor to use a style other than the style the Git
> community prefers.
>
> To make automatically configuring one's editor easier, provide an
> EditorConfig file.  This is an INI-style configuration file that can be
> used to specify editor settings and can be understood by a wide variety
> of editors.  Some editors include this support natively; others require
> a plugin.  Regardless, providing such a file allows users to
> automatically configure their editor of choice with the correct settings
> by default.
>
> Provide global settings to set the character set to UTF-8 and insert a
> final newline into files.  Provide language-specific settings for C,
> Shell, Perl, and Python files according to what CodingGuidelines already
> specifies.  Since the indentation of other files varies, especially
> certain AsciiDoc files, don't provide any settings for them until a
> clear consensus forward emerges.
>
> Set the line length for commit messages to 72 characters, which is the
> generally accepted line length for emails, since we send patches by
> email.
>
> Don't specify an end of line type.  While the Git community uses
> Unix-style line endings in the repository, some Windows users may use
> Git's auto-conversion support and forcing Unix-style line endings might
> cause problems for those users.
>
> Finally, leave out a root directive, which would prevent reading other
> EditorConfig files higher up in the tree, in case someone wants to set
> the end of line type for their system in such a file.
>
> Signed-off-by: brian m. carlson 
> ---
>  .editorconfig | 14 ++
>  1 file changed, 14 insertions(+)
>  create mode 100644 .editorconfig
>
> diff --git a/.editorconfig b/.editorconfig
> new file mode 100644
> index 00..83227fa0b2
> --- /dev/null
> +++ b/.editorconfig
> @@ -0,0 +1,14 @@
> +[*]
> +charset = utf-8
> +insert_final_newline = true
> +
> +[*.{c,h,sh,perl}]
> +indent_style = tab
> +tab_width = 8

It looks like we can add at least "pm" and "pl" to that pattern:

$ git ls-files|grep -E -v -e '\.(c|h|sh|perl)$' | grep -F .| sed 
's/.*\.//'|sort|uniq -c|sort -nr|head -n 15
631 txt
 56 expect
 48 po
 41 test
 40 tcl
 34 gitignore
 24 pm
 18 patch
 18 diff
 16 pl
 15 side
 14 gitattributes
 12 dump
 11 sample
  9 master

> +[*.py]
> +indent_style = space
> +indent_size = 4
> +
> +[COMMIT_EDITMSG]
> +max_line_length = 72


[PATCH v2 0/2] EditorConfig file

2018-10-08 Thread brian m. carlson
This series introduces an EditorConfig file to help developers using any
editor set their editor's settings in conformance with the Git Project's
settings.  This is helpful for developers who work on different projects
with different indentation standards to keep their work in sync.

Changes since v1:
* Add notes to both .editorconfig and .clang-format that they should be
  kept in sync.
* Add commit message line length.

brian m. carlson (2):
  editorconfig: provide editor settings for Git developers
  editorconfig: indicate settings should be kept in sync

 .clang-format |  2 ++
 .editorconfig | 16 
 2 files changed, 18 insertions(+)
 create mode 100644 .editorconfig



[PATCH v2 1/2] editorconfig: provide editor settings for Git developers

2018-10-08 Thread brian m. carlson
Contributors to Git use a variety of editors, each with their own
configuration files.  Because C lacks the defined norms on how to indent
and style code that other languages, such as Ruby and Rust, have, it's
possible for various contributors, especially new ones, to have
configured their editor to use a style other than the style the Git
community prefers.

To make automatically configuring one's editor easier, provide an
EditorConfig file.  This is an INI-style configuration file that can be
used to specify editor settings and can be understood by a wide variety
of editors.  Some editors include this support natively; others require
a plugin.  Regardless, providing such a file allows users to
automatically configure their editor of choice with the correct settings
by default.

Provide global settings to set the character set to UTF-8 and insert a
final newline into files.  Provide language-specific settings for C,
Shell, Perl, and Python files according to what CodingGuidelines already
specifies.  Since the indentation of other files varies, especially
certain AsciiDoc files, don't provide any settings for them until a
clear consensus forward emerges.

Set the line length for commit messages to 72 characters, which is the
generally accepted line length for emails, since we send patches by
email.

Don't specify an end of line type.  While the Git community uses
Unix-style line endings in the repository, some Windows users may use
Git's auto-conversion support and forcing Unix-style line endings might
cause problems for those users.

Finally, leave out a root directive, which would prevent reading other
EditorConfig files higher up in the tree, in case someone wants to set
the end of line type for their system in such a file.

Signed-off-by: brian m. carlson 
---
 .editorconfig | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 .editorconfig

diff --git a/.editorconfig b/.editorconfig
new file mode 100644
index 00..83227fa0b2
--- /dev/null
+++ b/.editorconfig
@@ -0,0 +1,14 @@
+[*]
+charset = utf-8
+insert_final_newline = true
+
+[*.{c,h,sh,perl}]
+indent_style = tab
+tab_width = 8
+
+[*.py]
+indent_style = space
+indent_size = 4
+
+[COMMIT_EDITMSG]
+max_line_length = 72


[PATCH v2 2/2] editorconfig: indicate settings should be kept in sync

2018-10-08 Thread brian m. carlson
Now that we have two places where we set code formatting settings,
.editorconfig and .clang-format, it's possible that they could fall out
of sync.  This is relatively unlikely, since we're not likely to change
the tab width or our preference for tabs, but just in case, add comments
to both files reminding future developers to keep them in sync.

Signed-off-by: brian m. carlson 
---
 .clang-format | 2 ++
 .editorconfig | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/.clang-format b/.clang-format
index 12a89f95f9..de1c8b5c77 100644
--- a/.clang-format
+++ b/.clang-format
@@ -6,6 +6,8 @@
 
 # Use tabs whenever we need to fill whitespace that spans at least from one tab
 # stop to the next one.
+#
+# These settings are mirrored in .editorconfig.  Keep them in sync.
 UseTab: Always
 TabWidth: 8
 IndentWidth: 8
diff --git a/.editorconfig b/.editorconfig
index 83227fa0b2..98d7c5ff99 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -2,6 +2,8 @@
 charset = utf-8
 insert_final_newline = true
 
+# The settings for C (*.c and *.h) files are mirrored in .clang-format.  Keep
+# them in sync.
 [*.{c,h,sh,perl}]
 indent_style = tab
 tab_width = 8


Re: What's so special about objects/17/ ?

2018-10-08 Thread Stefan Beller
On Sun, Oct 7, 2018 at 1:07 PM Junio C Hamano  wrote:
>
> Junio C Hamano  writes:

> > ...
> > by general public and I do not have to explain the choice to the
> > general public ;-)
>
> One thing that is more important than "why not 00 but 17?" to answer
> is why a hardcoded number rather than a runtime random.  It is for
> repeatability.

Let's talk about repeatability vs statistics for a second. ;-)

If I am a user and I were really into optimizing my loose object count
for some reason, so I would want to choose a low number of
gc.auto. Let's say I go with 128.

At the low end of loose objects the approximation is yielding
some high relative errors. This is because of the granularity, i.e.
gc would implicitly estimate the loose objects to be 0 or 256 or 512, (or more)
if there is 0, 1, 2 (or more) loose objects in the objects/17.

As each object can be viewed following an unfair coin flip
(With a chance of 1/256 it is in objects/17), the distribution in
objects/17 (and hence any other objects/XX bin) follows the
Bernoulli distribution.

If I do have say about 157 loose objects (and having auto.gc
configured anywhere in 1..255), then the probability to not
gc is 54% (as that is the probability to have 0 objects in /17,
following probability mass function of the Bernoulli distribution,
(i.e. Pr(0 objects) = (157 over 0) x (1/256)^0 x (255/256)^157))

As it is repeatable (by picking the same /17 every time), I can run
"gc --auto" multiple times and still have 157 loose objects, despite
wanting to have only 128 loose objects at a 54% chance.

If we'd roll the 256 dice every time to pick a different bin,
then we might hit another bin and gc in the second or third
gc, which would be more precise on average.

By having repeatability we allow for these numbers to be far off
more often when configuring small numbers.

I think that is the right choice, as we probably do not care about the
exactness of auto-gc for small numbers, as it is a performance
thing anyway. Although documenting it properly might be a challenge.

The current wording of auto.gc seems to suggest that we are right
for the number as we compute it via the implying the expected value,
(i.e. we pick a bin and multiply the fullness of the bin by the number
of bins to estimate the whole fullness, see the mean=n p on [1])
I think a user would be far more interested in giving an upper bound,
i.e. expressing something like "I will have at most $auto.gc objects
before gc kicks in" or "The likelihood to exceed the $auto.gc number
of loose objects by $this much is less than 5%", for which the math
would be more complicated, but easier to document with the words of
statistics.

[1]  https://en.wikipedia.org/wiki/Binomial_distribution

Stefan


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Derrick Stolee

On 10/8/2018 2:10 PM, SZEDER Gábor wrote:

On Mon, Oct 08, 2018 at 12:57:34PM -0400, Derrick Stolee wrote:

Nice! These numbers make sense to me, in terms of how many TREESAME queries
we actually need to perform for such a query.

Yeah...  because you didn't notice that I deliberately cheated :)

As it turned out, it's not just about the number of diff queries that
we can spare, but, for the speedup _ratio_, it's more about how
expensive those diff queries are.

git.git has a rather flat hierarchy, and 't/' is the 372th entry in
the current root tree object, while 'valgrind/' is the 923th entry,
and the diff machinery spends considerable time wading through the
previous entries.  Notice the "carefully chosen path" remark in my
previous email; I think this particular path has the highest number of
preceeding tree entries, and, in addition, 't/' changes rather
frequently, so the diff machinery often has to scan two relatively big
tree objects.  Had I chosen 'Documentation/RelNotes/1.5.0.1.txt'
instead, i.e. another path two directories deep, but whose leading
path components are both near the beginning of the tree objects, the
speedup would be much less impressive: 0.282s vs. 0.049s, i.e. "only"
~5.7x instead of ~24.8x.


This is expected. The performance ratio is better when the path is any 
of the following:


1. A very deep path (need to walk multiple trees to answer TREESAME)

2. An entry is late in a very wide tree (need to spend extra time 
parsing tree object)


3. The path doesn't change very often (need to inspect many TREESAME 
pairs before finding enough interesting commits)


4. Some sub-path changes often (so the TREESAME comparison needs to 
parse beyond that sub-path often)


Our standard examples (Git and Linux repos) don't have many paths that 
have these properties. But: they do exist. In other projects, this is 
actually typical. Think about Java projects that frequently have ~5 
levels of folders before actually touching a code file.


When I was implementing the Bloom filter feature for Azure Repos, I ran 
performance tests on the Linux repo using a random sampling of paths. 
The typical speedup was 5x while some outliers were in the 25x range.





But I'm afraid it will take a while until I get around to turn it into
something presentable...

Do you have the code pushed somewhere public where one could take a look? I
Do you have the code pushed somewhere public where one could take a
look? I could provide some early feedback.

Nah, definitely not...  I know full well how embarassingly broken this
implementation is, I don't need others to tell me that ;)

There are two questions that I was hoping to answer by looking at your code:

1. How do you store your Bloom filter? Is it connected to the 
commit-graph and split on a commit-by-commit basis (storing "$path" as a 
key), or is it one huge Bloom filter (storing "$commitid:$path" as key)?


2. Where does your Bloom filter check plug into the TREESAME logic? I 
haven't investigated this part, but hopefully it isn't too complicated.


Thanks,
-Stolee


Re: t3404.6 breaks on master under GIT_FSMONITOR_TEST

2018-10-08 Thread Ben Peart




On 10/8/2018 10:19 AM, Ævar Arnfjörð Bjarmason wrote:


On Thu, Sep 06 2018, Ævar Arnfjörð Bjarmason wrote:


On Thu, Feb 01 2018, Ævar Arnfjörð Bjarmason wrote:


The GIT_FSMONITOR_TEST variable allows you to roundtrip the fsmonitor
codpath in the whole test suite. On both Debian & CentOS this breaks for
me:

 (cd t && GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all 
./t3404-rebase-interactive.sh -i)

Whereas this works:

 (cd t && GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all 
GIT_SKIP_TESTS=t3404.6 ./t3404-rebase-interactive.sh -i)

The entirety of the rest of the test suite still passes with
GIT_FSMONITOR_TEST.

This has been failing ever since GIT_FSMONITOR_TEST was introduced in
883e248b8a ("fsmonitor: teach git to optionally utilize a file system
monitor to speed up detecting new or changed files.", 2017-09-22). Under
-v -x -i:

 + echo test_must_fail: command succeeded: env 
FAKE_LINES=exec_echo_foo_>file1 1 git rebase -i HEAD^
 test_must_fail: command succeeded: env FAKE_LINES=exec_echo_foo_>file1 1 
git rebase -i HEAD^
 + return 1
 error: last command exited with $?=1
 not ok 6 - rebase -i with the exec command checks tree cleanness
 #
 #   git checkout master &&
 #   set_fake_editor &&
 #   test_must_fail env FAKE_LINES="exec_echo_foo_>file1 1" git rebase 
-i HEAD^ &&

Maybe once this is fixed running the test suite under GIT_FSMONITOR_TEST
would be a useful Travis target, but I don't know the current status of
adding new options to Travis.


*Poke* at this again. Ben, or anyone else with knowledge of fsmonitor:
Can you reproduce this?

This failure along with the one I noted in
https://public-inbox.org/git/87tvn2remn@evledraar.gmail.com/ is
failing the tests on Linux when run with GIT_FSMONITOR_TEST.

I'm looking at this again because SZEDER's patches to the split index
reminded me again that we have these long-standing failures in rare test
modes (see
https://public-inbox.org/git/87va7ireuu@evledraar.gmail.com/ for the
split index discussion).


For what it's worth this is still broken, but more importantly (I'm not
just keeping bumping the same thing) the only thing that's now broken
under fsmonitor. I.e. my skip config is now GIT_SKIP_TESTS="t3404.7"
whereas before 43f1180814 ("git-mv: allow submodules and fsmonitor to
work together", 2018-09-10) I needed to add "t7411.3 t7411.4" to that.



I glanced at this for a few minutes but it wasn't obvious what was 
happening.  It will take some additional effort to dig into and figure 
out the underlying issue.  I haven't forgotten about this - it's still 
on my list, just below some other things I need to get finished up first.


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread SZEDER Gábor
On Mon, Oct 08, 2018 at 12:57:34PM -0400, Derrick Stolee wrote:
> On 10/8/2018 12:41 PM, SZEDER Gábor wrote:
> >On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote:
> >>I'm still excited about the prospect of a bloom filter for paths which
> >>each commit touches. I think that's the next big frontier in getting
> >>things like "git log -- path" to a reasonable run-time.
> >There is certainly potential there.  With a (very) rough PoC
> >experiment, a 8MB bloom filter, and a carefully choosen path I can
> >achieve a nice, almost 25x speedup:
> >
> >   $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh
> >   6
> >
> >   real0m1.563s
> >   user0m1.519s
> >   sys 0m0.045s
> >
> >   $ time GIT_USE_POC_BLOOM_FILTER=y ~/src/git/git rev-list --count HEAD -- 
> > t/valgrind/valgrind.sh
> >   6
> >
> >   real0m0.063s
> >   user0m0.043s
> >   sys 0m0.020s
> >
> >   bloom filter total queries: 16269 definitely not: 16195 maybe: 74 false 
> > positives: 64 fp ratio: 0.003934

> Nice! These numbers make sense to me, in terms of how many TREESAME queries
> we actually need to perform for such a query.

Yeah...  because you didn't notice that I deliberately cheated :)

As it turned out, it's not just about the number of diff queries that
we can spare, but, for the speedup _ratio_, it's more about how
expensive those diff queries are.

git.git has a rather flat hierarchy, and 't/' is the 372th entry in
the current root tree object, while 'valgrind/' is the 923th entry,
and the diff machinery spends considerable time wading through the
previous entries.  Notice the "carefully chosen path" remark in my
previous email; I think this particular path has the highest number of
preceeding tree entries, and, in addition, 't/' changes rather
frequently, so the diff machinery often has to scan two relatively big
tree objects.  Had I chosen 'Documentation/RelNotes/1.5.0.1.txt'
instead, i.e. another path two directories deep, but whose leading
path components are both near the beginning of the tree objects, the
speedup would be much less impressive: 0.282s vs. 0.049s, i.e. "only"
~5.7x instead of ~24.8x.

> >But I'm afraid it will take a while until I get around to turn it into
> >something presentable...
> Do you have the code pushed somewhere public where one could take a look? I
> Do you have the code pushed somewhere public where one could take a 
> look? I could provide some early feedback.

Nah, definitely not...  I know full well how embarassingly broken this
implementation is, I don't need others to tell me that ;)



[PATCH v5 0/4] Filter alternate references

2018-10-08 Thread Taylor Blau
Hi,

Attached is (what I anticipate to be) the final re-roll of my series to
introduce 'core.alternateRefsCommand' and 'core.alternateRefsPrefixes'
in order to limit the ".have" advertisement when pushing over protocol
v1 to a repository with configured alternates.

Not much has changed from last time, expect for:

  - Taking a documentation suggestion from Peff (in 3/4), and

  - Fixing a typo pointed out by Ramsay (in 4/4).

I believe that this series is otherwise ready for queueing, if everyone
else feels sufficiently OK about the changes.

Thanks in advance for your review.

Thanks,
Taylor

Jeff King (1):
  transport: drop refnames from for_each_alternate_ref

Taylor Blau (3):
  transport.c: extract 'fill_alternate_refs_command'
  transport.c: introduce core.alternateRefsCommand
  transport.c: introduce core.alternateRefsPrefixes

 Documentation/config.txt   | 18 +
 builtin/receive-pack.c |  3 +--
 fetch-pack.c   |  3 +--
 t/t5410-receive-pack-alternates.sh | 41 ++
 transport.c| 38 +--
 transport.h|  2 +-
 6 files changed, 92 insertions(+), 13 deletions(-)
 create mode 100755 t/t5410-receive-pack-alternates.sh

Range-diff against v4:
1:  76482a7eba = 1:  e4947f557b transport: drop refnames from 
for_each_alternate_ref
2:  120df009df = 2:  3d77a46c61 transport.c: extract 
'fill_alternate_refs_command'
3:  c63864c89a ! 3:  7451b4872a transport.c: introduce core.alternateRefsCommand
@@ -42,14 +42,9 @@
 +  hex object id per line (i.e., the same as produce by `git for-each-ref
 +  --format='%(objectname)'`).
 ++
-+This is useful when a repository only wishes to advertise some of its
-+alternate's references as `.have`'s. For example, to only advertise branch
-+heads, configure `core.alternateRefsCommand` to the path of a script 
which runs
-+`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`.
-++
-+Note that the configured value is executed in a shell, and thus
-+linkgit:git-for-each-ref[1] by itself does not work, as scripts have to 
handle
-+the path argument specially.
++Note that you cannot generally put `git for-each-ref` directly into the 
config
++value, as it does not take a repository path as an argument (but you can 
wrap
++the command above in a shell script).
 +
  core.bare::
If true this repository is assumed to be 'bare' and has no
4:  0f6cdc7ea4 ! 4:  28cbbe63f7 transport.c: introduce 
core.alternateRefsPrefixes
@@ -39,8 +39,8 @@
  --- a/Documentation/config.txt
  +++ b/Documentation/config.txt
 @@
- linkgit:git-for-each-ref[1] by itself does not work, as scripts have to 
handle
- the path argument specially.
+ value, as it does not take a repository path as an argument (but you can 
wrap
+ the command above in a shell script).

 +core.alternateRefsPrefixes::
 +  When listing references from an alternate, list only references that 
begin
@@ -62,7 +62,7 @@

 +test_expect_success 'with core.alternateRefsPrefixes' '
 +  test_config -C fork core.alternateRefsPrefixes "refs/heads/private" &&
-+  git rev-parse private/branch expect &&
++  git rev-parse private/branch >expect &&
 +  printf "" | git receive-pack fork >actual &&
 +  extract_haves actual.haves &&
 +  test_cmp expect actual.haves
--
2.19.0.221.g150f307af


[PATCH v5 4/4] transport.c: introduce core.alternateRefsPrefixes

2018-10-08 Thread Taylor Blau
The recently-introduced "core.alternateRefsCommand" allows callers to
specify with high flexibility the tips that they wish to advertise from
alternates. This flexibility comes at the cost of some inconvenience
when the caller only wishes to limit the advertisement to one or more
prefixes.

For example, to advertise only tags, a caller using
'core.alternateRefsCommand' would have to do:

  $ git config core.alternateRefsCommand ' \
  f() { git -C "$1" for-each-ref \
  refs/tags --format="%(objectname)" }; f "$@"'

The above is cumbersome to write, so let's introduce a
"core.alternateRefsPrefixes" to address this common case. Instead, the
caller can run:

  $ git config core.alternateRefsPrefixes 'refs/tags'

Which will behave identically to the longer example using
"core.alternateRefsCommand".

Since the value of "core.alternateRefsPrefixes" is appended to 'git
for-each-ref' and then executed, include a "--" before taking the
configured value to avoid misinterpreting arguments as flags to 'git
for-each-ref'.

In the case that the caller wishes to specify multiple prefixes, they
may separate them by whitespace. If "core.alternateRefsCommand" is set,
it will take precedence over "core.alternateRefsPrefixes".

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt   | 7 +++
 t/t5410-receive-pack-alternates.sh | 8 
 transport.c| 5 +
 3 files changed, 20 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c51e82d8a5..a133a709f3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -627,6 +627,13 @@ Note that you cannot generally put `git for-each-ref` 
directly into the config
 value, as it does not take a repository path as an argument (but you can wrap
 the command above in a shell script).
 
+core.alternateRefsPrefixes::
+   When listing references from an alternate, list only references that 
begin
+   with the given prefix. Prefixes match as if they were given as 
arguments to
+   linkgit:git-for-each-ref[1]. To list multiple prefixes, separate them 
with
+   whitespace. If `core.alternateRefsCommand` is set, setting
+   `core.alternateRefsPrefixes` has no effect.
+
 core.bare::
If true this repository is assumed to be 'bare' and has no
working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack-alternates.sh 
b/t/t5410-receive-pack-alternates.sh
index 49d0fe44fb..457c20c2a5 100755
--- a/t/t5410-receive-pack-alternates.sh
+++ b/t/t5410-receive-pack-alternates.sh
@@ -30,4 +30,12 @@ test_expect_success 'with core.alternateRefsCommand' '
test_cmp expect actual.haves
 '
 
+test_expect_success 'with core.alternateRefsPrefixes' '
+   test_config -C fork core.alternateRefsPrefixes "refs/heads/private" &&
+   git rev-parse private/branch >expect &&
+   printf "" | git receive-pack fork >actual &&
+   extract_haves actual.haves &&
+   test_cmp expect actual.haves
+'
+
 test_done
diff --git a/transport.c b/transport.c
index e271b66603..83474add28 100644
--- a/transport.c
+++ b/transport.c
@@ -1341,6 +1341,11 @@ static void fill_alternate_refs_command(struct 
child_process *cmd,
argv_array_pushf(>args, "--git-dir=%s", repo_path);
argv_array_push(>args, "for-each-ref");
argv_array_push(>args, "--format=%(objectname)");
+
+   if (!git_config_get_value("core.alternateRefsPrefixes", 
)) {
+   argv_array_push(>args, "--");
+   argv_array_split(>args, value);
+   }
}
 
cmd->env = local_repo_env;
-- 
2.19.0.221.g150f307af


[PATCH v5 2/4] transport.c: extract 'fill_alternate_refs_command'

2018-10-08 Thread Taylor Blau
To list alternate references, 'read_alternate_refs' creates a child
process running 'git for-each-ref' in the alternate's Git directory.

Prepare to run other commands besides 'git for-each-ref' by introducing
and moving the relevant code from 'read_alternate_refs' to
'fill_alternate_refs_command'.

Signed-off-by: Taylor Blau 
---
 transport.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/transport.c b/transport.c
index 2e0bc414d0..2825debac5 100644
--- a/transport.c
+++ b/transport.c
@@ -1325,6 +1325,17 @@ char *transport_anonymize_url(const char *url)
return xstrdup(url);
 }
 
+static void fill_alternate_refs_command(struct child_process *cmd,
+   const char *repo_path)
+{
+   cmd->git_cmd = 1;
+   argv_array_pushf(>args, "--git-dir=%s", repo_path);
+   argv_array_push(>args, "for-each-ref");
+   argv_array_push(>args, "--format=%(objectname)");
+   cmd->env = local_repo_env;
+   cmd->out = -1;
+}
+
 static void read_alternate_refs(const char *path,
alternate_ref_fn *cb,
void *data)
@@ -1333,12 +1344,7 @@ static void read_alternate_refs(const char *path,
struct strbuf line = STRBUF_INIT;
FILE *fh;
 
-   cmd.git_cmd = 1;
-   argv_array_pushf(, "--git-dir=%s", path);
-   argv_array_push(, "for-each-ref");
-   argv_array_push(, "--format=%(objectname)");
-   cmd.env = local_repo_env;
-   cmd.out = -1;
+   fill_alternate_refs_command(, path);
 
if (start_command())
return;
-- 
2.19.0.221.g150f307af



[PATCH v5 3/4] transport.c: introduce core.alternateRefsCommand

2018-10-08 Thread Taylor Blau
When in a repository containing one or more alternates, Git would
sometimes like to list references from those alternates. For example,
'git receive-pack' lists the "tips" pointed to by references in those
alternates as special ".have" references.

Listing ".have" references is designed to make pushing changes from
upstream to a fork a lightweight operation, by advertising to the pusher
that the fork already has the objects (via its alternate). Thus, the
client can avoid sending them.

However, when the alternate (upstream, in the previous example) has a
pathologically large number of references, the initial advertisement is
too expensive. In fact, it can dominate any such optimization where the
pusher avoids sending certain objects.

Introduce "core.alternateRefsCommand" in order to provide a facility to
limit or filter alternate references. This can be used, for example, to
filter out references the alternate does not wish to send (for space
concerns, or otherwise) during the initial advertisement.

Let the repository that has alternates configure this command to avoid
trusting the alternate to provide us a safe command to run in the shell.
To find the alternate, pass its absolute path as the first argument.

Signed-off-by: Taylor Blau 
---
 Documentation/config.txt   | 11 ++
 t/t5410-receive-pack-alternates.sh | 33 ++
 transport.c| 19 +
 3 files changed, 59 insertions(+), 4 deletions(-)
 create mode 100755 t/t5410-receive-pack-alternates.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ad0f4510c3..c51e82d8a5 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -616,6 +616,17 @@ core.preferSymlinkRefs::
This is sometimes needed to work with old scripts that
expect HEAD to be a symbolic link.
 
+core.alternateRefsCommand::
+   When advertising tips of available history from an alternate, use the 
shell to
+   execute the specified command instead of linkgit:git-for-each-ref[1]. 
The
+   first argument is the absolute path of the alternate. Output must 
contain one
+   hex object id per line (i.e., the same as produce by `git for-each-ref
+   --format='%(objectname)'`).
++
+Note that you cannot generally put `git for-each-ref` directly into the config
+value, as it does not take a repository path as an argument (but you can wrap
+the command above in a shell script).
+
 core.bare::
If true this repository is assumed to be 'bare' and has no
working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack-alternates.sh 
b/t/t5410-receive-pack-alternates.sh
new file mode 100755
index 00..49d0fe44fb
--- /dev/null
+++ b/t/t5410-receive-pack-alternates.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+test_description='git receive-pack with alternate ref filtering'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_commit base &&
+   git clone -s --bare . fork &&
+   git checkout -b public/branch master &&
+   test_commit public &&
+   git checkout -b private/branch master &&
+   test_commit private
+'
+
+extract_haves () {
+   depacketize | perl -lne '/^(\S+) \.have/ and print $1'
+}
+
+test_expect_success 'with core.alternateRefsCommand' '
+   write_script fork/alternate-refs <<-\EOF &&
+   git --git-dir="$1" for-each-ref \
+   --format="%(objectname)" \
+   refs/heads/public/
+   EOF
+   test_config -C fork core.alternateRefsCommand alternate-refs &&
+   git rev-parse public/branch >expect &&
+   printf "" | git receive-pack fork >actual &&
+   extract_haves actual.haves &&
+   test_cmp expect actual.haves
+'
+
+test_done
diff --git a/transport.c b/transport.c
index 2825debac5..e271b66603 100644
--- a/transport.c
+++ b/transport.c
@@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url)
 static void fill_alternate_refs_command(struct child_process *cmd,
const char *repo_path)
 {
-   cmd->git_cmd = 1;
-   argv_array_pushf(>args, "--git-dir=%s", repo_path);
-   argv_array_push(>args, "for-each-ref");
-   argv_array_push(>args, "--format=%(objectname)");
+   const char *value;
+
+   if (!git_config_get_value("core.alternateRefsCommand", )) {
+   cmd->use_shell = 1;
+
+   argv_array_push(>args, value);
+   argv_array_push(>args, repo_path);
+   } else {
+   cmd->git_cmd = 1;
+
+   argv_array_pushf(>args, "--git-dir=%s", repo_path);
+   argv_array_push(>args, "for-each-ref");
+   argv_array_push(>args, "--format=%(objectname)");
+   }
+
cmd->env = local_repo_env;
cmd->out = -1;
 }
-- 
2.19.0.221.g150f307af



[PATCH v5 1/4] transport: drop refnames from for_each_alternate_ref

2018-10-08 Thread Taylor Blau
From: Jeff King 

None of the current callers use the refname parameter we pass to their
callbacks. In theory somebody _could_ do so, but it's actually quite
weird if you think about it: it's a ref in somebody else's repository.
So the name has no meaning locally, and in fact there may be duplicates
if there are multiple alternates.

The users of this interface really only care about seeing some ref tips,
since that promises that the alternate has the full commit graph
reachable from there. So let's keep the information we pass back to the
bare minimum.

Signed-off-by: Jeff King 
Signed-off-by: Taylor Blau 
---
 builtin/receive-pack.c | 3 +--
 fetch-pack.c   | 3 +--
 transport.c| 6 +++---
 transport.h| 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 4d30001950..6792291f5e 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -281,8 +281,7 @@ static int show_ref_cb(const char *path_full, const struct 
object_id *oid,
return 0;
 }
 
-static void show_one_alternate_ref(const char *refname,
-  const struct object_id *oid,
+static void show_one_alternate_ref(const struct object_id *oid,
   void *data)
 {
struct oidset *seen = data;
diff --git a/fetch-pack.c b/fetch-pack.c
index 75047a4b2a..b643de143b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -76,8 +76,7 @@ struct alternate_object_cache {
size_t nr, alloc;
 };
 
-static void cache_one_alternate(const char *refname,
-   const struct object_id *oid,
+static void cache_one_alternate(const struct object_id *oid,
void *vcache)
 {
struct alternate_object_cache *cache = vcache;
diff --git a/transport.c b/transport.c
index 1c76d64aba..2e0bc414d0 100644
--- a/transport.c
+++ b/transport.c
@@ -1336,7 +1336,7 @@ static void read_alternate_refs(const char *path,
cmd.git_cmd = 1;
argv_array_pushf(, "--git-dir=%s", path);
argv_array_push(, "for-each-ref");
-   argv_array_push(, "--format=%(objectname) %(refname)");
+   argv_array_push(, "--format=%(objectname)");
cmd.env = local_repo_env;
cmd.out = -1;
 
@@ -1348,13 +1348,13 @@ static void read_alternate_refs(const char *path,
struct object_id oid;
 
if (get_oid_hex(line.buf, ) ||
-   line.buf[GIT_SHA1_HEXSZ] != ' ') {
+   line.buf[GIT_SHA1_HEXSZ]) {
warning(_("invalid line while parsing alternate refs: 
%s"),
line.buf);
break;
}
 
-   cb(line.buf + GIT_SHA1_HEXSZ + 1, , data);
+   cb(, data);
}
 
fclose(fh);
diff --git a/transport.h b/transport.h
index 01e717c29e..9baeca2d7a 100644
--- a/transport.h
+++ b/transport.h
@@ -261,6 +261,6 @@ int transport_refs_pushed(struct ref *ref);
 void transport_print_push_status(const char *dest, struct ref *refs,
  int verbose, int porcelain, unsigned int *reject_reasons);
 
-typedef void alternate_ref_fn(const char *refname, const struct object_id 
*oid, void *);
+typedef void alternate_ref_fn(const struct object_id *oid, void *);
 extern void for_each_alternate_ref(alternate_ref_fn, void *);
 #endif
-- 
2.19.0.221.g150f307af



I want you to Distribute my funds to less priviledge if interested reply now

2018-10-08 Thread Shill Sheila Johnson


Re: [PATCH v11 8/8] list-objects-filter: implement filter tree:0

2018-10-08 Thread Matthew DeVore
On Sat, Oct 6, 2018 at 5:10 PM Junio C Hamano  wrote:
>
> As output made inside test_expect_{succcess,failure} are discarded
> by default and shown while debugging tests, there is no strong
> reason to use "grep -q" in our tests.  I saw a few instances of
> "grep -q" added in this series including this one
>
> test_must_fail grep -q "$file_4" observed
>
> that should probably be
>
> ! grep "$file_4" observed
Yeah, I remember I read in the testing guidelines that you should just
use ! for non-Git commands since it's not our job to make sure these
tools are not crashing. Thank you for pointing this out.

>
> > + printf "blob\ncommit\ntree\n" >unique_types.expected &&
> > ...
> > + printf "blob\ntree\n" >expected &&
>
> Using test_write_lines is probably easier to read.

Done. Below is an interdiff. Let me know if you want a reroll soon.
Otherwise, I will send one later this week.

- Matt

diff --git a/t/t5317-pack-objects-filter-objects.sh
b/t/t5317-pack-objects-filter-objects.sh
index 510d3537f..d9dccf4d4 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -69,7 +69,7 @@ test_expect_success 'get an error for missing tree object' '
 test_must_fail git -C r5 pack-objects --rev --stdout
2>bad_tree <<-EOF &&
 HEAD
 EOF
-grep -q "bad tree object" bad_tree
+grep "bad tree object" bad_tree
 '

 test_expect_success 'setup for tests of tree:0' '
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 53fbf7db8..392caa08f 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -192,7 +192,7 @@ test_expect_success 'use fsck before and after
manually fetching a missing subtr
 xargs -n1 git -C dst cat-file -t >fetched_types &&

 sort -u fetched_types >unique_types.observed &&
-printf "blob\ncommit\ntree\n" >unique_types.expected &&
+test_write_lines blob commit tree >unique_types.expected &&
 test_cmp unique_types.expected unique_types.observed
 '

diff --git a/t/t6112-rev-list-filters-objects.sh
b/t/t6112-rev-list-filters-objects.sh
index c8e3d87c4..08e0c7db6 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -38,8 +38,8 @@ test_expect_success 'specify blob explicitly
prevents filtering' '
  awk -f print_2.awk) &&

 git -C r1 rev-list --objects --filter=blob:none HEAD $file_3
>observed &&
-grep -q "$file_3" observed &&
-test_must_fail grep -q "$file_4" observed
+grep "$file_3" observed &&
+! grep "$file_4" observed
 '

 test_expect_success 'verify emitted+omitted == all' '
@@ -240,7 +240,7 @@ test_expect_success 'verify tree:0 includes trees
in "filtered" output' '
 xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&

 sort -u unsorted_filtered_types >filtered_types &&
-printf "blob\ntree\n" >expected &&
+test_write_lines blob tree >expected &&
 test_cmp expected filtered_types
 '


Re: [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files

2018-10-08 Thread Derrick Stolee

On 10/8/2018 1:05 PM, Ananya Krishna Maram wrote:

Hi All,

Hello, Ananya! Welcome.


I was searching through #leftovers and found this.
https://public-inbox.org/git/cabpp-bgvvxcbzx44er6to-pusfen_6gnyj1u5cuon9deaa4...@mail.gmail.com/

This patch address the task discussed in the above link.
The discussion above seems to not be intended for your commit message, 
but it does show up when I run `git am` and provide your email as input. 
The typical way to avoid this is to place all commentary below the "---" 
that signifies the commit message is over.

From: Ananya Krishan Maram 

skip the #include of git-compat-util.h since all .c files include it.

Signed-off-by: Ananya Krishna Maram 
---
  advice.h | 1 -
  commit-graph.h   | 1 -
  hash.h   | 1 -
  pkt-line.h   | 1 -
  t/helper/test-tool.h | 1 -
  5 files changed, 5 deletions(-)

diff --git a/advice.h b/advice.h
index ab24df0fd..09148baa6 100644
--- a/advice.h
+++ b/advice.h
@@ -1,7 +1,6 @@
  #ifndef ADVICE_H
  #define ADVICE_H
  
-#include "git-compat-util.h"
  
  extern int advice_push_update_rejected;

  extern int advice_push_non_ff_current;
diff --git a/commit-graph.h b/commit-graph.h
index b05047676..0e93c2bed 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -1,7 +1,6 @@
  #ifndef COMMIT_GRAPH_H
  #define COMMIT_GRAPH_H
  
-#include "git-compat-util.h"

  #include "repository.h"
  #include "string-list.h"
  #include "cache.h"
diff --git a/hash.h b/hash.h
index 7c8238bc2..9a4334c5d 100644
--- a/hash.h
+++ b/hash.h
@@ -1,7 +1,6 @@
  #ifndef HASH_H
  #define HASH_H
  
-#include "git-compat-util.h"
  
  #if defined(SHA1_PPC)

  #include "ppc/sha1.h"
diff --git a/pkt-line.h b/pkt-line.h
index 5b28d4347..fdd316494 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -1,7 +1,6 @@
  #ifndef PKTLINE_H
  #define PKTLINE_H
  
-#include "git-compat-util.h"

  #include "strbuf.h"
  
  /*

diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e07495727..24e0a1589 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -1,7 +1,6 @@
  #ifndef __TEST_TOOL_H__
  #define __TEST_TOOL_H__
  
-#include "git-compat-util.h"
  
  int cmd__chmtime(int argc, const char **argv);

  int cmd__config(int argc, const char **argv);
I applied these changes locally and confirmed the code compiles, so all 
.c files including these _do_ include git-compat-util.h properly.


Thanks,
-Stolee



[PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files

2018-10-08 Thread Ananya Krishna Maram
Hi All, 
I was searching through #leftovers and found this.
https://public-inbox.org/git/cabpp-bgvvxcbzx44er6to-pusfen_6gnyj1u5cuon9deaa4...@mail.gmail.com/

This patch address the task discussed in the above link. 

From: Ananya Krishan Maram 

skip the #include of git-compat-util.h since all .c files include it.

Signed-off-by: Ananya Krishna Maram 
---
 advice.h | 1 -
 commit-graph.h   | 1 -
 hash.h   | 1 -
 pkt-line.h   | 1 -
 t/helper/test-tool.h | 1 -
 5 files changed, 5 deletions(-)

diff --git a/advice.h b/advice.h
index ab24df0fd..09148baa6 100644
--- a/advice.h
+++ b/advice.h
@@ -1,7 +1,6 @@
 #ifndef ADVICE_H
 #define ADVICE_H
 
-#include "git-compat-util.h"
 
 extern int advice_push_update_rejected;
 extern int advice_push_non_ff_current;
diff --git a/commit-graph.h b/commit-graph.h
index b05047676..0e93c2bed 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -1,7 +1,6 @@
 #ifndef COMMIT_GRAPH_H
 #define COMMIT_GRAPH_H
 
-#include "git-compat-util.h"
 #include "repository.h"
 #include "string-list.h"
 #include "cache.h"
diff --git a/hash.h b/hash.h
index 7c8238bc2..9a4334c5d 100644
--- a/hash.h
+++ b/hash.h
@@ -1,7 +1,6 @@
 #ifndef HASH_H
 #define HASH_H
 
-#include "git-compat-util.h"
 
 #if defined(SHA1_PPC)
 #include "ppc/sha1.h"
diff --git a/pkt-line.h b/pkt-line.h
index 5b28d4347..fdd316494 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -1,7 +1,6 @@
 #ifndef PKTLINE_H
 #define PKTLINE_H
 
-#include "git-compat-util.h"
 #include "strbuf.h"
 
 /*
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e07495727..24e0a1589 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -1,7 +1,6 @@
 #ifndef __TEST_TOOL_H__
 #define __TEST_TOOL_H__
 
-#include "git-compat-util.h"
 
 int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
-- 
2.19.0.272.ga00e0029e



Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread Derrick Stolee

On 10/8/2018 12:41 PM, SZEDER Gábor wrote:

On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote:

I'm still excited about the prospect of a bloom filter for paths which
each commit touches. I think that's the next big frontier in getting
things like "git log -- path" to a reasonable run-time.

There is certainly potential there.  With a (very) rough PoC
experiment, a 8MB bloom filter, and a carefully choosen path I can
achieve a nice, almost 25x speedup:

   $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh
   6

   real0m1.563s
   user0m1.519s
   sys 0m0.045s

   $ time GIT_USE_POC_BLOOM_FILTER=y ~/src/git/git rev-list --count HEAD -- 
t/valgrind/valgrind.sh
   6

   real0m0.063s
   user0m0.043s
   sys 0m0.020s

   bloom filter total queries: 16269 definitely not: 16195 maybe: 74 false 
positives: 64 fp ratio: 0.003934
Nice! These numbers make sense to me, in terms of how many TREESAME 
queries we actually need to perform for such a query.

But I'm afraid it will take a while until I get around to turn it into
something presentable...
Do you have the code pushed somewhere public where one could take a 
look? I could provide some early feedback.


Thanks,
-Stolee


Re: We should add a "git gc --auto" after "git clone" due to commit graph

2018-10-08 Thread SZEDER Gábor
On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote:
> I'm still excited about the prospect of a bloom filter for paths which
> each commit touches. I think that's the next big frontier in getting
> things like "git log -- path" to a reasonable run-time.

There is certainly potential there.  With a (very) rough PoC
experiment, a 8MB bloom filter, and a carefully choosen path I can
achieve a nice, almost 25x speedup:

  $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh
  6

  real0m1.563s
  user0m1.519s
  sys 0m0.045s

  $ time GIT_USE_POC_BLOOM_FILTER=y ~/src/git/git rev-list --count HEAD -- 
t/valgrind/valgrind.sh
  6

  real0m0.063s
  user0m0.043s
  sys 0m0.020s

  bloom filter total queries: 16269 definitely not: 16195 maybe: 74 false 
positives: 64 fp ratio: 0.003934

But I'm afraid it will take a while until I get around to turn it into
something presentable...



Loan

2018-10-08 Thread Lawrence corporate loan




--
Are you a Business Owner or a Financial Consultant looking for a
Business loan or are you looking for a Personal loan. Our interest
rates are affordable, ranging from 3% to 4.5% interest rate per annual
depends on the loan type. If interested, please contact us today here
on email: lawrencecorpor...@gmail.com


Re: [PATCH v2 0/5] Fix the racy split index problem

2018-10-08 Thread SZEDER Gábor
On Mon, Oct 08, 2018 at 04:54:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> Thanks. I had ~400 runs of the tests I ran before and they were all
> >> OK. Now trying also with t1701 (which I hadn't noticed was a new
> >> test...).
> >
> > Ran that overnight with the same conditions as before. 2683 OK runs and
> > 0 failures (and counting). So it seems like the combination of the two
> > fixed the split index bugs.
> 
> I forgot I ad this running, and got up to 45482 OKs and 0 FAILs before
> finally Ctrl+C-ing it now :)

Yay! \o/

Thanks for testing, and I feel sorry for your poor machine...

I will send an updated version to address the (minor) points raised in
[1]...  well, most likely not today, but hopefully soon-ish.

1 - https://public-inbox.org/git/20180929091429.GF23446@localhost/



Re:Business proposition for you

2018-10-08 Thread Melvin Greg
Hello, 

Business proposition for you.

I have a client from Syrian who will like to invest with your 
company. My client is willing to invest $4 Million. Can I have 
your company website to show to my client your company so that 
they will check and decide if they will invest there funds with 
you as joint partner. 

This information is needed urgently.

Please reply. 

Best Regards,
Agent Melvin Greg
Tel:+1 4045966532


[PATCH 1/3] midx: fix broken free() in close_midx()

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When closing a multi-pack-index, we intend to close each pack-file
and free the struct packed_git that represents it. However, this
line was previously freeing the array of pointers, not the
pointer itself. This leads to a double-free issue.

Signed-off-by: Derrick Stolee 
---
 midx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index f3e8dbc108..999717b96f 100644
--- a/midx.c
+++ b/midx.c
@@ -190,7 +190,7 @@ static void close_midx(struct multi_pack_index *m)
for (i = 0; i < m->num_packs; i++) {
if (m->packs[i]) {
close_pack(m->packs[i]);
-   free(m->packs);
+   free(m->packs[i]);
}
}
FREE_AND_NULL(m->packs);
-- 
gitgitgadget



[PATCH 0/3] Add GIT_TEST_MULTI_PACK_INDEX environment variable

2018-10-08 Thread Derrick Stolee via GitGitGadget
To increase coverage of the multi-pack-index feature, add a
GIT_TEST_MULTI_PACK_INDEX environment variable similar to other GIT_TEST_*
variables.

After creating the environment variable and running the test suite with it
enabled, I found a few bugs in the multi-pack-index implementation. These
are handled by the first two patches.

I have set up a CI build on Azure Pipelines [1] that runs the test suite
with a few optional features enabled, including GIT_TEST_MULTI_PACK_INDEX
and GIT_TEST_COMMIT_GRAPH. I'll use this to watch the features and ensure
they work well with the rest of the ongoing work. Eventually, we can add
these variables to the Travis CI scripts.

[1] https://git.visualstudio.com/git/_build?definitionId=4

Derrick Stolee (3):
  midx: fix broken free() in close_midx()
  midx: close multi-pack-index on repack
  multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

 builtin/repack.c|  8 
 midx.c  | 17 +
 midx.h  |  4 
 t/README|  4 
 t/t5310-pack-bitmaps.sh |  1 +
 t/t5319-multi-pack-index.sh |  2 +-
 t/t9300-fast-import.sh  |  2 +-
 7 files changed, 32 insertions(+), 6 deletions(-)


base-commit: f84b9b09d40408cf91bbc500d9f190a7866c3e0f
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-27%2Fderrickstolee%2Fmidx-test%2Fupstream-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-27/derrickstolee/midx-test/upstream-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/27
-- 
gitgitgadget


[PATCH 3/3] multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The multi-pack-index feature is tested in isolation by
t5319-multi-pack-index.sh, but there are many more interesting
scenarios in the test suite surrounding pack-file data shapes
and interactions. Since the multi-pack-index is an optional
data structure, it does not make sense to include it by default
in those tests.

Instead, add a new GIT_TEST_MULTI_PACK_INDEX environment variable
that enables core.multiPackIndex and writes a multi-pack-index
after each 'git repack' command. This adds extra test coverage
when needed.

There are a few spots in the test suite that need to react to this
change:

* t5319-multi-pack-index.sh: there is a test that checks that
  'git repack' deletes the multi-pack-index. Disable the environment
  variable to ensure this still happens.

* t5310-pack-bitmaps.sh: One test moves a pack-file from the object
  directory to an alternate. This breaks the multi-pack-index, so
  delete the multi-pack-index at this point, if it exists.

* t9300-fast-import.sh: One test verifies the number of files in
  the .git/objects/pack directory is exactly 8. Exclude the
  multi-pack-index from this count so it is still 8 in all cases.

Signed-off-by: Derrick Stolee 
---
 builtin/repack.c| 4 
 midx.c  | 9 +++--
 midx.h  | 2 ++
 t/README| 4 
 t/t5310-pack-bitmaps.sh | 1 +
 t/t5319-multi-pack-index.sh | 2 +-
 t/t9300-fast-import.sh  | 2 +-
 7 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 7925bb976e..418442bfe2 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -558,6 +558,10 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
if (!no_update_server_info)
update_server_info(0);
remove_temporary_files();
+
+   if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
+   write_midx_file(get_object_directory());
+
string_list_clear(, 0);
string_list_clear(, 0);
string_list_clear(_packs, 0);
diff --git a/midx.c b/midx.c
index fe8532a9d1..aeafb58fa3 100644
--- a/midx.c
+++ b/midx.c
@@ -338,9 +338,14 @@ int prepare_multi_pack_index_one(struct repository *r, 
const char *object_dir, i
struct multi_pack_index *m;
struct multi_pack_index *m_search;
int config_value;
+   static int env_value = -1;
 
-   if (repo_config_get_bool(r, "core.multipackindex", _value) ||
-   !config_value)
+   if (env_value < 0)
+   env_value = git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0);
+
+   if (!env_value &&
+   (repo_config_get_bool(r, "core.multipackindex", _value) ||
+   !config_value))
return 0;
 
for (m_search = r->objects->multi_pack_index; m_search; m_search = 
m_search->next)
diff --git a/midx.h b/midx.h
index af6b5cb58f..bec8f73d28 100644
--- a/midx.h
+++ b/midx.h
@@ -3,6 +3,8 @@
 
 #include "repository.h"
 
+#define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+
 struct multi_pack_index {
struct multi_pack_index *next;
 
diff --git a/t/README b/t/README
index 3ea6c85460..9d0277c338 100644
--- a/t/README
+++ b/t/README
@@ -327,6 +327,10 @@ GIT_TEST_COMMIT_GRAPH=, when true, forces the 
commit-graph to
 be written after every 'git commit' command, and overrides the
 'core.commitGraph' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX=, when true, forces the multi-pack-
+index to be written after every 'git repack' command, and overrides the
+'core.multiPackIndex' setting to true.
+
 Naming Tests
 
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 1be3459c5b..82d7f7f6a5 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -191,6 +191,7 @@ test_expect_success 'pack-objects respects 
--honor-pack-keep (local bitmapped pa
 
 test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' 
'
mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
+   rm -f .git/objects/pack/multi-pack-index &&
test_when_finished "mv alt.git/objects/pack/$packbitmap.* 
.git/objects/pack/" &&
echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
git index-pack 3b.pack &&
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 6f56b38674..4024ff9a39 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -152,7 +152,7 @@ compare_results_with_midx "twelve packs"
 
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
-   git repack -adf &&
+   GIT_TEST_MULTI_PACK_INDEX=0 git repack -adf &&
test_path_is_missing $objdir/pack/multi-pack-index
 '
 
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 40fe7e4976..59a13b6a77 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1558,7 +1558,7 @@ test_expect_success 'O: blank lines 

[PATCH 2/3] midx: close multi-pack-index on repack

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When repacking, we may remove pack-files. This invalidates the
multi-pack-index (if it exists). Previously, we removed the
multi-pack-index file before removing any pack-file. In some cases,
the repack command may load the multi-pack-index into memory. This
may lead to later in-memory references to the non-existent pack-
files.

Signed-off-by: Derrick Stolee 
---
 builtin/repack.c | 4 
 midx.c   | 6 +-
 midx.h   | 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index c6a7943d5c..7925bb976e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -432,6 +432,10 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
if (!midx_cleared) {
/* if we move a packfile, it will invalidated 
the midx */
+   if (the_repository->objects) {
+   
close_midx(the_repository->objects->multi_pack_index);
+   
the_repository->objects->multi_pack_index = NULL;
+   }
clear_midx_file(get_object_directory());
midx_cleared = 1;
}
diff --git a/midx.c b/midx.c
index 999717b96f..fe8532a9d1 100644
--- a/midx.c
+++ b/midx.c
@@ -180,9 +180,13 @@ cleanup_fail:
return NULL;
 }
 
-static void close_midx(struct multi_pack_index *m)
+void close_midx(struct multi_pack_index *m)
 {
uint32_t i;
+
+   if (!m)
+   return;
+
munmap((unsigned char *)m->data, m->data_len);
close(m->fd);
m->fd = -1;
diff --git a/midx.h b/midx.h
index a210f1af2a..af6b5cb58f 100644
--- a/midx.h
+++ b/midx.h
@@ -44,4 +44,6 @@ int prepare_multi_pack_index_one(struct repository *r, const 
char *object_dir, i
 int write_midx_file(const char *object_dir);
 void clear_midx_file(const char *object_dir);
 
+void close_midx(struct multi_pack_index *m);
+
 #endif
-- 
gitgitgadget



Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Derrick Stolee

On 10/8/2018 10:58 AM, Ævar Arnfjörð Bjarmason wrote:

On Mon, Oct 08 2018, Derrick Stolee wrote:


On 10/8/2018 9:43 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Aug 28 2018, Derrick Stolee via GitGitGadget wrote:


From: Derrick Stolee 

The commit-graph feature is tested in isolation by
t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
more interesting scenarios involving commit walks. Many of these
scenarios are covered by the existing test suite, but we need to
maintain coverage when the optional commit-graph structure is not
present.

To allow running the full test suite with the commit-graph present,
add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
to load the commit-graph when parsing commits, and writes the
commit-graph file after every 'git commit' command.

There are a few tests that rely on commits not existing in
pack-files to trigger important events, so manually set
GIT_TEST_COMMIT_GRAPH to false for the necessary commands.

There is one test in t6024-recursive-merge.sh that relies on the
merge-base algorithm picking one of two ambiguous merge-bases, and
the commit-graph feature changes which merge-base is picked.


The test feature itself seems fine, but this consistently fails ever
since it got introduced (a reset --hard on the commit merged to msater
in git.git):

  GIT_TEST_COMMIT_GRAPH=true prove -j$(parallel --number-of-cores) 
t5500-fetch-pack.sh t6001-rev-list-graft.sh t6050-replace.sh
  Test Summary Report
  ---
  t6001-rev-list-graft.sh (Wstat: 256 Tests: 14 Failed: 6)
Failed tests:  3, 5, 7, 9, 11, 13
Non-zero exit status: 1
  t6050-replace.sh   (Wstat: 256 Tests: 35 Failed: 9)
Failed tests:  12-16, 24-25, 30, 35
Non-zero exit status: 1
  t5500-fetch-pack.sh(Wstat: 256 Tests: 357 Failed: 1)
Failed test:  351
Non-zero exit status: 1

This is on Linux/Debian 4.17.0-1-amd64. Can you reproduce this? If not I
can provide more info (-x output etc..).

I see these failures, too, but I believe they are due to
ds/commit-graph-with-grafts not being merged to 'next' yet. The
purpose of that branch is to fix these test breaks. The environment
variable got merged a lot faster.

I just built & tested the 'jch' branch at 515d82d9 with
GIT_TEST_COMMIT_GRAPH=1 and they all passed.

I should have tested "pu" first. These failures are indeed fixed
there. Thanks, and sorry about the noise.
Thanks for testing with the optional features! It's good to keep them 
exercised.


Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 08 2018, Derrick Stolee wrote:

> On 10/8/2018 9:43 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, Aug 28 2018, Derrick Stolee via GitGitGadget wrote:
>>
>>> From: Derrick Stolee 
>>>
>>> The commit-graph feature is tested in isolation by
>>> t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
>>> more interesting scenarios involving commit walks. Many of these
>>> scenarios are covered by the existing test suite, but we need to
>>> maintain coverage when the optional commit-graph structure is not
>>> present.
>>>
>>> To allow running the full test suite with the commit-graph present,
>>> add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
>>> to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
>>> to load the commit-graph when parsing commits, and writes the
>>> commit-graph file after every 'git commit' command.
>>>
>>> There are a few tests that rely on commits not existing in
>>> pack-files to trigger important events, so manually set
>>> GIT_TEST_COMMIT_GRAPH to false for the necessary commands.
>>>
>>> There is one test in t6024-recursive-merge.sh that relies on the
>>> merge-base algorithm picking one of two ambiguous merge-bases, and
>>> the commit-graph feature changes which merge-base is picked.
>>>
>> The test feature itself seems fine, but this consistently fails ever
>> since it got introduced (a reset --hard on the commit merged to msater
>> in git.git):
>>
>>  GIT_TEST_COMMIT_GRAPH=true prove -j$(parallel --number-of-cores) 
>> t5500-fetch-pack.sh t6001-rev-list-graft.sh t6050-replace.sh
>>  Test Summary Report
>>  ---
>>  t6001-rev-list-graft.sh (Wstat: 256 Tests: 14 Failed: 6)
>>Failed tests:  3, 5, 7, 9, 11, 13
>>Non-zero exit status: 1
>>  t6050-replace.sh   (Wstat: 256 Tests: 35 Failed: 9)
>>Failed tests:  12-16, 24-25, 30, 35
>>Non-zero exit status: 1
>>  t5500-fetch-pack.sh(Wstat: 256 Tests: 357 Failed: 1)
>>Failed test:  351
>>Non-zero exit status: 1
>>
>> This is on Linux/Debian 4.17.0-1-amd64. Can you reproduce this? If not I
>> can provide more info (-x output etc..).
> I see these failures, too, but I believe they are due to
> ds/commit-graph-with-grafts not being merged to 'next' yet. The
> purpose of that branch is to fix these test breaks. The environment
> variable got merged a lot faster.
>
> I just built & tested the 'jch' branch at 515d82d9 with
> GIT_TEST_COMMIT_GRAPH=1 and they all passed.

I should have tested "pu" first. These failures are indeed fixed
there. Thanks, and sorry about the noise.


Re: [PATCH v2 0/5] Fix the racy split index problem

2018-10-08 Thread Ævar Arnfjörð Bjarmason


On Fri, Sep 28 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Sep 27 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Thu, Sep 27 2018, SZEDER Gábor wrote:
>>
>>> On Thu, Sep 27, 2018 at 03:53:24PM +0200, Ævar Arnfjörð Bjarmason wrote:

 On Thu, Sep 27 2018, SZEDER Gábor wrote:

 > This is the second attempt to fix the racy split index problem, which
 > causes occasional failures in several random test scripts when run
 > with 'GIT_TEST_SPLIT_INDEX=yes'.  The important details are in patches
 > 1 and 5 (corresponding to v1's 3 and 5).

 Thanks. I'm running the same sorts of tests I noted in
 https://public-inbox.org/git/87va7ireuu@evledraar.gmail.com/ on
 this. The fix Jeff had that you noted in
 https://public-inbox.org/git/20180906151439.GA8016@localhost/ is now in
 "master".

 I take it your
 https://github.com/szeder/git/commits/racy-split-index-fix is the same
 as this submission?
>>>
>>> Yes.
>>>
 Anyway, I'm testing that cherry-picked on top of the
 latest master.

 Unfortunate that we couldn't get the isolated test you made in
 https://public-inbox.org/git/20180907034942.GA10370@localhost/
>>>
>>> Nah, that's not an isolated test case, that's only a somewhat
>>> narrowed-down, but rather reliable reproduction recipe while I still
>>> had no idea what was going on :)
>>>
>>> The _real_ isolated test is the last test in t1701, that's what it
>>> eventually boiled down to.
>>
>> Thanks. I had ~400 runs of the tests I ran before and they were all
>> OK. Now trying also with t1701 (which I hadn't noticed was a new
>> test...).
>
> Ran that overnight with the same conditions as before. 2683 OK runs and
> 0 failures (and counting). So it seems like the combination of the two
> fixed the split index bugs.

I forgot I ad this running, and got up to 45482 OKs and 0 FAILs before
finally Ctrl+C-ing it now :)

 but I
 don't see how it could be added without some very liberal
 getenv("GIT_TEST_blahblah"), so it's probably best to not add it,
 particularly with the C rewrite of git-stash in-flight.

 I'll report back when I have enough test data to say how these patches
 affect the intermittent test failures under GIT_TEST_SPLIT_INDEX=yes.


[PATCH v4 1/1] contrib: add coverage-diff script

2018-10-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report

This leaves the repo in a state where every X.c file that was covered has
an X.c.gcov file containing the coverage counts for every line, and "#"
at every uncovered line.

There have been a few bugs in recent patches what would have been caught
if the test suite covered those blocks (including a few of mine). I want
to work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks covering
very unlikely (or near-impossible) situations may not warrant coverage.

It is important to not measure the coverage of the codebase by what old code
is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by the
test suite. The output uses 'git blame -s' format so you can find the commits
responsible and view the line numbers for quick access to the context, but
trims leading tabs in the file contents to reduce output width.

Signed-off-by: Derrick Stolee 
---
 contrib/coverage-diff.sh | 108 +++
 1 file changed, 108 insertions(+)
 create mode 100755 contrib/coverage-diff.sh

diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
new file mode 100755
index 00..4ec419f900
--- /dev/null
+++ b/contrib/coverage-diff.sh
@@ -0,0 +1,108 @@
+#!/bin/sh
+
+# Usage: Run 'contrib/coverage-diff.sh  ' from source-root
+# after running
+#
+# make coverage-test
+# make coverage-report
+#
+# while checked out at . This script combines the *.gcov files
+# generated by the 'make' commands above with 'git diff  '
+# to report new lines that are not covered by the test suite.
+
+V1=$1
+V2=$2
+
+diff_lines () {
+   perl -e '
+   my $line_num;
+   while (<>) {
+   # Hunk header?  Grab the beginning in postimage.
+   if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
+   $line_num = $1;
+   next;
+   }
+
+   # Have we seen a hunk?  Ignore "diff --git" etc.
+   next unless defined $line_num;
+
+   # Deleted line? Ignore.
+   if (/^-/) {
+   next;
+   }
+
+   # Show only the line number of added lines.
+   if (/^\+/) {
+   print "$line_num\n";
+   }
+   # Either common context or added line appear in
+   # the postimage.  Count it.
+   $line_num++;
+   }
+   '
+}
+
+files=$(git diff --name-only "$V1" "$V2" -- \*.c)
+
+# create empty file
+>coverage-data.txt
+
+for file in $files
+do
+   git diff "$V1" "$V2" -- "$file" |
+   diff_lines |
+   sort >new_lines.txt
+
+   if ! test -s new_lines.txt
+   then
+   continue
+   fi
+
+   hash_file=$(echo $file | sed "s/\//\#/")
+
+   if ! test -s "$hash_file.gcov"
+   then
+   continue
+   fi
+
+   sed -ne '/#:/{
+   s/#://
+   s/:.*//
+   s/ //g
+   p
+   }' "$hash_file.gcov" |
+   sort >uncovered_lines.txt
+
+   comm -12 uncovered_lines.txt new_lines.txt |
+   sed -e 's/$/\)/' |
+   sed -e 's/^/ /' >uncovered_new_lines.txt
+
+   grep -q '[^[:space:]]' >coverage-data.txt &&
+   git blame -s "$V2" -- "$file" |
+   sed 's/\t//g' |
+   grep -f uncovered_new_lines.txt >>coverage-data.txt &&
+   echo >>coverage-data.txt
+
+   rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
+done
+
+cat coverage-data.txt
+
+echo "Commits introducing uncovered code:"
+
+commit_list=$(cat coverage-data.txt |
+   grep -E '^[0-9a-f]{7,} ' |
+   awk '{print $1;}' |
+   sort |
+   uniq)
+
+(
+   for commit in $commit_list
+   do
+   git log --no-decorate --pretty=format:'%an  %h: %s' -1 
$commit
+   echo
+   done
+) | sort
+
+rm coverage-data.txt
-- 
gitgitgadget


[PATCH v4 0/1] contrib: Add script to show uncovered "new" lines

2018-10-08 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report

This leaves the repo in a state where every X.c file that was covered has an
X.c.gcov file containing the coverage counts for every line, and "#" at
every uncovered line.

There have been a few bugs in recent patches what would have been caught if
the test suite covered those blocks (including a few of mine). I want to
work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important to not measure the coverage of the codebase by what old code
is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by
the test suite. For example, I ran this against the 'next' branch (e82ca0)
versus 'master' (f84b9b) and got the following output:

builtin/commit.c
76f2f5c1e3 builtin/commit.c 1657) 
write_commit_graph_reachable(get_object_directory(), 0, 0);

builtin/fsck.c
66ec0390e7 builtin/fsck.c 862) midx_argv[2] = "--object-dir";
66ec0390e7 builtin/fsck.c 863) midx_argv[3] = alt->path;
66ec0390e7 builtin/fsck.c 864) if (run_command(_verify))
66ec0390e7 builtin/fsck.c 865) errors_found |= ERROR_COMMIT_GRAPH;

fsck.c
fb8952077d  214) die_errno("Could not read '%s'", path);

midx.c
56ee7ff156  949) return 0;
cc6af73c02  990) midx_report(_("failed to load pack-index for packfile %s"),
cc6af73c02  991) e.p->pack_name);
cc6af73c02  992) break;

Commits introducing uncovered code:
Derrick Stolee  56ee7ff15: multi-pack-index: add 'verify' verb
Derrick Stolee  66ec0390e: fsck: verify multi-pack-index
Derrick Stolee  cc6af73c0: multi-pack-index: verify object offsets
Junio C Hamano  76f2f5c1e: Merge branch 'ab/commit-graph-progress' into next
René Scharfe  fb8952077: fsck: use strbuf_getline() to read skiplist file

Thanks, -Stolee

CHANGES IN V3: I took Junio's perl script verbatim, which speeds up the
performance greatly. Some of the other sed commands needed some massaging,
but also added extra cleanup. Thanks for the help!

CHANGES IN V4: I reduced the blame output using -s which decreases the
width. I include a summary of the commit authors at the end to help people
see the lines they wrote. This version is also copied into a build
definition in the public Git project on Azure Pipelines [1]. I'll use this
build definition to generate the coverage report after each "What's Cooking"
email.

[1] https://git.visualstudio.com/git/_build?definitionId=5

Derrick Stolee (1):
  contrib: add coverage-diff script

 contrib/coverage-diff.sh | 108 +++
 1 file changed, 108 insertions(+)
 create mode 100755 contrib/coverage-diff.sh


base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-40%2Fderrickstolee%2Fcoverage-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-40/derrickstolee/coverage-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/40

Range-diff vs v3:

 1:  21214cc321 ! 1:  6daf310a43 contrib: add coverage-diff script
 @@ -26,10 +26,10 @@
  contrib/coverage-diff.sh base topic
  
  to see the lines added between 'base' and 'topic' that are not 
covered by the
 -test suite. The output uses 'git blame -c' format so you can find the 
commits
 -responsible and view the line numbers for quick access to the context.
 +test suite. The output uses 'git blame -s' format so you can find the 
commits
 +responsible and view the line numbers for quick access to the 
context, but
 +trims leading tabs in the file contents to reduce output width.
  
 -Helped-by: Junio C Hamano 
  Signed-off-by: Derrick Stolee 
  
  diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
 @@ -81,13 +81,16 @@
  + '
  +}
  +
 -+files=$(git diff --name-only $V1 $V2 -- *.c)
 ++files=$(git diff --name-only "$V1" "$V2" -- \*.c)
 ++
 ++# create empty file
 ++>coverage-data.txt
  +
  +for file in $files
  +do
 -+ git diff $V1 $V2 -- $file \
 -+ | diff_lines \
 -+ | sort >new_lines.txt
 ++ git diff "$V1" "$V2" -- "$file" |
 ++ diff_lines |
 ++ sort >new_lines.txt
  +
  + if ! test -s new_lines.txt
  + then
 @@ -95,24 +98,50 @@
  + fi
  +
  + hash_file=$(echo $file | sed "s/\//\#/")
 ++
 ++ if ! test -s "$hash_file.gcov"
 ++ then
 ++ continue
 ++ fi
 ++
  + sed -ne '/#:/{
  + s/#://
  + s/:.*//
  

Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Derrick Stolee

On 10/8/2018 9:43 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Aug 28 2018, Derrick Stolee via GitGitGadget wrote:


From: Derrick Stolee 

The commit-graph feature is tested in isolation by
t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
more interesting scenarios involving commit walks. Many of these
scenarios are covered by the existing test suite, but we need to
maintain coverage when the optional commit-graph structure is not
present.

To allow running the full test suite with the commit-graph present,
add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
to load the commit-graph when parsing commits, and writes the
commit-graph file after every 'git commit' command.

There are a few tests that rely on commits not existing in
pack-files to trigger important events, so manually set
GIT_TEST_COMMIT_GRAPH to false for the necessary commands.

There is one test in t6024-recursive-merge.sh that relies on the
merge-base algorithm picking one of two ambiguous merge-bases, and
the commit-graph feature changes which merge-base is picked.


The test feature itself seems fine, but this consistently fails ever
since it got introduced (a reset --hard on the commit merged to msater
in git.git):

 GIT_TEST_COMMIT_GRAPH=true prove -j$(parallel --number-of-cores) 
t5500-fetch-pack.sh t6001-rev-list-graft.sh t6050-replace.sh
 Test Summary Report
 ---
 t6001-rev-list-graft.sh (Wstat: 256 Tests: 14 Failed: 6)
   Failed tests:  3, 5, 7, 9, 11, 13
   Non-zero exit status: 1
 t6050-replace.sh   (Wstat: 256 Tests: 35 Failed: 9)
   Failed tests:  12-16, 24-25, 30, 35
   Non-zero exit status: 1
 t5500-fetch-pack.sh(Wstat: 256 Tests: 357 Failed: 1)
   Failed test:  351
   Non-zero exit status: 1

This is on Linux/Debian 4.17.0-1-amd64. Can you reproduce this? If not I
can provide more info (-x output etc..).
I see these failures, too, but I believe they are due to 
ds/commit-graph-with-grafts not being merged to 'next' yet. The purpose 
of that branch is to fix these test breaks. The environment variable got 
merged a lot faster.


I just built & tested the 'jch' branch at 515d82d9 with 
GIT_TEST_COMMIT_GRAPH=1 and they all passed.


Thanks,
-Stolee


Re: t3404.6 breaks on master under GIT_FSMONITOR_TEST

2018-10-08 Thread Ævar Arnfjörð Bjarmason


On Thu, Sep 06 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Feb 01 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> The GIT_FSMONITOR_TEST variable allows you to roundtrip the fsmonitor
>> codpath in the whole test suite. On both Debian & CentOS this breaks for
>> me:
>>
>> (cd t && GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all 
>> ./t3404-rebase-interactive.sh -i)
>>
>> Whereas this works:
>>
>> (cd t && GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all 
>> GIT_SKIP_TESTS=t3404.6 ./t3404-rebase-interactive.sh -i)
>>
>> The entirety of the rest of the test suite still passes with
>> GIT_FSMONITOR_TEST.
>>
>> This has been failing ever since GIT_FSMONITOR_TEST was introduced in
>> 883e248b8a ("fsmonitor: teach git to optionally utilize a file system
>> monitor to speed up detecting new or changed files.", 2017-09-22). Under
>> -v -x -i:
>>
>> + echo test_must_fail: command succeeded: env 
>> FAKE_LINES=exec_echo_foo_>file1 1 git rebase -i HEAD^
>> test_must_fail: command succeeded: env FAKE_LINES=exec_echo_foo_>file1 1 
>> git rebase -i HEAD^
>> + return 1
>> error: last command exited with $?=1
>> not ok 6 - rebase -i with the exec command checks tree cleanness
>> #
>> #   git checkout master &&
>> #   set_fake_editor &&
>> #   test_must_fail env FAKE_LINES="exec_echo_foo_>file1 1" 
>> git rebase -i HEAD^ &&
>>
>> Maybe once this is fixed running the test suite under GIT_FSMONITOR_TEST
>> would be a useful Travis target, but I don't know the current status of
>> adding new options to Travis.
>
> *Poke* at this again. Ben, or anyone else with knowledge of fsmonitor:
> Can you reproduce this?
>
> This failure along with the one I noted in
> https://public-inbox.org/git/87tvn2remn@evledraar.gmail.com/ is
> failing the tests on Linux when run with GIT_FSMONITOR_TEST.
>
> I'm looking at this again because SZEDER's patches to the split index
> reminded me again that we have these long-standing failures in rare test
> modes (see
> https://public-inbox.org/git/87va7ireuu@evledraar.gmail.com/ for the
> split index discussion).

For what it's worth this is still broken, but more importantly (I'm not
just keeping bumping the same thing) the only thing that's now broken
under fsmonitor. I.e. my skip config is now GIT_SKIP_TESTS="t3404.7"
whereas before 43f1180814 ("git-mv: allow submodules and fsmonitor to
work together", 2018-09-10) I needed to add "t7411.3 t7411.4" to that.


RE:

2018-10-08 Thread Inderpreet Saini
unsubscribe git


Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-10-08 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 28 2018, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee 
>
> The commit-graph feature is tested in isolation by
> t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
> more interesting scenarios involving commit walks. Many of these
> scenarios are covered by the existing test suite, but we need to
> maintain coverage when the optional commit-graph structure is not
> present.
>
> To allow running the full test suite with the commit-graph present,
> add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
> to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
> to load the commit-graph when parsing commits, and writes the
> commit-graph file after every 'git commit' command.
>
> There are a few tests that rely on commits not existing in
> pack-files to trigger important events, so manually set
> GIT_TEST_COMMIT_GRAPH to false for the necessary commands.
>
> There is one test in t6024-recursive-merge.sh that relies on the
> merge-base algorithm picking one of two ambiguous merge-bases, and
> the commit-graph feature changes which merge-base is picked.
>

The test feature itself seems fine, but this consistently fails ever
since it got introduced (a reset --hard on the commit merged to msater
in git.git):

GIT_TEST_COMMIT_GRAPH=true prove -j$(parallel --number-of-cores) 
t5500-fetch-pack.sh t6001-rev-list-graft.sh t6050-replace.sh
Test Summary Report
---
t6001-rev-list-graft.sh (Wstat: 256 Tests: 14 Failed: 6)
  Failed tests:  3, 5, 7, 9, 11, 13
  Non-zero exit status: 1
t6050-replace.sh   (Wstat: 256 Tests: 35 Failed: 9)
  Failed tests:  12-16, 24-25, 30, 35
  Non-zero exit status: 1
t5500-fetch-pack.sh(Wstat: 256 Tests: 357 Failed: 1)
  Failed test:  351
  Non-zero exit status: 1

This is on Linux/Debian 4.17.0-1-amd64. Can you reproduce this? If not I
can provide more info (-x output etc..).


Re: [PATCH v4 4/4] rebase --skip: clean up commit message after a failed fixup/squash

2018-10-08 Thread Phillip Wood

Hi Johannes

On 02/10/2018 14:50, Johannes Schindelin wrote:

Hi Phillip,

[sorry, I just got to this mail now]


Don't worry, I'm impressed you remembered it, I'd completely forgotten 
about it.




On Sun, 6 May 2018, Phillip Wood wrote:


On 27/04/18 21:48, Johannes Schindelin wrote:


During a series of fixup/squash commands, the interactive rebase builds
up a commit message with comments. This will be presented to the user in
the editor if at least one of those commands was a `squash`.

In any case, the commit message will be cleaned up eventually, removing
all those intermediate comments, in the final step of such a
fixup/squash chain.

However, if the last fixup/squash command in such a chain fails with
merge conflicts, and if the user then decides to skip it (or resolve it
to a clean worktree and then continue the rebase), the current code
fails to clean up the commit message.

This commit fixes that behavior.

The fix is quite a bit more involved than meets the eye because it is
not only about the question whether we are `git rebase --skip`ing a
fixup or squash. It is also about removing the skipped fixup/squash's
commit message from the accumulated commit message. And it is also about
the question whether we should let the user edit the final commit
message or not ("Was there a squash in the chain *that was not
skipped*?").

For example, in this case we will want to fix the commit message, but
not open it in an editor:

pick<- succeeds
fixup   <- succeeds
squash  <- fails, will be skipped

This is where the newly-introduced `current-fixups` file comes in real
handy. A quick look and we can determine whether there was a non-skipped
squash. We only need to make sure to keep it up to date with respect to
skipped fixup/squash commands. As a bonus, we can even avoid committing
unnecessarily, e.g. when there was only one fixup, and it failed, and
was skipped.

To fix only the bug where the final commit message was not cleaned up
properly, but without fixing the rest, would have been more complicated
than fixing it all in one go, hence this commit lumps together more than
a single concern.

For the same reason, this commit also adds a bit more to the existing
test case for the regression we just fixed.

The diff is best viewed with --color-moved.

Signed-off-by: Johannes Schindelin 
---
  sequencer.c| 113 -
  t/t3418-rebase-continue.sh |  35 ++--
  2 files changed, 131 insertions(+), 17 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 56166b0d6c7..cec180714ef 100644
--- a/sequencer.c
+++ b/sequencer.c
[...]
@@ -2804,19 +2801,107 @@ static int commit_staged_changes(struct replay_opts 
*opts)
if (get_oid_hex(rev.buf, _amend))
return error(_("invalid contents: '%s'"),
rebase_path_amend());
-   if (oidcmp(, _amend))
+   if (!is_clean && oidcmp(, _amend))
return error(_("\nYou have uncommitted changes in your "
   "working tree. Please, commit them\n"
   "first and then run 'git rebase "
   "--continue' again."));
+   /*
+* When skipping a failed fixup/squash, we need to edit the
+* commit message, the current fixup list and count, and if it
+* was the last fixup/squash in the chain, we need to clean up
+* the commit message and if there was a squash, let the user
+* edit it.
+*/
+   if (is_clean && !oidcmp(, _amend) &&
+   opts->current_fixup_count > 0 &&
+   file_exists(rebase_path_stopped_sha())) {
+   const char *p = opts->current_fixups.buf;
+   int len = opts->current_fixups.len;
+
+   opts->current_fixup_count--;
+   if (!len)
+   BUG("Incorrect current_fixups:\n%s", p);
+   while (len && p[len - 1] != '\n')
+   len--;
+   strbuf_setlen(>current_fixups, len);
+   if (write_message(p, len, rebase_path_current_fixups(),
+ 0) < 0)
+   return error(_("could not write file: '%s'"),
+rebase_path_current_fixups());
+
+   /*
+* If a fixup/squash in a fixup/squash chain failed, the
+* commit message is already correct, no need to commit
+* it again.
+*
+* Only if it is the final command in the fixup/squash
+* chain, and only if the chain is longer than a single
+   

[no subject]

2018-10-08 Thread Netravnen
unsubscribe git


Translation to Portuguese

2018-10-08 Thread Thiago Saife
Hello Git Team.

I would like to help to continue the books' translation to Brazilian
Portuguese and I don't know how to proceed. Thanks in advance for your
help.

Regards,
-- 
Thiago Saife
(11) 97236-8742


Re: [PATCH v6 08/10] submodule: add a helper to check if it is safe to write to .gitmodules

2018-10-08 Thread Antonio Ospite
On Sun, 07 Oct 2018 08:44:20 +0900
Junio C Hamano  wrote:

> Stefan Beller  writes:
> 
> >>  static int module_config(int argc, const char **argv, const char *prefix)
> >>  {
> >> +   enum {
> >> +   CHECK_WRITEABLE = 1
> >> +   } command = 0;
> >
> > Can we have the default named? Then we would only use states
> > from within the enum?
> 
> Why?  Do we use a half-intelligent "switch () { case ...: ... }"
> checker that would otherwise complain if we handled "case 0" in such
> a switch statement, or something like that?
> 
> Are we going to gain a lot more enum members, by the way?  At this
> point, this looks more like a
> 
>   unsigned check_writable = 0; /* default is not to check */
> 
> to me.

Hi,

the CHECK_WRITEABLE operation is alternative to the get/set ones, not
an addition, so I can see the rationale behind Stefan's suggestion:
either have named enums members for all command "modes" or for none of
them; however other users of enum+OPT_CMDMODE seems to think like the
enum is for commands passed as *options* and the unnamed default is for
actions derived from *arguments*. I don't have a strong opinion on this
matter, tho, so just tell me what you prefer and I'll do it for v7.

Using an enum was to have a more explicit syntax in case other commands
were going to be added in the future (I imagine "--stage" or
"--list-all" as possible additions), and does not affect the generated
code, so I though it was worth it.

Anyways, these are really details, let's concentrate on patches 9 and
10 which deserve much more attention. :)

Thanks you,
   Antonio
-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?


Re: What's so special about objects/17/ ?

2018-10-08 Thread Ævar Arnfjörð Bjarmason


On Sun, Oct 07 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>> 1. We still have this check of objects/17/ in builtin/gc.c today. Why
>>objects/17/ and not e.g. objects/00/ to go with other 000* magic such
>>as the  SHA-1?d  Statistically
>>it doesn't matter, but 17 seems like an odd thing to pick at random
>>out of 00..ff, does it have any significance?
>
> There is no "other 000* magic such as ...". There is only one 0{40}
> magic and that one must be memorable and explainable.

Depending on how we're counting there's at least two. We also use
 as a placeholder for "couldn't
read a ref" in addition or "this is a placeholder for an invalid ref" in
addition to how it's used to signify creation/deletion to the in the
likes of the pre-receive hook:

$ echo hello > .git/refs/something
$ git fsck
[...]
error: refs/something: invalid sha1 pointer 

$ > .git/refs/something
$ git fsck
[...]
error: refs/something: invalid sha1 pointer 


This is because the refs backend will memzero the oid struct, and if we
fail to read things it'll still be zero'd out.

This manifests e.g. in this confusing fsck output, due to a bug where
GitLab will write empty refs/keep-around/* refs sometimes:
https://gitlab.com/gitlab-org/gitlab-ce/issues/44431

> The 1/256 sample can be any one among 256.  Just like the date
> string on the first line of the output to be used as the /etc/magic
> signature by format-patch, it was an arbitrary choice, rather than a
> random choice, and unlike 0{40} this does not have to be memorable
> by general public and I do not have to explain the choice to the
> general public ;-)

I wanted to elaborate on the explanation for "gc.auto" in
git-config. Now we just say "approximately 6700". Since this behavior
has been really stable for a long time we could say we sample 1/256 of
the .git/objects/?? dirs, and this explains any perceived discrepancies
between the 6700 number and $(find .git/objects/?? -type f | wc -l).

>> 2. It seems overly paranoid to be checking that the files in
>>   .git/objects/17/ look like a SHA-1.
>
> There is no other reason than futureproofing.  We were paying cost
> to open and scan the directory anyway, and checking that we only
> count the loose object files was (and still is) a sensible thing to
> do to allow us not even worry about the other kind of things we
> might end up creating there.

Makes sense. Just wanted to ask if it was that or some workaround for
historical files being there.