date:20180823

brian m. carlson wrote:
> On Thu, Aug 23, 2018 at 06:54:38PM -0700, Jonathan Nieder wrote:

>> For what it's worth, even if it all is in one commit with message
>> "wip", I think I'd benefit from being able to see this code.  I can
>> promise not to critique it, and to only treat it as a rough
>> premonition of the future.
>
> It's in my object-id-partn branch at https://github.com/bk2204/git.
>
> It doesn't do protocol v2 yet, but it does do protocol v1.  It is, of
> course, subject to change (especially naming) depending on what the list
> thinks is most appropriate.

 $ git diff --shortstat origin/master...bmc/object-id-partn
  185 files changed, 2263 insertions(+), 1535 deletions(-)

Beautiful.  Thanks much for this.

Sincerely,
Jonathan

Re: Questions about the hash function transition

2018-08-23 Thread brian m. carlson

On Thu, Aug 23, 2018 at 06:54:38PM -0700, Jonathan Nieder wrote:
> brian m. carlson wrote:
> > I realize I have a lot of code that has not been sent in yet, but I also
> > tend to build on my own series a lot, and I probably need to be a bit
> > better about extracting reusable pieces that can go in independently
> > without waiting for the previous series to land.
> 
> For what it's worth, even if it all is in one commit with message
> "wip", I think I'd benefit from being able to see this code.  I can
> promise not to critique it, and to only treat it as a rough
> premonition of the future.

It's in my object-id-partn branch at https://github.com/bk2204/git.

It doesn't do protocol v2 yet, but it does do protocol v1.  It is, of
course, subject to change (especially naming) depending on what the list
thinks is most appropriate.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [PATCH/RFC] commit: new option to abort -a something is already staged

On Thu, Aug 23, 2018 at 9:28 AM Junio C Hamano  wrote:
> I think the above example forgets "-a" on the final "git commit"
> step.  With it added, I can understand the concern (and I am sure
> you would, too).
>
> The user is trying to add everything done in the working tree, and
> "commit -a" would catch all changes to paths that were already
> tracked, but a separate "add" is necessary for newly created paths.
> But adding a new path means the index no longer matches HEAD, and
> the "commit -a" at the final step sweeps the changes to already
> tracked paths---failing that because there "already is something
> staged" will break the workflow.

Right. I think this would need to be able to understand the case of
"different only by new files".

Thanks,
Jake

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 07:48:42PM -0700, Jacob Keller wrote:

> Odd...
> 
> What about..
> 
> - if (oidcmp(a,b))
> + if(!oideq(a,b))
>   { ... }

Nope, it doesn't like that syntactically.

> Or maybe you need to use something like
> 
>   <...
> - if (oidcmp(a,b))
> + if (!oideq(a,b))
>   ...>

Nor that (I also tried finding documentation on what exactly the angle
brackets mean, but couldn't).

> Hmm. Yea, semantic patches are a bit confusing overall sometimes.
> 
> But it looks like you got something which works?

Actually, what I showed earlier does seem to have some weirdness with
else-if. But I finally stumbled on something even better:

  - oidcmp(a, b) != 0
  + !oideq(a, b)

Because of the isomorphisms that coccinelle knows about, that catches
everything we want.  Obvious ones like:

diff --git a/bisect.c b/bisect.c
index 41c56a665e..7c1d8f1a6d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(struct commit_list 
*list, int count)
 
for (i = 0; cur; cur = cur->next, i++) {
if (i == index) {
-   if (oidcmp(>item->object.oid, current_bad_oid))
+   if (!oideq(>item->object.oid, current_bad_oid))
return cur;
if (previous)
return previous;

and compound conditionals like:

diff --git a/blame.c b/blame.c
index 10d72e36dd..538d0ab1aa 100644
--- a/blame.c
+++ b/blame.c
@@ -1834,7 +1834,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,

sb->revs->children.name = "children";
while (c->parents &&
-  oidcmp(>object.oid, >final->object.oid)) {
+  !oideq(>object.oid, >final->object.oid)) {
struct commit_list *l = xcalloc(1, sizeof(*l));

l->item = c;

and even non-if contexts, like:

diff --git a/sha1-file.c b/sha1-file.c
index 631f6b9dc2..d85f4e93e1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -825,7 +825,7 @@ int check_object_signature(const struct object_id *oid, 
void *map,
 
if (map) {
hash_object_file(map, size, type, _oid);
-   return oidcmp(oid, _oid) ? -1 : 0;
+   return !oideq(oid, _oid) ? -1 : 0;
}

So I think we have a winner. I'll polish that up into patches and send
it out later tonight.

-Peff

Re: [PATCH/RFC] commit: new option to abort -a something is already staged

On Mon, Aug 20, 2018 at 8:43 AM Nguyễn Thái Ngọc Duy  wrote:
> Instead, let's handle just this problem for now. This new option
> commit.preciousDirtyIndex, if set to false, will not allow `commit -a`
> to continue if the final index is different from the existing one. I
> don't think this can be achieved with hooks because the hooks don't
> know about "-a" or different commit modes.
>

Here you say setting it to "false" enables this check but...

> +commit.preciousDirtyIndex::
> +   If some changes are partially staged in the index (i.e.
> +   "git commit -a" and "git commit" commit different changes), reject
> +   "git commit -a".
> +

Here, sounds like it is enabled by setting it to true.

Thanks,
Jake

Re: Questions about the hash function transition

Ævar Arnfjörð Bjarmason wrote:

>> Objective
>> -
>> Migrate Git from SHA-1 to a stronger hash function.
>
> Should way say "Migrate Git from SHA-1 to SHA-256" here instead?
>
> Maybe it's overly specific, i.e. really we're also describnig how /any/
> hash function transition might happen, but having just read this now
> from start to finish it takes us a really long time to mention (and at
> first, only offhand) that SHA-256 is the new hash.

I answered this question in my other reply, but my answer missed the
point.

I think it would be fine for this to say "Migrate Git from SHA-1 to a
stronger hash function (SHA-256)".  More importantly, I think the
Background section should say something about SHA-256 --- e.g. how about
replacing the sentence

  SHA-1 still possesses the other properties such as fast object
  lookup and safe error checking, but other hash functions are equally
  suitable that are believed to be cryptographically secure.

with something about SHA-256?

Rereading the background section, I see some other bits that could be
clarified, too.  It has a run-on sentence:

  Thus Git has in effect already migrated to a new hash that isn't
  SHA-1 and doesn't share its vulnerabilities, its new hash function
  just happens to produce exactly the same output for all known
  inputs, except two PDFs published by the SHAttered researchers, and
  the new implementation (written by those researchers) claims to
  detect future cryptanalytic collision attacks.

The "," after vulnerabilities should be a period, ending the sentence.
My understanding is that sha1collisiondetection's safe-hash is meant
to protect against known attacks and that the code is meant to be
adaptable for future attacks of the same kind (by updating the list of
disturbance vectors), but it doesn't claim to guard against future
novel cryptanalysis methods that haven't been published yet.

Thanks,
Jonathan

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 5:16 PM Jeff King  wrote:
>
> On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:
>
> > This almost works:
> >
> >   @@
> >   expression a, b;
> >   statement s;
> >   @@
> >   - if (oidcmp(a, b)) s
> >   + if (!oideq(a, b)) s
> >
> > [...]
>
> > So I really do want some way of saying "all of the block, no matter what
> > it is". Or of leaving it out as context.
>
> Aha. The magic invocation is:
>
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
>   s
>
> I would have expected that you could replace "s" with "...", but that
> does not seem to work.
>
> -Peff

Odd...

What about..

- if (oidcmp(a,b))
+ if(!oideq(a,b))
  { ... }

Or maybe you need to use something like

  <...
- if (oidcmp(a,b))
+ if (!oideq(a,b))
  ...>

Hmm. Yea, semantic patches are a bit confusing overall sometimes.

But it looks like you got something which works?

Thanks,
Jake

Re: Questions about the hash function transition

Hi,

brian m. carlson wrote:
> On Thu, Aug 23, 2018 at 04:02:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

>>> 1. Add SHA-256 support to Git protocol. This is valuable and the
>>>logical next step but it is out of scope for this initial design.
>>
>> This is a non-goal according to the docs, but now that we have protocol
>> v2 in git, perhaps we could start specifying or describing how this
>> protocol extension will work?
>
> I have code that does this.  The reason is that the first stage of the
[nice explanation snipped]
> I hope to be able to spend some time documenting this in a little bit.
> I have documentation for that code in my branch, but I haven't sent it
> in yet.

Yay!

> I realize I have a lot of code that has not been sent in yet, but I also
> tend to build on my own series a lot, and I probably need to be a bit
> better about extracting reusable pieces that can go in independently
> without waiting for the previous series to land.

For what it's worth, even if it all is in one commit with message
"wip", I think I'd benefit from being able to see this code.  I can
promise not to critique it, and to only treat it as a rough
premonition of the future.

[...]
> For SHA-1, I have 0x73686131, which is "sha1", big-endian, and for
> SHA-256, I have 0x73323536, which is "s256", big-endian.  The former is
> in the codebase already; the latter, in my hash-impl branch.

I mentioned in another reply that "sha2" sounds fine.  "s256" of
course also sounds fine to me.  Thanks to Ævar for asking so that we
have the reminder to pin it down in the doc.

Thanks,
Jonathan

Re: Questions about the hash function transition

Hi,

Ævar Arnfjörð Bjarmason wrote:

> I wanted to send another series to clarify things in
> hash-function-transition.txt, but for some of the issues I don't know
> the answer, and I had some questions after giving this another read.

Thanks for looking it over!  Let's go. :)

[...]
>> Objective
>> -
>> Migrate Git from SHA-1 to a stronger hash function.
>
> Should way say "Migrate Git from SHA-1 to SHA-256" here instead?
>
> Maybe it's overly specific, i.e. really we're also describnig how /any/
> hash function transition might happen, but having just read this now
> from start to finish it takes us a really long time to mention (and at
> first, only offhand) that SHA-256 is the new hash.

Well, the objective really is to migrate to a stronger hash function,
and that we chose SHA-256 is part of the details of how we chose to do
that.  So I think this would be a misleading change.

You can tell that I'm not just trying to justify after the fact
because the initial version of the design doc at [*] already uses this
wording, and that version assumed that the hash function was going to
be SHA-256.

[*] 
https://public-inbox.org/git/20170304011251.ga26...@aiede.mtv.corp.google.com/

[...]
>> Non-Goals
>> -
>> 1. Add SHA-256 support to Git protocol. This is valuable and the
>>logical next step but it is out of scope for this initial design.
>
> This is a non-goal according to the docs, but now that we have protocol
> v2 in git, perhaps we could start specifying or describing how this
> protocol extension will work?

Yes, that would be great!  But I suspect it's cleanest to do so in a
separate doc.  That would allow clarifying this part, by pointing to
the protocol doc.

[...]
>> 3. Intermixing objects using multiple hash functions in a single
>>repository.
>
> But isn't that the goal now per "Translation table" & writing both SHA-1
> and SHA-256 versions of objects?

No, we don't write both versions of objects.  The translation records
both names of an object.

[...]
>>   - For each object format:
>> - 4-byte format identifier (e.g., 'sha1' for SHA-1)
>
> So, given that we have 4-byte limit and have decided on SHA-256 are we
> just going to call this 'sha2'?

Good question.  'sha2' sounds fine to me.  If we want to do
SHA-512/256 later, say, we'd just have to come up with a name for that
at that point (and it doesn't have to be ASCII).

> That might be confusingly ambiguous

This is a binary format.  Are you really worried that people are going
to misinterpret the magic numbers it contains?

> since SHA2 is a standard with more than just SHA-256, maybe 's256', or
> maybe we should give this 8 bytes with trailing \0s so we can have
> "SHA-1\0\0\0" and "SHA-256\0"?

For what it's worth, if that's the alternative, I'd rather have four
random bytes.

[...]
>> The loose object index is protected against concurrent writes by a
>> lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
>> object:
>>
>> 1. Write the loose object to a temporary file, like today.
>> 2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
>> 3. Rename the loose object into place.
>> 4. Open loose-object-idx with O_APPEND and write the new object
>> 5. Unlink loose-object-idx.lock to release the lock.
>>
>> To remove entries (e.g. in "git pack-refs" or "git-prune"):
>>
>> 1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
>>lock.
>> 2. Write the new content to loose-object-idx.lock.
>> 3. Unlink any loose objects being removed.
>> 4. Rename to replace loose-object-idx, releasing the lock.
>
> Do we expect multiple concurrent writers to poll the lock if they can't
> aquire it right away? I.e. concurrent "git commit" would block? Has this
> overall approach been benchmarked somewhere?

Git doesn't support concurrent "git commit" today.

My feeling is that if loose object writing becomes a performance
problem, we should switch to writing packfiles instead (as "git
receive-pack" already does).  So when there's a choice between better
performance of writing loose objects and simplicity, I lean toward
simplicity (though that's not absolute, there are definitely tradeoffs
to be made).

Earlier discussion about this had sharded loose object indices for
each xy/ subdir.  It was more complicated, for not much gain.

[...]
> Maybe I've missed some subtlety where that won't work, I'm just
> concerned that something that's writing a lot of objects in parallel
> will be slowed down (e.g. the likes of BFG repo cleaner).

BFG repo cleaner is an application like fast-import that is a good fit
for writing packs, not loose objects.

[...]
>> Since all operations that make new objects (e.g., "git commit") add
>> the new objects to the corresponding index, this mapping is possible
>> for all objects in the object store.
>
> Are we going to need a midx version of these mapping files? How does
> midx fit into this picture? Perhaps it's too obscure to

Re: Questions about the hash function transition

2018-08-23 Thread brian m. carlson

On Thu, Aug 23, 2018 at 04:02:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > [...]
> > Goals
> > -
> > 1. The transition to SHA-256 can be done one local repository at a time.
> >a. Requiring no action by any other party.
> >b. A SHA-256 repository can communicate with SHA-1 Git servers
> >   (push/fetch).
> >c. Users can use SHA-1 and SHA-256 identifiers for objects
> >   interchangeably (see "Object names on the command line", below).
> >d. New signed objects make use of a stronger hash function than
> >   SHA-1 for their security guarantees.
> > 2. Allow a complete transition away from SHA-1.
> >a. Local metadata for SHA-1 compatibility can be removed from a
> >   repository if compatibility with SHA-1 is no longer needed.
> > 3. Maintainability throughout the process.
> >a. The object format is kept simple and consistent.
> >b. Creation of a generalized repository conversion tool.
> >
> > Non-Goals
> > -
> > 1. Add SHA-256 support to Git protocol. This is valuable and the
> >logical next step but it is out of scope for this initial design.
> 
> This is a non-goal according to the docs, but now that we have protocol
> v2 in git, perhaps we could start specifying or describing how this
> protocol extension will work?

I have code that does this.  The reason is that the first stage of the
transition code is to implement stage 4 of the transition: that is, a
full SHA-256 implementation without any SHA-1 support.  Implementing it
that way means that we don't have to deal with any of the SHA-1 to
SHA-256 mapping in the first stage of the code.

In order to clone an SHA-256 repo (which the testsuite is completely
broken without), you need to be able to have basic SHA-256 support in
the protocol.  I know this was a non-goal, but the alternative is a an
inability to run the testsuite using SHA-256 until all the code is
merged, which is unsuitable for development.  The transition plan also
anticipates stage 4 (full SHA-256) support before earlier stages, so
this will be required.

I hope to be able to spend some time documenting this in a little bit.
I have documentation for that code in my branch, but I haven't sent it
in yet.

I realize I have a lot of code that has not been sent in yet, but I also
tend to build on my own series a lot, and I probably need to be a bit
better about extracting reusable pieces that can go in independently
without waiting for the previous series to land.

> > [...]
> > 3. Intermixing objects using multiple hash functions in a single
> >repository.
> 
> But isn't that the goal now per "Translation table" & writing both SHA-1
> and SHA-256 versions of objects?

No, I think this statement is basically that you have to have the entire
repository use all one algorithm under the hood in the .git directory,
translation tables excluded.  I don't think that's controversial.

> > [...]
> > Pack index
> > ~~
> > Pack index (.idx) files use a new v3 format that supports multiple
> > hash functions. They have the following format (all integers are in
> > network byte order):
> >
> > - A header appears at the beginning and consists of the following:
> >   - The 4-byte pack index signature: '\377t0c'
> >   - 4-byte version number: 3
> >   - 4-byte length of the header section, including the signature and
> > version number
> >   - 4-byte number of objects contained in the pack
> >   - 4-byte number of object formats in this pack index: 2
> >   - For each object format:
> > - 4-byte format identifier (e.g., 'sha1' for SHA-1)
> 
> So, given that we have 4-byte limit and have decided on SHA-256 are we
> just going to call this 'sha2'? That might be confusingly ambiguous
> since SHA2 is a standard with more than just SHA-256, maybe 's256', or
> maybe we should give this 8 bytes with trailing \0s so we can have
> "SHA-1\0\0\0" and "SHA-256\0"?

This is the format_version field in struct git_hash_algo.

For SHA-1, I have 0x73686131, which is "sha1", big-endian, and for
SHA-256, I have 0x73323536, which is "s256", big-endian.  The former is
in the codebase already; the latter, in my hash-impl branch.

If people have objections, we can change this up until we merge the pack
index v3 code (which is not yet finished).  It needs to be unique, and
that's it.  We could specify 0x0001 and 0x0002 if we wanted,
although I feel the values I mentioned above are self-documenting, which
is desirable.

> > [...]
> > - The trailer consists of the following:
> >   - A copy of the 20-byte SHA-256 checksum at the end of the
> > corresponding packfile.
> >
> >   - 20-byte SHA-256 checksum of all of the above.
> 
> We need to update both of these to 32 byte, right? Or are we planning to
> truncate the checksums?
> 
> This seems like just a mistake when we did s/NewHash/SHA-256/g, but then
> again it was originally "20-byte NewHash checksum" ever since 752414ae43
> ("technical doc: add a design doc for hash function transition",
>

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:

> This almost works:
> 
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b)) s
>   + if (!oideq(a, b)) s
>
> [...]

> So I really do want some way of saying "all of the block, no matter what
> it is". Or of leaving it out as context.

Aha. The magic invocation is:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))
  s

I would have expected that you could replace "s" with "...", but that
does not seem to work.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 07:40:49PM -0400, Jeff King wrote:

> > You can look for explicitly "if (oidcmp(...))" though. I don't know if
> > you can catch *any* use which degrades to boolean outside of an if
> > statement, but I wouldn't expect there to be too many of those?
> 
> Yeah, that was my thought, too. And I've been trying this all afternoon
> without success. Why doesn't this work:
> 
>   @@
>   expression a, b;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
> 
> I get:
> 
>   Fatal error: exception Failure("minus: parse error: \n = File
>   \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
>   221\naround = '', whole content = \n")
> 
> If I do:
> 
>   - if (oidcmp(a, b)) { ... }
> 
> that seems to please the parser for the minus line. But I cannot include
> the "..." on the plus line. Clearly the "..." part should be context,
> but I can't seem to find the right syntax.

This almost works:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b)) s
  + if (!oideq(a, b)) s

It generates this, for example:

diff -u -p a/bisect.c b/bisect.c
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(str
 
for (i = 0; cur; cur = cur->next, i++) {
if (i == index) {
-   if (oidcmp(>item->object.oid, current_bad_oid))
+   if (!oideq(>item->object.oid, current_bad_oid))
return cur;
if (previous)
return previous;

which is what we want. But it also generates this:

diff -u -p a/bundle.c b/bundle.c
--- a/bundle.c
+++ b/bundle.c
@@ -369,25 +369,11 @@ static int write_bundle_refs(int bundle_
 * commit that is referenced by the tag, and not the tag
 * itself.
 */
-   if (oidcmp(, >item->oid)) {
-   /*
-* Is this the positive end of a range expressed
-* in terms of a tag (e.g. v2.0 from the range
-* "v1.0..v2.0")?
-*/
-   struct commit *one = 
lookup_commit_reference(the_repository,
-);
+   if (!oideq(, >item->oid)) {
+   struct commit 
*one=lookup_commit_reference(the_repository,
+  );
struct object *obj;
-
if (e->item == &(one->object)) {
-   /*
-* Need to include e->name as an
-* independent ref to the pack-objects
-* input, so that the tag is included
-* in the output; otherwise we would
-* end up triggering "empty bundle"
-* error.
-*/
obj = parse_object_or_die(, e->name);
obj->flags |= SHOWN;
add_pending_object(revs, obj, e->name);

So I really do want some way of saying "all of the block, no matter what
it is". Or of leaving it out as context.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 04:30:10PM -0700, Jacob Keller wrote:

> On Thu, Aug 23, 2018 at 9:30 AM Jeff King  wrote:
> > I think that audit isn't actually too bad (but definitely not something
> > we should do for v2.19). The cocci patch I showed earlier hits most of
> > them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> > sure if there's a way to ask coccinelle only for oidcmp()
> > used in a boolean context.
> >
> 
> You can look for explicitly "if (oidcmp(...))" though. I don't know if
> you can catch *any* use which degrades to boolean outside of an if
> statement, but I wouldn't expect there to be too many of those?

Yeah, that was my thought, too. And I've been trying this all afternoon
without success. Why doesn't this work:

  @@
  expression a, b;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))

I get:

  Fatal error: exception Failure("minus: parse error: \n = File
  \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
  221\naround = '', whole content = \n")

If I do:

  - if (oidcmp(a, b)) { ... }

that seems to please the parser for the minus line. But I cannot include
the "..." on the plus line. Clearly the "..." part should be context,
but I can't seem to find the right syntax.

FWIW, I do have patches adding hasheq() and converting all of the
!oidcmp() cases. I may resort to hand-investigating each of the negated
ones, but I really feel like I should be able to do better with
coccinelle.

-Peff

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 9:30 AM Jeff King  wrote:
> I think that audit isn't actually too bad (but definitely not something
> we should do for v2.19). The cocci patch I showed earlier hits most of
> them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> sure if there's a way to ask coccinelle only for oidcmp()
> used in a boolean context.
>

You can look for explicitly "if (oidcmp(...))" though. I don't know if
you can catch *any* use which degrades to boolean outside of an if
statement, but I wouldn't expect there to be too many of those?

Thanks,
Jake

Re: [PATCH v3] range-diff: update stale summary of --no-dual-color

Junio C Hamano wrote:
> Jonathan Nieder  writes:
>> Kyle Meyer wrote:

>>> Subject: [PATCH v3] range-diff: update stale summary of --no-dual-color
[...]
>> Reviewed-by: Jonathan Nieder 
>
> Sorry, too late.  I'll revert the merge of the previous round out of
> 'next' and requeue this one, but that will have to wait until the
> next integration cycle.

Thanks for the heads up.  Sounds like a fine plan.

Jonathan

Re: [PATCH v3] range-diff: update stale summary of --no-dual-color

Jonathan Nieder  writes:

> Kyle Meyer wrote:
>
>> 275267937b (range-diff: make dual-color the default mode, 2018-08-13)
>> replaced --dual-color with --no-dual-color but left the option's
>> summary untouched.  Rewrite the summary to describe --no-dual-color
>> rather than dual-color.
>>
>> Helped-by: Jonathan Nieder 
>
> Ha, I don't think I deserve much credit here, but I'll take it. ;-)
>
>> Helped-by: Johannes Schindelin 
>> Signed-off-by: Kyle Meyer 
>> ---
>>  builtin/range-diff.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Reviewed-by: Jonathan Nieder 

Sorry, too late.  I'll revert the merge of the previous round out of
'next' and requeue this one, but that will have to wait until the
next integration cycle.

Re: [PATCH v3] range-diff: update stale summary of --no-dual-color

Kyle Meyer wrote:

> 275267937b (range-diff: make dual-color the default mode, 2018-08-13)
> replaced --dual-color with --no-dual-color but left the option's
> summary untouched.  Rewrite the summary to describe --no-dual-color
> rather than dual-color.
>
> Helped-by: Jonathan Nieder 

Ha, I don't think I deserve much credit here, but I'll take it. ;-)

> Helped-by: Johannes Schindelin 
> Signed-off-by: Kyle Meyer 
> ---
>  builtin/range-diff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder

[PATCH v3] range-diff: update stale summary of --no-dual-color

2018-08-23 Thread Kyle Meyer

275267937b (range-diff: make dual-color the default mode, 2018-08-13)
replaced --dual-color with --no-dual-color but left the option's
summary untouched.  Rewrite the summary to describe --no-dual-color
rather than dual-color.

Helped-by: Jonathan Nieder 
Helped-by: Johannes Schindelin 
Signed-off-by: Kyle Meyer 
---
 builtin/range-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index f52d45d9d6..0aa9bed41f 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -25,7 +25,7 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
OPT_INTEGER(0, "creation-factor", _factor,
N_("Percentage by which creation is weighted")),
OPT_BOOL(0, "no-dual-color", _color,
-   N_("color both diff and diff-between-diffs")),
+   N_("use simple diff colors")),
OPT_END()
};
int i, j, res = 0;
-- 
2.18.0

Re: [PATCH] range-diff: update stale summary of --no-dual-color

Kyle Meyer  writes:

> Junio C Hamano  writes:
>
>> Johannes Schindelin  writes:
>
> [...]
>
 -  N_("color both diff and diff-between-diffs")),
 +  N_("restrict coloring to outer diff markers")),
>>>
>>> How about "use simple diff colors" instead?
>
> That's certainly better than the one above, and I also prefer it to
> "color only based on the diff-between-diffs" in v2.
>
>> I am wondering if it makes sense to remove the option altogether.
>> I've been trying to view the comparison of the same ranges in both
>> styles for the past few days, and I never found a reason to choose
>> "no dual color" option myself.
>
> But I like this suggestion even better.

That wasn't even a suggestion.  I just did not see the point of the
optional behaviour myself, and was soliciting concrete examples
where the --no-dual mode would help.

Re: [PATCH] range-diff: update stale summary of --no-dual-color

Hi Junio,

On Thu, 23 Aug 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Wed, 22 Aug 2018, Kyle Meyer wrote:
> >
> >> 275267937b (range-diff: make dual-color the default mode, 2018-08-13)
> >> replaced --dual-color with --no-dual-color but left the option's
> >> summary untouched.  Rewrite the summary to describe --no-dual-color
> >> rather than dual-color.
> >> 
> >> Signed-off-by: Kyle Meyer 
> >> ---
> >>  builtin/range-diff.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> >> index f52d45d9d6..7dc90a5ec3 100644
> >> --- a/builtin/range-diff.c
> >> +++ b/builtin/range-diff.c
> >> @@ -25,7 +25,7 @@ int cmd_range_diff(int argc, const char **argv, const 
> >> char *prefix)
> >>OPT_INTEGER(0, "creation-factor", _factor,
> >>N_("Percentage by which creation is weighted")),
> >>OPT_BOOL(0, "no-dual-color", _color,
> >> -  N_("color both diff and diff-between-diffs")),
> >> +  N_("restrict coloring to outer diff markers")),
> >
> > How about "use simple diff colors" instead?
> 
> I am wondering if it makes sense to remove the option altogether.
> I've been trying to view the comparison of the same ranges in both
> styles for the past few days, and I never found a reason to choose
> "no dual color" option myself.

We do have a track record of making decisions based on our little bubble,
don't we.

On IRC, there is at least on publicly viewable comment by a user who
preferred the simple color diff, at least in one use case:

http://colabti.org/irclogger/irclogger_log/git-devel?date=2018-07-13#l97

And I am living in my own bubble, too. I think I heard feedback regarding
range-diff from some dozen people. Multiplying 6% by the download numbers
of Git for Windows alone... that's a lot of people who can put
--no-dual-color to good use at least in *some* situations.

In short: I am hesitant to remove a feature that would help some users.

Ciao,
Dscho

Re: [PATCH] i18n: fix mistakes in translated strings

Jean-Noël Avila  writes:

> - die(_("run_command returned non-zero status while"
> + die(_("run_command returned non-zero status while "
>   "recursing in the nested submodules of %s\n."),

Obviously good.

> diff --git a/config.c b/config.c
> index 9a0b10d4bc..3461993f0a 100644
> --- a/config.c
> +++ b/config.c
> @@ -124,7 +124,7 @@ static const char include_depth_advice[] = N_(
>  "%s\n"
>  "from\n"
>  "%s\n"
> -"Do you have circular includes?");
> +"This might be due to circular includes.");

OK.

> diff --git a/sequencer.c b/sequencer.c
> index 65d371c746..84bf598c3e 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -720,7 +720,7 @@ static const char *read_author_ident(struct strbuf *buf)
>   /* dequote values and construct ident line in-place */
>   for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
>   if (!skip_prefix(in, keys[i], (const char **))) {
> - warning(_("could not parse '%s' (looking for '%s'"),
> + warning(_("could not parse '%s' (looking for '%s')"),

Good.  Thanks.

Re: [PATCH] range-diff: update stale summary of --no-dual-color

2018-08-23 Thread Kyle Meyer

Junio C Hamano  writes:

> Johannes Schindelin  writes:

[...]

>>> -   N_("color both diff and diff-between-diffs")),
>>> +   N_("restrict coloring to outer diff markers")),
>>
>> How about "use simple diff colors" instead?

That's certainly better than the one above, and I also prefer it to
"color only based on the diff-between-diffs" in v2.

> I am wondering if it makes sense to remove the option altogether.
> I've been trying to view the comparison of the same ranges in both
> styles for the past few days, and I never found a reason to choose
> "no dual color" option myself.

But I like this suggestion even better.

Re: [GSoC][PATCH v6 15/20] rebase -i: rewrite write_basic_state() in C

Hi Phillip,

On Fri, 17 Aug 2018, Phillip Wood wrote:

> On 10/08/2018 17:51, Alban Gruin wrote:
> 
> > +{
> > +   const char *quiet = getenv("GIT_QUIET");
> > +
> > +   if (head_name)
> > +   write_file(rebase_path_head_name(), "%s\n", head_name);
> 
> write_file() can call die() which isn't encouraged for code in libgit.
> I'm not sure how much it matters in this case. Rewriting all these as
> 
>   if (head_name && write_message(onto, strlen(onto), rebase_path_onto(), 
> 1))
>   return -1;
> 
> is a bit tedious. An alternative would be it leave it for now and in the
> longer term move this function (and the ones above which I've just
> noticed also call write_file()) to in builtin/rebase.c (assuming that
> builtin/rebase--interactive.c and builtin/rebase.c get merged once
> they're finalized - I'm not sure if there is a plan for that or not.)

This came up in the review, and Alban said exactly what you did.

I then even dragged Peff into the discussion, as it was his idea to change
`write_file()` from returning an `int` to returning a `void` (instead of
libifying the function so that it would not `die()` in error cases and
`return 0` otherwise):

https://github.com/git/git/pull/518#discussion_r200606997

Christian Couder (one of Alban's mentors) then even jumped in and *agreed*
that libifying code "could be seen as unnecessary code churn and
rejected."

In light of these two respected community members suggesting to Alban to
go and not give a flying fish about proper error handling, I have to admit
that I am sympathetic to Alban simply using `write_file()` as-is.

I do agree with you, of course, that the over-use of `die()` in our code
base is a pretty bad thing.

But that's neither Alban's fault, nor should he be punished for the advice
he has been given.

In short: I agree with you that `write_file()` should be libified
properly, and I would suggest not to burden Alban with this (Alban, of
course you should feel free to work on this if this is something you care
about, too).

Ciao,
Dscho

Re: [PATCH] range-diff: update stale summary of --no-dual-color

Johannes Schindelin  writes:

> On Wed, 22 Aug 2018, Kyle Meyer wrote:
>
>> 275267937b (range-diff: make dual-color the default mode, 2018-08-13)
>> replaced --dual-color with --no-dual-color but left the option's
>> summary untouched.  Rewrite the summary to describe --no-dual-color
>> rather than dual-color.
>> 
>> Signed-off-by: Kyle Meyer 
>> ---
>>  builtin/range-diff.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
>> index f52d45d9d6..7dc90a5ec3 100644
>> --- a/builtin/range-diff.c
>> +++ b/builtin/range-diff.c
>> @@ -25,7 +25,7 @@ int cmd_range_diff(int argc, const char **argv, const char 
>> *prefix)
>>  OPT_INTEGER(0, "creation-factor", _factor,
>>  N_("Percentage by which creation is weighted")),
>>  OPT_BOOL(0, "no-dual-color", _color,
>> -N_("color both diff and diff-between-diffs")),
>> +N_("restrict coloring to outer diff markers")),
>
> How about "use simple diff colors" instead?

I am wondering if it makes sense to remove the option altogether.
I've been trying to view the comparison of the same ranges in both
styles for the past few days, and I never found a reason to choose
"no dual color" option myself.

Re: [PATCH] range-diff: update stale summary of --no-dual-color

Hi,

Johannes Schindelin wrote:
> On Wed, 22 Aug 2018, Jonathan Nieder wrote:

>>  OPT_INTEGER(0, "creation-factor", _factor,
>>  N_("Percentage by which creation is weighted")),
>> -OPT_BOOL(0, "no-dual-color", _color,
>> -N_("color both diff and diff-between-diffs")),
>> +OPT_BOOL(0, "dual-color", _color,
>> +N_("color both diff and diff-between-diffs 
>> (default)")),
>
> There is one very good reason *not* to do that. And that reason is the
> output of `git range-diff -h`. If anybody read that the option
> `--dual-color` exists, they are prone to believe that the default is *not*
> dual color. In contrast, when reading `--no-dual-color`, it is clear that
> dual color mode is the default.

The whole patch is about "git range-diff -h" output, and of course I
tested it.  Did you see the "(default)" part of the string in the
patch?

That said, the conversation continued and I agree with the conclusion
it led to (which is better than the patch you're replying to).

Thanks,
Jonathan

Re: [PATCH v2] range-diff: update stale summary of --no-dual-color

Kyle Meyer wrote:

>   OPT_BOOL(0, "no-dual-color", _color,
> - N_("color both diff and diff-between-diffs")),
> + N_("color only based on the diff-between-diffs")),

Reviewed-by: Jonathan Nieder 

Dscho's suggestion "use simple diff colors" also sounds fine to me
(probably even better).

Thanks,
Jonathan

Re: [GSoC][PATCH v6 18/20] rebase--interactive2: rewrite the submodes of interactive rebase in C

Hi Junio,

On Wed, 22 Aug 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > I made this same mistake over and over again, myself. For some reason,
> > John Keeping decided to use the singular form "revision" in 1e0dacdbdb75
> > (rebase: omit patch-identical commits with --fork-point, 2014-07-16), not
> > the plural.
> 
> Perhaps we should give a synonym to the option?  Renaming it to the
> plural form may keep the existing usage working as it would be the
> unique abbreviation, which may be a way to reduce the mistake.

The obvious plan is to switch from spawning a separate process after
parsing the `git rebase` options just to execute the interactive rebase to
performing both parts in the same process.

In other words: this option will simply go away.

I'd much rather spend time and effort on designing a nice API for calling
the rebase backends in-process than on adding code that adds some plural
form (which might not even be appropriate, given that you really can only
pass one single negative revision to restrict the commit range).

Ciao,
Dscho

> 
> >
> > So you will need to squash this in:
> >
> > -- snipsnap --
> > diff --git a/git-legacy-rebase.sh b/git-legacy-rebase.sh
> > index fb0395af5b1..7600765f541 100755
> > --- a/git-legacy-rebase.sh
> > +++ b/git-legacy-rebase.sh
> > @@ -145,8 +145,8 @@ run_interactive () {
> > test -n "$autosquash" && autosquash="--autosquash"
> > test -n "$verbose" && verbose="--verbose"
> > test -n "$force_rebase" && force_rebase="--no-ff"
> > -   test -n "$restrict_revisions" && \
> > -   restrict_revisions="--restrict-revisions=^$restrict_revisions"
> > +   test -n "$restrict_revision" && \
> > +   restrict_revision="--restrict-revision=^$restrict_revision"
> > test -n "$upstream" && upstream="--upstream=$upstream"
> > test -n "$onto" && onto="--onto=$onto"
> > test -n "$squash_onto" && squash_onto="--squash-onto=$squash_onto"
>

[PATCH] i18n: fix mistakes in translated strings

2018-08-23 Thread Jean-Noël Avila

Fix typos and convert a question which does not expect to be replied
to a simple advice.

Signed-off-by: Jean-Noël Avila 
---
 builtin/submodule--helper.c | 2 +-
 config.c| 2 +-
 sequencer.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 2bcc70fdfe..b56028ba9d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -542,7 +542,7 @@ static void runcommand_in_submodule_cb(const struct 
cache_entry *list_item,
argv_array_pushv(, info->argv);
 
if (run_command())
-   die(_("run_command returned non-zero status while"
+   die(_("run_command returned non-zero status while "
"recursing in the nested submodules of %s\n."),
displaypath);
}
diff --git a/config.c b/config.c
index 9a0b10d4bc..3461993f0a 100644
--- a/config.c
+++ b/config.c
@@ -124,7 +124,7 @@ static const char include_depth_advice[] = N_(
 "  %s\n"
 "from\n"
 "  %s\n"
-"Do you have circular includes?");
+"This might be due to circular includes.");
 static int handle_path_include(const char *path, struct config_include_data 
*inc)
 {
int ret = 0;
diff --git a/sequencer.c b/sequencer.c
index 65d371c746..84bf598c3e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -720,7 +720,7 @@ static const char *read_author_ident(struct strbuf *buf)
/* dequote values and construct ident line in-place */
for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
if (!skip_prefix(in, keys[i], (const char **))) {
-   warning(_("could not parse '%s' (looking for '%s'"),
+   warning(_("could not parse '%s' (looking for '%s')"),
rebase_path_author_script(), keys[i]);
return NULL;
}
-- 
2.18.0

Re: [PATCH v2 1/1] t2024: mark a `checkout -p` test as requiring Perl

Hi Ævar,

On Thu, 23 Aug 2018, Ævar Arnfjörð Bjarmason wrote:

> 
> On Thu, Aug 23 2018, Johannes Schindelin via GitGitGadget wrote:
> 
> > From: Johannes Schindelin 
> >
> > A recently-added test case tries to verify that the output of `checkout
> > -p` contains a certain piece of advice.
> >
> > But if Git was built without Perl and therefore lacks support for `git
> > add -i`, the error output contains the hint that `-p` is not even
> > available instead.
> >
> > Let's just skip that test case altogether if Git was built with NO_PERL.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  t/t2024-checkout-dwim.sh | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh
> > index 26dc3f1fc0..29e1e25300 100755
> > --- a/t/t2024-checkout-dwim.sh
> > +++ b/t/t2024-checkout-dwim.sh
> > @@ -76,7 +76,8 @@ test_expect_success 'checkout of branch from multiple 
> > remotes fails #1' '
> > test_branch master
> >  '
> >
> > -test_expect_success 'checkout of branch from multiple remotes fails with 
> > advice' '
> > +test_expect_success NO_PERL \
> > +   'checkout of branch from multiple remotes fails with advice' '
> > git checkout -B master &&
> > test_might_fail git branch -D foo &&
> > test_must_fail git checkout foo 2>stderr &&
> 
> This issue is already fixed in master as 3338e9950e ("t2024: mark test
> using "checkout -p" with PERL prerequisite", 2018-08-18).

Excellent,
Dscho

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Derrick Stolee


On 8/23/2018 2:53 PM, Jeff King wrote:

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:


I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).

My numbers may be more dramatic because my Linux environment is a virtual
machine.

If you have a chance, can you run p0001 on my patch (compared to
2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
check that it really is fixing the problem you saw.


Sure. Note: I had to create a new Linux VM on a different machine 
between Tuesday and today, so the absolute numbers are different.


Using git/git:

Test  v2.18.0   v2.19.0-rc0 HEAD
-
0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%


Using torvalds/linux:

Test v2.18.0 v2.19.0-rc0   HEAD
--
0001.2:  56.08(45.91+1.50)   56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%


Now here is where I get on my soapbox (and create a TODO for myself 
later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively 
suggests that the results should be _more_ accurate than the default of 
3. However, I then remember that we only report the *minimum* time from 
all the runs, which is likely to select an outlier from the 
distribution. To test this, I ran a few tests manually and found the 
variation between runs to be larger than 3%.


When I choose my own metrics for performance tests, I like to run at 
least 10 runs, remove the largest AND smallest runs from the samples, 
and then take the average. I did this manually for 'git rev-list --all 
--objects' on git/git and got the following results:


v2.18.0    v2.19.0-rc0   HEAD

3.126 s    3.308 s   3.170 s

For full disclosure, here is a full table including all samples:

|  | v2.18.0 | v2.19.0-rc0 | HEAD    |
|--|-|-|-|
|  | 4.58    | 3.302   | 3.239   |
|  | 3.13    | 3.337   | 3.133   |
|  | 3.213   | 3.291   | 3.159   |
|  | 3.219   | 3.318   | 3.131   |
|  | 3.077   | 3.302   | 3.163   |
|  | 3.074   | 3.328   | 3.119   |
|  | 3.022   | 3.277   | 3.125   |
|  | 3.083   | 3.259   | 3.203   |
|  | 3.057   | 3.311   | 3.223   |
|  | 3.155   | 3.413   | 3.225   |
| Max  | 4.58    | 3.413   | 3.239   |
| Min  | 3.022   | 3.259   | 3.119   |
| Avg* | 3.126   | 3.30825 | 3.17025 |

(Note that the largest one was the first run, on v2.18.0, which is due 
to a cold disk.)


I just kicked off a script that will run this test on the Linux repo 
while I drive home. I'll be able to report a similar table of data easily.


My TODO is to consider aggregating the data this way (or with a median) 
instead of reporting the minimum.


Thanks,

-Stolee

Re: [PATCH v3 3/5] tests: use shorter here-docs in chainlint.sed for AIX sed

2018-08-23 Thread Eric Sunshine

On Thu, Aug 23, 2018 at 4:36 PM Ævar Arnfjörð Bjarmason
 wrote:
> As noted in [1] there's still a remaining recently introduced
> portability issue also introduced in 878f988350 ("t/test-lib: teach
> --chain-lint to detect broken &&-chains in subshells", 2018-07-11), so
> under AIX the tests must be run with GIT_TEST_CHAIN_LINT=0.
>
> I don't know how to solve the other issue, and this gets us some of
> the way to GIT_TEST_CHAIN_LINT=1 working again on AIX.

Does unindenting the comment, as I suggested in [1], fix the remaining
problem for you?

[1]: 
https://public-inbox.org/git/CAPig+cTTbU5HFMKgNyrxTp3+kcK46-Fn=4zh6zdt1oqchac...@mail.gmail.com/

Re: [PATCH] range-diff: update stale summary of --no-dual-color

Hi Kyle,

On Wed, 22 Aug 2018, Kyle Meyer wrote:

> 275267937b (range-diff: make dual-color the default mode, 2018-08-13)
> replaced --dual-color with --no-dual-color but left the option's
> summary untouched.  Rewrite the summary to describe --no-dual-color
> rather than dual-color.
> 
> Signed-off-by: Kyle Meyer 
> ---
>  builtin/range-diff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> index f52d45d9d6..7dc90a5ec3 100644
> --- a/builtin/range-diff.c
> +++ b/builtin/range-diff.c
> @@ -25,7 +25,7 @@ int cmd_range_diff(int argc, const char **argv, const char 
> *prefix)
>   OPT_INTEGER(0, "creation-factor", _factor,
>   N_("Percentage by which creation is weighted")),
>   OPT_BOOL(0, "no-dual-color", _color,
> - N_("color both diff and diff-between-diffs")),
> + N_("restrict coloring to outer diff markers")),

How about "use simple diff colors" instead?

Ciao,
Johannes

Re: [PATCH] t/lib-rebase.sh: support explicit 'pick' commands in 'fake_editor.sh'

Hi Gábor,

On Thu, 23 Aug 2018, SZEDER Gábor wrote:

> The verbose output of the test 'reword without issues functions as
> intended' in 't3423-rebase-reword.sh', added in a9279c6785 (sequencer:
> do not squash 'reword' commits when we hit conflicts, 2018-06-19),
> contains the following error output:
> 
>   sed: -e expression #1, char 2: extra characters after command
> 
> This error comes from within the 'fake-editor.sh' script created by
> 'lib-rebase.sh's set_fake_editor() function, and the root cause is the
> FAKE_LINES="pick 1 reword 2" variable in the test in question, in
> particular the "pick" word.  'fake-editor.sh' assumes 'pick' to be the
> default rebase command and doesn't support an explicit 'pick' command
> in FAKE_LINES.  As a result, 'pick' will be used instead of a line
> number when assembling the following 'sed' script:
> 
>   sed -n picks/^pick/pick/p
> 
> which triggers the aforementioned error.
> 
> Luckily, this didn't affect the test's correctness: the erroring 'sed'
> command doesn't write anything to the todo script, and processing the
> rest of FAKE_LINES generates the desired todo script, as if that
> 'pick' command were not there at all.
> 
> The minimal fix would be to remove the 'pick' word from FAKE_LINES,
> but that would leave us susceptible to similar issues in the future.
> 
> Instead, teach the fake-editor script to recognize an explicit 'pick'
> command, which is still a fairly trivial change.
> 
> In the future we might want to consider reinforcing this fake editor
> script with an &&-chain and stricter parsing of the FAKE_LINES
> variable (e.g. to error out when encountering unknown rebase commands
> or commands and line numbers in the wrong order).
> 
> Signed-off-by: SZEDER Gábor 

ACK!

Thank you very much,
Dscho

> ---
>  t/lib-rebase.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
> index 25a77ee5cb..592865d019 100644
> --- a/t/lib-rebase.sh
> +++ b/t/lib-rebase.sh
> @@ -47,7 +47,7 @@ set_fake_editor () {
>   action=pick
>   for line in $FAKE_LINES; do
>   case $line in
> - squash|fixup|edit|reword|drop)
> + pick|squash|fixup|edit|reword|drop)
>   action="$line";;
>   exec*)
>   echo "$line" | sed 's/_/ /g' >> "$1";;
> -- 
> 2.19.0.rc0.136.gd2dd172e64
> 
>

Re: [PATCH v3 5/5] tests: fix and add lint for non-portable grep --file

Ævar Arnfjörð Bjarmason   writes:

> The --file option to grep isn't in POSIX[1], but -f is[1]. Let's check
> for that in the future, and fix the portability regression in
> f237c8b6fe ("commit-graph: implement git-commit-graph write",
> 2018-04-02) that broke e.g. AIX.
>
> 1. http://pubs.opengroup.org/onlinepubs/009695399/utilities/grep.html
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---

Thanks.

>  t/check-non-portable-shell.pl | 1 +
>  t/t5318-commit-graph.sh   | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
> index 75f38298d7..b45bdac688 100755
> --- a/t/check-non-portable-shell.pl
> +++ b/t/check-non-portable-shell.pl
> @@ -43,6 +43,7 @@ sub err {
>   /\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
> test_line_count)';
>   /\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
> BYTES out)';
>   /(?:\$\(seq|^\s*seq\b)/ and err 'seq is not portable (use test_seq)';
> + /\bgrep\b.*--file\b/ and err 'grep --file FILE is not portable (use 
> grep -f FILE)';
>   /\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable 
> (use FOO=bar && export FOO)';
>   /^\s*([A-Z0-9_]+=(\w+|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
>   err '"FOO=bar shell_func" assignment extends beyond 
> "shell_func"';
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 3c1ffad491..0c500f7ca2 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -134,7 +134,7 @@ test_expect_success 'Add one more commit' '
>   git branch commits/8 &&
>   ls $objdir/pack | grep idx >existing-idx &&
>   git repack &&
> - ls $objdir/pack| grep idx | grep -v --file=existing-idx >new-idx
> + ls $objdir/pack| grep idx | grep -v -f existing-idx >new-idx
>  '
>  
>  # Current graph structure:

Re: [PATCH v3 3/5] tests: use shorter here-docs in chainlint.sed for AIX sed

Ævar Arnfjörð Bjarmason   writes:

> Improve the portability of chainlint by using shorter here-docs. On
> AIX sed will complain about:
>
> sed: 0602-417 The label :hereslurp is greater than eight
> characters
>
> As noted in [1] there's still a remaining recently introduced
> portability issue also introduced in 878f988350 ("t/test-lib: teach
> --chain-lint to detect broken &&-chains in subshells", 2018-07-11), so
> under AIX the tests must be run with GIT_TEST_CHAIN_LINT=0.
>
> I don't know how to solve the other issue, and this gets us some of
> the way to GIT_TEST_CHAIN_LINT=1 working again on AIX.
>
> 1. https://public-inbox.org/git/871sapezba@evledraar.gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---

I'll globally do s/here-doc/label/ while queueing.

POSIX says "The implementation shall support label arguments
recognized as unique up to at least 8 bytes", so replacing these
labels to shorter strings makes perfect sense.

Will queue; thanks.

>  t/chainlint.sed | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/t/chainlint.sed b/t/chainlint.sed
> index 8544df38df..2333705b27 100644
> --- a/t/chainlint.sed
> +++ b/t/chainlint.sed
> @@ -97,11 +97,11 @@
>  /<<[ ]*[-\\']*[A-Za-z0-9_]/ {
>   s/^\(.*\)<<[]*[-\\']*\([A-Za-z0-9_][A-Za-z0-9_]*\)'*/<\2>\1<   s/[ ]*< - :hereslurp
> + :hered
>   N
>   /^<\([^>]*\)>.*\n[  ]*\1[   ]*$/!{
>   s/\n.*$//
> - bhereslurp
> + bhered
>   }
>   s/^<[^>]*>//
>   s/\n.*$//
> @@ -283,11 +283,11 @@ bfolded
>  :heredoc
>  s/^\(.*\)<<[ ]*[-\\']*\([A-Za-z0-9_][A-Za-z0-9_]*\)'*/<\2>\1<  s/[  ]*< -:hereslurpsub
> +:heredsub
>  N
>  /^<\([^>]*\)>.*\n[   ]*\1[   ]*$/!{
>   s/\n.*$//
> - bhereslurpsub
> + bheredsub
>  }
>  s/^<[^>]*>//
>  s/\n.*$//

[PATCH v3 1/5] tests: fix and add lint for non-portable head -c N

The "head -c BYTES" option is non-portable (not in POSIX[1]). Change
such invocations to use the test_copy_bytes wrapper added in
48860819e8 ("t9300: factor out portable "head -c" replacement",
2016-06-30).

This fixes a test added in 9d2e330b17 ("ewah_read_mmap: bounds-check
mmap reads", 2018-06-14), which has been breaking
t5310-pack-bitmaps.sh on OpenBSD since 2.18.0. The OpenBSD ports
already have a similar workaround after their upgrade to 2.18.0[2].

I have not tested this on IRIX, but according to 4de0bbd898 ("t9300:
use perl "head -c" clone in place of "dd bs=1 count=16000" kluge",
2010-12-13) this invocation would have broken things there too.

Also, change a valgrind-specific codepath in test-lib.sh to use this
wrapper. Given where valgrind runs I don't think this would ever
become a portability issue in practice, but it's easier to just use
the wrapper than introduce some exception for the "make test-lint"
check being added here.

1. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
2. 
https://github.com/openbsd/ports/commit/08d5d82eaefe5cf2f125ecc0c6a57df9cf91350c#diff-f7d3c4fabeed1691620d608f1534f5e5

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/check-non-portable-shell.pl | 1 +
 t/t5310-pack-bitmaps.sh   | 2 +-
 t/test-lib.sh | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index d5823f71d8..c8f10d40a1 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -41,6 +41,7 @@ sub err {
/^\s*[^#]\s*which\s/ and err 'which is not portable (use type)';
/\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
/\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
test_line_count)';
+   /\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
BYTES out)';
/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable 
(use FOO=bar && export FOO)';
/^\s*([A-Z0-9_]+=(\w+|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
err '"FOO=bar shell_func" assignment extends beyond 
"shell_func"';
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 557bd0d0c0..7bff7923f2 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -335,7 +335,7 @@ test_expect_success 'truncated bitmap fails gracefully' '
git rev-list --use-bitmap-index --count --all >expect &&
bitmap=$(ls .git/objects/pack/*.bitmap) &&
test_when_finished "rm -f $bitmap" &&
-   head -c 512 <$bitmap >$bitmap.tmp &&
+   test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
mv -f $bitmap.tmp $bitmap &&
git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
test_cmp expect actual &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 8bb0f4348e..44288cbb59 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -867,7 +867,7 @@ then
# handle only executables, unless they are shell libraries that
# need to be in the exec-path.
test -x "$1" ||
-   test "# " = "$(head -c 2 <"$1")" ||
+   test "# " = "$(test_copy_bytes 2 <"$1")" ||
return;
 
base=$(basename "$1")
@@ -882,7 +882,7 @@ then
# do not override scripts
if test -x "$symlink_target" &&
test ! -d "$symlink_target" &&
-   test "#!" != "$(head -c 2 < "$symlink_target")"
+   test "#!" != "$(test_copy_bytes 2 <"$symlink_target")"
then
symlink_target=../valgrind.sh
fi
-- 
2.18.0.865.gffc8e1a3cd6

[PATCH v3 3/5] tests: use shorter here-docs in chainlint.sed for AIX sed

Improve the portability of chainlint by using shorter here-docs. On
AIX sed will complain about:

sed: 0602-417 The label :hereslurp is greater than eight
characters

As noted in [1] there's still a remaining recently introduced
portability issue also introduced in 878f988350 ("t/test-lib: teach
--chain-lint to detect broken &&-chains in subshells", 2018-07-11), so
under AIX the tests must be run with GIT_TEST_CHAIN_LINT=0.

I don't know how to solve the other issue, and this gets us some of
the way to GIT_TEST_CHAIN_LINT=1 working again on AIX.

1. https://public-inbox.org/git/871sapezba@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/chainlint.sed | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/chainlint.sed b/t/chainlint.sed
index 8544df38df..2333705b27 100644
--- a/t/chainlint.sed
+++ b/t/chainlint.sed
@@ -97,11 +97,11 @@
 /<<[   ]*[-\\']*[A-Za-z0-9_]/ {
s/^\(.*\)<<[]*[-\\']*\([A-Za-z0-9_][A-Za-z0-9_]*\)'*/<\2>\1<]*\)>.*\n[  ]*\1[   ]*$/!{
s/\n.*$//
-   bhereslurp
+   bhered
}
s/^<[^>]*>//
s/\n.*$//
@@ -283,11 +283,11 @@ bfolded
 :heredoc
 s/^\(.*\)<<[   ]*[-\\']*\([A-Za-z0-9_][A-Za-z0-9_]*\)'*/<\2>\1<]*\)>.*\n[ ]*\1[   ]*$/!{
s/\n.*$//
-   bhereslurpsub
+   bheredsub
 }
 s/^<[^>]*>//
 s/\n.*$//
-- 
2.18.0.865.gffc8e1a3cd6

[PATCH v3 4/5] tests: fix version-specific portability issue in Perl JSON

The test guarded by PERLJSON added in 75459410ed ("json_writer: new
routines to create JSON data", 2018-07-13) assumed that a JSON boolean
value like "true" or "false" would be represented as "1" or "0" in
Perl.

This behavior can't be relied upon, e.g. with JSON.pm 2.50 and
JSON::PP A JSON::PP::Boolean object will be represented as "true" or
"false". To work around this let's check if we have any refs left
after we check for hashes and arrays, assume those are JSON objects,
and coerce them to a known boolean value.

The behavior of this test still looks odd to me. Why implement our own
ad-hoc encoder just for some one-off test, as opposed to say Perl's
own Data::Dumper with Sortkeys et al? But with this change it works,
so let's leave it be.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0019/parse_json.perl | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t0019/parse_json.perl b/t/t0019/parse_json.perl
index ca4e5bfa78..fea87fb81b 100644
--- a/t/t0019/parse_json.perl
+++ b/t/t0019/parse_json.perl
@@ -34,6 +34,9 @@ sub dump_item {
 } elsif (ref($value) eq 'HASH') {
print "$label_in hash\n";
dump_hash($label_in, $value);
+} elsif (ref $value) {
+   my $bool = $value ? 1 : 0;
+   print "$label_in $bool\n";
 } elsif (defined $value) {
print "$label_in $value\n";
 } else {
-- 
2.18.0.865.gffc8e1a3cd6

[PATCH v3 2/5] tests: fix and add lint for non-portable seq

The seq command is not in POSIX, and doesn't exist on
e.g. OpenBSD. We've had the test_seq wrapper since d17cf5f3a3 ("tests:
Introduce test_seq", 2012-08-04), but use of it keeps coming back,
e.g. in the recently added "fetch negotiator" tests being added here.

So let's also add a check to "make test-lint". The regex is aiming to
capture the likes of $(seq ..) and "seq" as a stand-alone command,
without capturing some existing cases where we e.g. have files called
"seq", as \bseq\b would do.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/check-non-portable-shell.pl|  1 +
 t/t5552-skipping-fetch-negotiator.sh | 12 ++--
 t/t5703-upload-pack-ref-in-want.sh   |  4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index c8f10d40a1..75f38298d7 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -42,6 +42,7 @@ sub err {
/\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
/\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
test_line_count)';
/\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
BYTES out)';
+   /(?:\$\(seq|^\s*seq\b)/ and err 'seq is not portable (use test_seq)';
/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable 
(use FOO=bar && export FOO)';
/^\s*([A-Z0-9_]+=(\w+|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
err '"FOO=bar shell_func" assignment extends beyond 
"shell_func"';
diff --git a/t/t5552-skipping-fetch-negotiator.sh 
b/t/t5552-skipping-fetch-negotiator.sh
index 5ad5bece55..30857b84a8 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -46,7 +46,7 @@ test_expect_success 'commits with no parents are sent 
regardless of skip distanc
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 7)
+   for i in $(test_seq 7)
do
test_commit -C client c$i
done &&
@@ -89,7 +89,7 @@ test_expect_success 'when two skips collide, favor the larger 
one' '
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 11)
+   for i in $(test_seq 11)
do
test_commit -C client c$i
done &&
@@ -168,14 +168,14 @@ test_expect_success 'do not send "have" with ancestors of 
commits that server AC
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 8)
+   for i in $(test_seq 8)
do
git -C client checkout --orphan b$i &&
test_commit -C client b$i.c0
done &&
-   for j in $(seq 19)
+   for j in $(test_seq 19)
do
-   for i in $(seq 8)
+   for i in $(test_seq 8)
do
git -C client checkout b$i &&
test_commit -C client b$i.c$j
@@ -205,7 +205,7 @@ test_expect_success 'do not send "have" with ancestors of 
commits that server AC
 
# fetch-pack should thus not send any more commits in the b1 branch, but
# should still send the others (in this test, just check b2).
-   for i in $(seq 0 8)
+   for i in $(test_seq 0 8)
do
have_not_sent b1.c$i
done &&
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
index a73c55a47e..d1ccc22331 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -176,7 +176,7 @@ test_expect_success 'setup repos for 
change-while-negotiating test' '
git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo; 
"$LOCAL_PRISTINE" &&
cd "$LOCAL_PRISTINE" &&
git checkout -b side &&
-   for i in $(seq 1 33); do test_commit s$i; done &&
+   for i in $(test_seq 1 33); do test_commit s$i; done &&
 
# Add novel commits to upstream
git checkout master &&
@@ -289,7 +289,7 @@ test_expect_success 'setup repos for fetching with 
ref-in-want tests' '
git clone "file://$REPO" "$LOCAL_PRISTINE" &&
cd "$LOCAL_PRISTINE" &&
git checkout -b side &&
-   for i in $(seq 1 33); do test_commit s$i; done &&
+   for i in $(test_seq 1 33); do test_commit s$i; done &&
 
# Add novel commits to upstream
git checkout master &&
-- 
2.18.0.865.gffc8e1a3cd6

[PATCH v3 0/5] OpenBSD & AIX etc. portability fixes

This grew a bit more. I'm going to stop poking at this for now. The
tests are still broken on OpenBSD (3-5 broken) and on AIX something
like 20-30 are broken, but this makes it slightly better.

Ævar Arnfjörð Bjarmason (5):
  tests: fix and add lint for non-portable head -c N
  tests: fix and add lint for non-portable seq
  tests: use shorter here-docs in chainlint.sed for AIX sed
  tests: fix version-specific portability issue in Perl JSON
  tests: fix and add lint for non-portable grep --file

 t/chainlint.sed  |  8 
 t/check-non-portable-shell.pl|  3 +++
 t/t0019/parse_json.perl  |  3 +++
 t/t5310-pack-bitmaps.sh  |  2 +-
 t/t5318-commit-graph.sh  |  2 +-
 t/t5552-skipping-fetch-negotiator.sh | 12 ++--
 t/t5703-upload-pack-ref-in-want.sh   |  4 ++--
 t/test-lib.sh|  4 ++--
 8 files changed, 22 insertions(+), 16 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6

[PATCH v3 5/5] tests: fix and add lint for non-portable grep --file

The --file option to grep isn't in POSIX[1], but -f is[1]. Let's check
for that in the future, and fix the portability regression in
f237c8b6fe ("commit-graph: implement git-commit-graph write",
2018-04-02) that broke e.g. AIX.

1. http://pubs.opengroup.org/onlinepubs/009695399/utilities/grep.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/check-non-portable-shell.pl | 1 +
 t/t5318-commit-graph.sh   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index 75f38298d7..b45bdac688 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -43,6 +43,7 @@ sub err {
/\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
test_line_count)';
/\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
BYTES out)';
/(?:\$\(seq|^\s*seq\b)/ and err 'seq is not portable (use test_seq)';
+   /\bgrep\b.*--file\b/ and err 'grep --file FILE is not portable (use 
grep -f FILE)';
/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable 
(use FOO=bar && export FOO)';
/^\s*([A-Z0-9_]+=(\w+|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
err '"FOO=bar shell_func" assignment extends beyond 
"shell_func"';
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 3c1ffad491..0c500f7ca2 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -134,7 +134,7 @@ test_expect_success 'Add one more commit' '
git branch commits/8 &&
ls $objdir/pack | grep idx >existing-idx &&
git repack &&
-   ls $objdir/pack| grep idx | grep -v --file=existing-idx >new-idx
+   ls $objdir/pack| grep idx | grep -v -f existing-idx >new-idx
 '
 
 # Current graph structure:
-- 
2.18.0.865.gffc8e1a3cd6

Re: [PATCH v1] read-cache: speed up index load through parallelization

2018-08-23 Thread Ben Peart





On 8/23/2018 2:06 PM, Junio C Hamano wrote:

Ben Peart  writes:


This patch helps address the CPU cost of loading the index by creating
multiple threads to divide the work of loading and converting the cache
entries across all available CPU cores.


Nice.


+int git_config_get_fast_index(void)
+{
+   int val;
+
+   if (!git_config_get_maybe_bool("core.fastindex", ))
+   return val;
+
+   if (getenv("GIT_FASTINDEX_TEST"))
+   return 1;


It probably makes sense to use git_env_bool() to be consistent,
which allows GIT_FASTINDEX_TEST=0 to turn it off after this becomes
the default.


diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..0fa7e1a04c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -24,6 +24,10 @@
  #include "utf8.h"
  #include "fsmonitor.h"
  
+#ifndef min

+#define min(a,b) (((a) < (b)) ? (a) : (b))
+#endif


Let's lose this, which is used only once, even though it could be
used elsewhere but not used (e.g. threads vs cpus near the beginning
of load_cache_entries()).



I didn't have it, then added it to make it trivial to see what was 
actually happening.  I can switch back.



+static unsigned long load_cache_entry_block(struct index_state *istate, struct 
mem_pool *ce_mem_pool, int offset, int nr, void *mmap, unsigned long 
start_offset, struct strbuf *previous_name)


Wrap and possibly add comment before the function to describe what
it does and what its parameters mean?


+{
+   int i;
+   unsigned long src_offset = start_offset;
+
+   for (i = offset; i < offset + nr; i++) {
+   struct ondisk_cache_entry *disk_ce;
+   struct cache_entry *ce;
+   unsigned long consumed;
+
+   disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
src_offset);
+   ce = create_from_disk(ce_mem_pool, disk_ce, , 
previous_name);
+   set_index_entry(istate, i, ce);
+
+   src_offset += consumed;
+   }
+   return src_offset - start_offset;
+}


OK.


+static unsigned long load_all_cache_entries(struct index_state *istate, void 
*mmap, size_t mmap_size, unsigned long src_offset)
+{


(following aloud) This "all" variant is "one thread does all", iow,
unthreaded version.  Makes sense.


+   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+   unsigned long consumed;
+
+   if (istate->version == 4) {
+   previous_name = _name_buf;
+   mem_pool_init(>ce_mem_pool,
+ 
estimate_cache_size_from_compressed(istate->cache_nr));
+   } else {
+   previous_name = NULL;
+   mem_pool_init(>ce_mem_pool,
+ estimate_cache_size(mmap_size, istate->cache_nr));
+   }


I count there are three instances of "if version 4 use the strbuf
for name-buf, otherwise..." in this patch, which made me wonder if
we can make them shared more and/or if it makes sense to attempt to
do so.



Actually, they are all different and all required.  One sets it up for 
the "do it all on one thread" path.  One sets it up for each thread. The 
last one is used by the primary thread when scanning for blocks to hand 
off to the child threads.



+   consumed = load_cache_entry_block(istate, istate->ce_mem_pool, 0, 
istate->cache_nr, mmap, src_offset, previous_name);
+   strbuf_release(_name_buf);
+   return consumed;
+}
+
+#ifdef NO_PTHREADS
+
+#define load_cache_entries load_all_cache_entries
+
+#else
+
+#include "thread-utils.h"
+
+/*
+* Mostly randomly chosen maximum thread counts: we
+* cap the parallelism to online_cpus() threads, and we want
+* to have at least 7500 cache entries per thread for it to
+* be worth starting a thread.
+*/
+#define THREAD_COST(7500)
+
+struct load_cache_entries_thread_data
+{
+   pthread_t pthread;
+   struct index_state *istate;
+   struct mem_pool *ce_mem_pool;
+   int offset, nr;
+   void *mmap;
+   unsigned long start_offset;
+   struct strbuf previous_name_buf;
+   struct strbuf *previous_name;
+   unsigned long consumed; /* return # of bytes in index file processed */
+};
+
+/*
+* A thread proc to run the load_cache_entries() computation
+* across multiple background threads.
+*/
+static void *load_cache_entries_thread(void *_data)
+{
+   struct load_cache_entries_thread_data *p = _data;
+
+   p->consumed += load_cache_entry_block(p->istate, p->ce_mem_pool, p->offset, 
p->nr, p->mmap, p->start_offset, p->previous_name);
+   return NULL;
+}


(following aloud) And the threaded version chews the block of ce's
given to each thread.  Makes sense.


+static unsigned long load_cache_entries(struct index_state *istate, void 
*mmap, size_t mmap_size, unsigned long src_offset)
+{
+   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+   struct load_cache_entries_thread_data *data;
+   int threads, cpus, thread_nr;
+   unsigned long consumed;
+

Re: [PATCH v1] read-cache: speed up index load through parallelization

2018-08-23 Thread Ben Peart





On 8/23/2018 1:31 PM, Stefan Beller wrote:

On Thu, Aug 23, 2018 at 8:45 AM Ben Peart  wrote:


This patch helps address the CPU cost of loading the index by creating
multiple threads to divide the work of loading and converting the cache
entries across all available CPU cores.

It accomplishes this by having the primary thread loop across the index file
tracking the offset and (for V4 indexes) expanding the name. It creates a
thread to process each block of entries as it comes to them. Once the
threads are complete and the cache entries are loaded, the rest of the
extensions can be loaded and processed normally on the primary thread.

Performance impact:

read cache .git/index times on a synthetic repo with:

100,000 entries
FALSE   TRUESavings %Savings
0.014798767 0.009580433 0.005218333 35.26%

1,000,000 entries
FALSE   TRUESavings %Savings
0.240896533 0.1751243   0.065772233 27.30%

read cache .git/index times on an actual repo with:

~3M entries
FALSE   TRUESavings %Savings
0.59898098  0.4513169   0.14766408  24.65%

Signed-off-by: Ben Peart 
---

Notes:
 Base Ref: master
 Web-Diff: https://github.com/benpeart/git/commit/67a700419b
 Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v1 
&& git checkout 67a700419b

  Documentation/config.txt |   8 ++
  config.c |  13 +++
  config.h |   1 +
  read-cache.c | 218 ++-
  4 files changed, 216 insertions(+), 24 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..3344685cc4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -899,6 +899,14 @@ relatively high IO latencies.  When enabled, Git will do 
the
  index comparison to the filesystem data in parallel, allowing
  overlapping IO's.  Defaults to true.

+core.fastIndex::
+   Enable parallel index loading
++
+This can speed up operations like 'git diff' and 'git status' especially
+when the index is very large.  When enabled, Git will do the index
+loading from the on disk format to the in-memory format in parallel.
+Defaults to true.


"fast" is a non-descriptive word as we try to be fast in any operation?
Maybe core.parallelIndexReading as that just describes what it
turns on/off, without second guessing its effects?
(Are there still computers with just a single CPU, where this would not
make it faster? ;-))



How about core.parallelReadIndex?  Slightly shorter and matches the 
function names better.





+int git_config_get_fast_index(void)
+{
+   int val;
+
+   if (!git_config_get_maybe_bool("core.fastindex", ))
+   return val;
+
+   if (getenv("GIT_FASTINDEX_TEST"))
+   return 1;


We look at this env value just before calling this function,
can be write it to only look at the evn variable once?



Sure, I didn't like the fact that it was called twice but didn't get 
around to cleaning it up.



+++ b/config.h
@@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
  extern int git_config_get_split_index(void);
  extern int git_config_get_max_percent_split_change(void);
  extern int git_config_get_fsmonitor(void);
+extern int git_config_get_fast_index(void);


Oh. nd/no-extern did not cover config.h




+#ifndef min
+#define min(a,b) (((a) < (b)) ? (a) : (b))
+#endif


We do not have a minimum function in the tree,
except for xdiff/xmacros.h:29: XDL_MIN.
I wonder what the rationale is for not having a MIN()
definition, I think we discussed that on the list a couple
times but the rationale escaped me.

If we introduce a min/max macro, can we put it somewhere
more prominent? (I would find it useful elsewhere)



I'll avoid that particular rabbit hole and just remove the min macro 
definition.  ;-)



+/*
+* Mostly randomly chosen maximum thread counts: we
+* cap the parallelism to online_cpus() threads, and we want
+* to have at least 7500 cache entries per thread for it to
+* be worth starting a thread.
+*/
+#define THREAD_COST(7500)


This reads very similar to preload-index.c THREAD_COST


+   /* loop through index entries starting a thread for every thread_nr 
entries */
+   consumed = thread = 0;
+   for (i = 0; ; i++) {
+   struct ondisk_cache_entry *ondisk;
+   const char *name;
+   unsigned int flags;
+
+   /* we've reached the begining of a block of cache entries, kick 
off a thread to process them */


beginning



Thanks


Thanks,
Stefan

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> > I think you can safely
> > ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
> > without some mitigation, I don't know that it's all that big a deal,
> > given the numbers I generated (which for some reason are less dramatic
> > than Stolee's).
> My numbers may be more dramatic because my Linux environment is a virtual
> machine.

If you have a chance, can you run p0001 on my patch (compared to
2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
check that it really is fixing the problem you saw.

-Peff

Re: [PATCH 0/9] trailer-parsing false positives

Jeff King  writes:

> So this turned into a bit of a rabbit hole. Here's what I have so far
> (which I think is not quite ready, but I wanted to start discussing).
>
> Issues (2) and (3) are actually the same issue. The caller that does the
> bogus appending for (2) is always append_signoff(). But it always has
> just a commit message (modulo some complications, which I'll get to in a
> minute), and so fixing it with respect to (3) magically solves (2).
> I.e., the code was simply not prepared for the case of "end of string is
> not end of trailers". But since that case cannot exist, we do not have
> to deal with it. :)
>
> Now here's the tricky part. I think patches 1-8 are mostly sensible.

Yeah, nothing that made me go "Huh?" in these 8 patches.  Thanks.

> So I think there may be further opportunities for cleanup here. I'm not
> sure if we'd need to retain this behavior for git-interpret-trailers.
> AFAICT it is not documented, and I suspect is mostly historical
> accident, and not anything anybody ever wanted.

I tend to think the behaviour was not designed but it "just happens
to work that way".

> If we do keep it by default, then the "--no-divider" option I added in
> patch 4 should probably get a more generic name and cover this.
> Something like "--verbatim-input".

Perhaps.  Even if this is not covered, --verbatim-input would be a
good name for the option ;-)

> I'm going to sleep on it and see how I feel tomorrow.
>
>   [1/9]: trailer: use size_t for string offsets
>   [2/9]: trailer: use size_t for iterating trailer list
>   [3/9]: trailer: pass process_trailer_opts to trailer_info_get()
>   [4/9]: interpret-trailers: tighten check for "---" patch boundary
>   [5/9]: interpret-trailers: allow suppressing "---" divider
>   [6/9]: pretty, ref-filter: format %(trailers) with no_divider option
>   [7/9]: sequencer: ignore "---" divider when parsing trailers
>   [8/9]: append_signoff: use size_t for string offsets
>   [9/9]: sequencer: handle ignore_footer when parsing trailers
>
>  Documentation/git-interpret-trailers.txt | 10 +++-
>  builtin/interpret-trailers.c |  1 +
>  commit.c |  6 +--
>  commit.h |  2 +-
>  pretty.c |  3 ++
>  ref-filter.c |  2 +
>  sequencer.c  | 20 ++--
>  sequencer.h  |  9 +++-
>  t/t4205-log-pretty-formats.sh| 23 +
>  t/t6300-for-each-ref.sh  | 23 +
>  t/t7501-commit.sh| 16 ++
>  t/t7513-interpret-trailers.sh| 42 
>  trailer.c| 62 +---
>  trailer.h|  4 +-
>  14 files changed, 184 insertions(+), 39 deletions(-)
>
> -Peff

Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells

2018-08-23 Thread Eric Sunshine

On Thu, Aug 23, 2018 at 2:02 PM Ævar Arnfjörð Bjarmason
 wrote:
> Found in some 2.19 testing on AIX:

Thanks for reporting these issues.

> > + /^[ ]*EOF[  ]*$/!bhereslurp
>
> AIX sed doesn't like this, and will yell:
> :hereslurp is greater than eight characters
> This on top fixes it:

Fix make sense, and checking POSIX[1] , I see that it says that
behavior is unspecified for labels longer than 8 bytes.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html

> > +# bare "(" line?
> > +/^[  ]*([]*$/ {
> > + # stash for later printing
>
> AIX sed doesn't like this either, and prints:
> sed:# stash for later printing is not a recognized function.
> I have no idea what the fix is for that one.

That's the only indented comment in the entire script, so it's almost
certainly that. How about we move the indented comment up to the
comment for the enclosing block, so it reads:

# bare "(" line? -- stash for later printing
/^[  ]*([]*$/ {
h
bnextline
}

I could prepare such a patch, although perhaps it would be better for
you to do so since you are in a position to test it (and because you
discovered the problems)?

Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-23 Thread Eric Sunshine

On Tue, Aug 21, 2018 at 4:43 PM Jeff King  wrote:
> On Tue, Aug 21, 2018 at 04:22:08PM -0400, Eric Sunshine wrote:
> > On Tue, Aug 21, 2018 at 3:36 PM Jeff King  wrote:
> > How about using "git clone --shared" instead?
>
> That seems even more dangerous to me, since the created clone can become
> corrupt when the parent prunes. Probably not huge for a single
> operation, but you may be surprised when you run the script a few days
> later and it barfs horribly due to a missing object.

Okay. I had thought that doc-diff was never doing anything other than
read-only operations on the checked-out worktree after the initial
creation, but, looking more closely at the script, I now see that it
can perform other Git-based operations, so what you say makes sense.

> > In the case that you've already blown away the directory, then having
> > "git worktree add" prune away the old worktree bookkeeping would make
> > sense and wouldn't lose anything (you've already thrown it away
> > manually). However, it could be lossy for the case when the directory
> > is only temporarily missing (because it's on removable media or a
> > network share).
>
> I think the removable ones already suffer from that problem (an auto-gc
> can prune them). And they should already be marked with "git worktree
> lock". That said, people don't always do what they should, and I'd
> rather not make the problem worse. :)

Hmph. I thought that "git worktree prune" had a sensible "expire"
default to protect against such cases of removable media for which
"git worktree lock" wasn't invoked, but, looking at the code, I see
that the default is TIME_MAX.

> > In this case, it might make sense for "git worktree add" to refuse to
> > operate if an existing worktree entry still points at the directory
> > that you're trying to add. That should prevent those duplicate
> > worktree entries you saw.
>
> Yes, but then what's the next step for my script? I can't "remove" since
> the worktree isn't there. I can't blow away any directory that I know
> about, since there isn't one.

I was thinking that "worktree add" could start respecting the --force
option as an escape hatch.

> I need to somehow know that an existing
> "$GIT_DIR/worktrees/foo" is the problem. But "foo" is not even
> deterministic. Looking at the duplicates, it seems to be the basename of
> the working tree, but then mutated to avoid collisions with other
> worktrees.

If the worktree directory still existed, "git -C rev-parse --git-dir"
inside the worktree would give you the proper path of
$GIT_DIR/worktrees/foo, but the directory doesn't exist, so...
nothing.

> What about refusing by default, but forcing an overwrite with "-f"?

My thought, also.

Re: [PATCH v1] read-cache: speed up index load through parallelization

Ben Peart  writes:

> This patch helps address the CPU cost of loading the index by creating
> multiple threads to divide the work of loading and converting the cache
> entries across all available CPU cores.

Nice.

> +int git_config_get_fast_index(void)
> +{
> + int val;
> +
> + if (!git_config_get_maybe_bool("core.fastindex", ))
> + return val;
> +
> + if (getenv("GIT_FASTINDEX_TEST"))
> + return 1;

It probably makes sense to use git_env_bool() to be consistent,
which allows GIT_FASTINDEX_TEST=0 to turn it off after this becomes
the default.

> diff --git a/read-cache.c b/read-cache.c
> index 7b1354d759..0fa7e1a04c 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -24,6 +24,10 @@
>  #include "utf8.h"
>  #include "fsmonitor.h"
>  
> +#ifndef min
> +#define min(a,b) (((a) < (b)) ? (a) : (b))
> +#endif

Let's lose this, which is used only once, even though it could be
used elsewhere but not used (e.g. threads vs cpus near the beginning
of load_cache_entries()).

> +static unsigned long load_cache_entry_block(struct index_state *istate, 
> struct mem_pool *ce_mem_pool, int offset, int nr, void *mmap, unsigned long 
> start_offset, struct strbuf *previous_name)

Wrap and possibly add comment before the function to describe what
it does and what its parameters mean?

> +{
> + int i;
> + unsigned long src_offset = start_offset;
> +
> + for (i = offset; i < offset + nr; i++) {
> + struct ondisk_cache_entry *disk_ce;
> + struct cache_entry *ce;
> + unsigned long consumed;
> +
> + disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
> src_offset);
> + ce = create_from_disk(ce_mem_pool, disk_ce, , 
> previous_name);
> + set_index_entry(istate, i, ce);
> +
> + src_offset += consumed;
> + }
> + return src_offset - start_offset;
> +}

OK.

> +static unsigned long load_all_cache_entries(struct index_state *istate, void 
> *mmap, size_t mmap_size, unsigned long src_offset)
> +{

(following aloud) This "all" variant is "one thread does all", iow,
unthreaded version.  Makes sense.

> + struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> + unsigned long consumed;
> +
> + if (istate->version == 4) {
> + previous_name = _name_buf;
> + mem_pool_init(>ce_mem_pool,
> +   
> estimate_cache_size_from_compressed(istate->cache_nr));
> + } else {
> + previous_name = NULL;
> + mem_pool_init(>ce_mem_pool,
> +   estimate_cache_size(mmap_size, istate->cache_nr));
> + }

I count there are three instances of "if version 4 use the strbuf
for name-buf, otherwise..." in this patch, which made me wonder if
we can make them shared more and/or if it makes sense to attempt to
do so.

> + consumed = load_cache_entry_block(istate, istate->ce_mem_pool, 0, 
> istate->cache_nr, mmap, src_offset, previous_name);
> + strbuf_release(_name_buf);
> + return consumed;
> +}
> +
> +#ifdef NO_PTHREADS
> +
> +#define load_cache_entries load_all_cache_entries
> +
> +#else
> +
> +#include "thread-utils.h"
> +
> +/*
> +* Mostly randomly chosen maximum thread counts: we
> +* cap the parallelism to online_cpus() threads, and we want
> +* to have at least 7500 cache entries per thread for it to
> +* be worth starting a thread.
> +*/
> +#define THREAD_COST  (7500)
> +
> +struct load_cache_entries_thread_data
> +{
> + pthread_t pthread;
> + struct index_state *istate;
> + struct mem_pool *ce_mem_pool;
> + int offset, nr;
> + void *mmap;
> + unsigned long start_offset;
> + struct strbuf previous_name_buf;
> + struct strbuf *previous_name;
> + unsigned long consumed; /* return # of bytes in index file processed */
> +};
> +
> +/*
> +* A thread proc to run the load_cache_entries() computation
> +* across multiple background threads.
> +*/
> +static void *load_cache_entries_thread(void *_data)
> +{
> + struct load_cache_entries_thread_data *p = _data;
> +
> + p->consumed += load_cache_entry_block(p->istate, p->ce_mem_pool, 
> p->offset, p->nr, p->mmap, p->start_offset, p->previous_name);
> + return NULL;
> +}

(following aloud) And the threaded version chews the block of ce's
given to each thread.  Makes sense.

> +static unsigned long load_cache_entries(struct index_state *istate, void 
> *mmap, size_t mmap_size, unsigned long src_offset)
> +{
> + struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> + struct load_cache_entries_thread_data *data;
> + int threads, cpus, thread_nr;
> + unsigned long consumed;
> + int i, thread;
> +
> + cpus = online_cpus();
> + threads = istate->cache_nr / THREAD_COST;
> + if (threads > cpus)
> + threads = cpus;

No other caller of online_cpus() is prepared to deal with faulty
return from the function (e.g. 0 or negative), so it is perfectly

Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells



On Wed, Jul 11 2018, Eric Sunshine wrote:

Found in some 2.19 testing on AIX:

> +# here-doc -- swallow it to avoid false hits within its body (but keep the
> +# command to which it was attached)
> +/<<[ ]*[-\\]*EOF[]*/ {
> + s/[ ]*<<[   ]*[-\\]*EOF//
> + h
> + :hereslurp
> + N
> + s/.*\n//
> + /^[ ]*EOF[  ]*$/!bhereslurp
> + x
> +}

AIX sed doesn't like this, and will yell:

:hereslurp is greater than eight characters

This on top fixes it:

diff --git a/t/chainlint.sed b/t/chainlint.sed
index 8544df38df..2333705b27 100644
--- a/t/chainlint.sed
+++ b/t/chainlint.sed
@@ -100 +100 @@
-   :hereslurp
+   :hered
@@ -104 +104 @@
-   bhereslurp
+   bhered
@@ -286 +286 @@ s/[ ]*< +:subshell
> +# bare "(" line?
> +/^[  ]*([]*$/ {
> + # stash for later printing
> + h
> + bnextline
> +}
> +# "(..." line -- split off and stash "(", then process "..." as its own line

AIX sed doesn't like this either, and prints:

sed:# stash for later printing is not a recognized function.

I have no idea what the fix is for that one.

Re: [PATCH v1] read-cache: speed up index load through parallelization

2018-08-23 Thread Stefan Beller

On Thu, Aug 23, 2018 at 8:45 AM Ben Peart  wrote:
>
> This patch helps address the CPU cost of loading the index by creating
> multiple threads to divide the work of loading and converting the cache
> entries across all available CPU cores.
>
> It accomplishes this by having the primary thread loop across the index file
> tracking the offset and (for V4 indexes) expanding the name. It creates a
> thread to process each block of entries as it comes to them. Once the
> threads are complete and the cache entries are loaded, the rest of the
> extensions can be loaded and processed normally on the primary thread.
>
> Performance impact:
>
> read cache .git/index times on a synthetic repo with:
>
> 100,000 entries
> FALSE   TRUESavings %Savings
> 0.014798767 0.009580433 0.005218333 35.26%
>
> 1,000,000 entries
> FALSE   TRUESavings %Savings
> 0.240896533 0.1751243   0.065772233 27.30%
>
> read cache .git/index times on an actual repo with:
>
> ~3M entries
> FALSE   TRUESavings %Savings
> 0.59898098  0.4513169   0.14766408  24.65%
>
> Signed-off-by: Ben Peart 
> ---
>
> Notes:
> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/67a700419b
> Checkout: git fetch https://github.com/benpeart/git 
> read-index-multithread-v1 && git checkout 67a700419b
>
>  Documentation/config.txt |   8 ++
>  config.c |  13 +++
>  config.h |   1 +
>  read-cache.c | 218 ++-
>  4 files changed, 216 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1c42364988..3344685cc4 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -899,6 +899,14 @@ relatively high IO latencies.  When enabled, Git will do 
> the
>  index comparison to the filesystem data in parallel, allowing
>  overlapping IO's.  Defaults to true.
>
> +core.fastIndex::
> +   Enable parallel index loading
> ++
> +This can speed up operations like 'git diff' and 'git status' especially
> +when the index is very large.  When enabled, Git will do the index
> +loading from the on disk format to the in-memory format in parallel.
> +Defaults to true.

"fast" is a non-descriptive word as we try to be fast in any operation?
Maybe core.parallelIndexReading as that just describes what it
turns on/off, without second guessing its effects?
(Are there still computers with just a single CPU, where this would not
make it faster? ;-))


> +int git_config_get_fast_index(void)
> +{
> +   int val;
> +
> +   if (!git_config_get_maybe_bool("core.fastindex", ))
> +   return val;
> +
> +   if (getenv("GIT_FASTINDEX_TEST"))
> +   return 1;

We look at this env value just before calling this function,
can be write it to only look at the evn variable once?

> +++ b/config.h
> @@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
>  extern int git_config_get_split_index(void);
>  extern int git_config_get_max_percent_split_change(void);
>  extern int git_config_get_fsmonitor(void);
> +extern int git_config_get_fast_index(void);

Oh. nd/no-extern did not cover config.h


>
> +#ifndef min
> +#define min(a,b) (((a) < (b)) ? (a) : (b))
> +#endif

We do not have a minimum function in the tree,
except for xdiff/xmacros.h:29: XDL_MIN.
I wonder what the rationale is for not having a MIN()
definition, I think we discussed that on the list a couple
times but the rationale escaped me.

If we introduce a min/max macro, can we put it somewhere
more prominent? (I would find it useful elsewhere)

> +/*
> +* Mostly randomly chosen maximum thread counts: we
> +* cap the parallelism to online_cpus() threads, and we want
> +* to have at least 7500 cache entries per thread for it to
> +* be worth starting a thread.
> +*/
> +#define THREAD_COST(7500)

This reads very similar to preload-index.c THREAD_COST

> +   /* loop through index entries starting a thread for every thread_nr 
> entries */
> +   consumed = thread = 0;
> +   for (i = 0; ; i++) {
> +   struct ondisk_cache_entry *ondisk;
> +   const char *name;
> +   unsigned int flags;
> +
> +   /* we've reached the begining of a block of cache entries, 
> kick off a thread to process them */

beginning

Thanks,
Stefan

Re: [PATCH] tests: fix and add lint for non-portable head -c N



On Thu, Aug 23 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> Junio: Even though this isn't a 2.19.0-rc0 regression I think it makes
>> sense for 2.19.0. The fix is trivial, and it'll unbreak (at least some
>> of) the tests on stock git on OpenBSD.
>
> I would have been a bit more sympathetic if this were recent
> regression (say, 2.18 or so), even if it is not new in this cycle,
> and the patch applied to the target track cleanly (say, maint or
> maint-2.17),

Yeah it's not a big deal if it's not in 2.19.

> but seeing that some of these are quite old, dating back to 2009, I am
> not sure I am enthused.

Note that those parts are the "while I'm at it" valgrind-specific parts,
i.e. parts of the code that likely never ran on anything except a GNU
toolchain.

> The changes certainly look trivial.  Let's apply.
>
> Thanks.
>
>>
>> OpenBSD guys: If you CC the git mailing list when you find you need to
>> apply patches like these, we're happy to fix this more pro-actively. I
>> just happened to be testing the upcoming 2.19.0 on OpenBSD and spotted
>> this.
>>
>>  t/check-non-portable-shell.pl | 1 +
>>  t/t5310-pack-bitmaps.sh   | 2 +-
>>  t/test-lib.sh | 4 ++--
>>  3 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
>> index d5823f71d8..94a7e6165e 100755
>> --- a/t/check-non-portable-shell.pl
>> +++ b/t/check-non-portable-shell.pl
>> @@ -35,6 +35,7 @@ sub err {
>>  chomp;
>>  }
>>
>> +/\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
>> BYTES out)';
>>  /\bsed\s+-i/ and err 'sed -i is not portable';
>>  /\becho\s+-[neE]/ and err 'echo with option is not portable (use 
>> printf)';
>>  /^\s*declare\s+/ and err 'arrays/declare not portable';
>> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
>> index 557bd0d0c0..7bff7923f2 100755
>> --- a/t/t5310-pack-bitmaps.sh
>> +++ b/t/t5310-pack-bitmaps.sh
>> @@ -335,7 +335,7 @@ test_expect_success 'truncated bitmap fails gracefully' '
>>  git rev-list --use-bitmap-index --count --all >expect &&
>>  bitmap=$(ls .git/objects/pack/*.bitmap) &&
>>  test_when_finished "rm -f $bitmap" &&
>> -head -c 512 <$bitmap >$bitmap.tmp &&
>> +test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
>>  mv -f $bitmap.tmp $bitmap &&
>>  git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
>>  test_cmp expect actual &&
>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>> index 8bb0f4348e..44288cbb59 100644
>> --- a/t/test-lib.sh
>> +++ b/t/test-lib.sh
>> @@ -867,7 +867,7 @@ then
>>  # handle only executables, unless they are shell libraries that
>>  # need to be in the exec-path.
>>  test -x "$1" ||
>> -test "# " = "$(head -c 2 <"$1")" ||
>> +test "# " = "$(test_copy_bytes 2 <"$1")" ||
>>  return;
>>
>>  base=$(basename "$1")
>> @@ -882,7 +882,7 @@ then
>>  # do not override scripts
>>  if test -x "$symlink_target" &&
>>  test ! -d "$symlink_target" &&
>> -test "#!" != "$(head -c 2 < "$symlink_target")"
>> +test "#!" != "$(test_copy_bytes 2 <"$symlink_target")"
>>  then
>>  symlink_target=../valgrind.sh
>>  fi

Re: [PATCH v2 2/2] tests: fix and add lint for non-portable seq



On Thu, Aug 23 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> GNU seq is not a POSIX command, and doesn't exist on
>
> s/GNU //; the command did not even originate there, but came from V8
> and/or Plan9 IIRC.
>
>> e.g. OpenBSD. We've had the test_seq wrapper since d17cf5f3a3 ("tests:
>> Introduce test_seq", 2012-08-04), but use of it keeps coming back,
>> e.g. in the recently added "fetch negotiator" tests being added here.
>>
>> So let's also add a check to "make test-lint". The regex is aiming to
>> capture the likes of $(seq ..) and "seq" as a stand-alone command,
>> without capturing some existing cases where we e.g. have files called
>> "seq".
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>
>> Now with a fix & check in v2 for the seq issue mentioned in
>> https://public-inbox.org/git/87a7pdfltp@evledraar.gmail.com/
>
> Thanks.
>
>>  t/check-non-portable-shell.pl|  1 +
>>  t/t5552-skipping-fetch-negotiator.sh | 12 ++--
>>  t/t5703-upload-pack-ref-in-want.sh   |  4 ++--
>>  3 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
>> index c8f10d40a1..75f38298d7 100755
>> --- a/t/check-non-portable-shell.pl
>> +++ b/t/check-non-portable-shell.pl
>> @@ -42,6 +42,7 @@ sub err {
>>  /\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
>>  /\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
>> test_line_count)';
>>  /\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
>> BYTES out)';
>> +/(?:\$\(seq|^\s*seq\b)/ and err 'seq is not portable (use test_seq)';
>
> Looking at $(wc -l) thing a few lines above, I am not sure if this
> deviation is a good idea.  Wouldn't /\bseq\s/ be sufficient?

As alluded to in the commit message that will find cases like:

$ git grep -P '\bseq\b' -- t | grep -P -v '(?:\$\(seq|^\s*seq\b)'
t/perf/p3400-rebase.sh:16:  seq 1000 >>unrelated-file$i &&
t/perf/p3400-rebase.sh:21:  seq 1000 | tac >>unrelated-file$i &&
t/t3404-rebase-interactive.sh:1227: test_seq 5 | sed 
"s/$double/&&/" >seq &&
t/t3404-rebase-interactive.sh:1228: git add seq &&
t/t3404-rebase-interactive.sh:1230: git commit -m seq-$double
t/t3404-rebase-interactive.sh:1232: git tag seq-onto &&
t/t3404-rebase-interactive.sh:1234: git cherry-pick seq-onto &&
t/t3404-rebase-interactive.sh:1236: test_must_fail env FAKE_LINES= git 
rebase -i seq-onto &&
t/t3404-rebase-interactive.sh:1239: git diff --exit-code seq-onto &&
t/t4022-diff-rewrite.sh:69: test_seq 1 99 >seq &&
t/t4022-diff-rewrite.sh:70: printf 100 >>seq &&
t/t4022-diff-rewrite.sh:71: git add seq &&
t/t4022-diff-rewrite.sh:72: git commit seq -m seq
t/t4022-diff-rewrite.sh:76: test_seq 1 5 >seq &&
t/t4022-diff-rewrite.sh:77: test_seq 9331 9420 >>seq &&
t/t4022-diff-rewrite.sh:78: test_seq 96 100 >>seq
t/t4022-diff-rewrite.sh:82: git diff -B seq >res &&
t/t4022-diff-rewrite.sh:87: git diff seq >res &&
t/t4022-diff-rewrite.sh:93: git diff -B seq >res &&
t/t4150-am.sh:933:  git format-patch -2 --stdout >seq.patch &&
t/t4150-am.sh:945:  test_must_fail git am -3 seq.patch &&
t/t4150-am.sh:955:  test_must_fail git am -3 seq.patch &&
t/t5520-pull.sh:295:git checkout -b seq &&
t/t5520-pull.sh:296:test_seq 5 >seq.txt &&
t/t5520-pull.sh:297:git add seq.txt &&
t/t5520-pull.sh:299:git commit -m "Add seq.txt" &&
t/t5520-pull.sh:300:echo 6 >>seq.txt &&
t/t5520-pull.sh:302:git commit -m "Append to seq.txt" seq.txt &&
t/t5520-pull.sh:304:echo conflicting >>seq.txt &&
t/t5520-pull.sh:306:git commit -m "Create conflict" seq.txt &&
t/t5520-pull.sh:307:test_must_fail git pull --rebase . seq 2>err >out &&

Re: [PATCH v2] config: fix commit-graph related config docs

Derrick Stolee  writes:

> The core.commitGraph config setting was accidentally removed from
> the config documentation. In that same patch, the config setting
> that writes a commit-graph during garbage collection was incorrectly
> written to the doc as "gc.commitGraph" instead of "gc.writeCommitGraph".
>
> Reported-by: Szeder Gábor 
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/config.txt | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)

Thanks; that's an appropriate update before the release to tie the
final loose ends.  Will apply.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1c42364988..6ee1890984 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -917,12 +917,10 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>  
> -gc.commitGraph::
> - If true, then gc will rewrite the commit-graph file when
> - linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
> - '--auto' the commit-graph will be updated if housekeeping is
> - required. Default is false. See linkgit:git-commit-graph[1]
> - for details.
> +core.commitGraph::
> + If true, then git will read the commit-graph file (if it exists)
> + to parse the graph structure of commits. Defaults to false. See
> + linkgit:git-commit-graph[1] for more information.
>  
>  core.useReplaceRefs::
>   If set to `false`, behave as if the `--no-replace-objects`
> @@ -1763,6 +1761,13 @@ this configuration variable is ignored, all packs 
> except the base pack
>  will be repacked. After this the number of packs should go below
>  gc.autoPackLimit and gc.bigPackThreshold should be respected again.
>  
> +gc.writeCommitGraph::
> + If true, then gc will rewrite the commit-graph file when
> + linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
> + '--auto' the commit-graph will be updated if housekeeping is
> + required. Default is false. See linkgit:git-commit-graph[1]
> + for details.
> +
>  gc.logExpiry::
>   If the file gc.log exists, then `git gc --auto` won't run
>   unless that file is more than 'gc.logExpiry' old.  Default is

wide t/perf output, was Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 06:20:26AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Here are numbers for p0001.2 run against linux.git on a few
> > versions. This is using -O2 with gcc 8.2.0.
> >
> >   Test v2.18.0 v2.19.0-rc0   HEAD
> >   
> > --
> >   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) 
> > -1.0%
> 
> I see what you did to the formatting here, which is a topic of
> another thread ;-).

Do you happen to have a link? I missed that one, and digging turned up
nothing.

A while ago I wrote the patch below. I don't recall why I never sent it
in (and it doesn't apply cleanly these days, though I'm sure it could be
forward-ported).

-- >8 --
Date: Wed, 20 Jan 2016 23:54:14 -0500
Subject: [PATCH] t/perf: add "tall" output format

When aggregating results, we usually show a list of tests,
one per line, with the tested revisions in columns across.
Like:

$ ./aggregate.perl 348d4f2^ 348d4f2 p7000-filter-branch.sh
Test  348d4f2^   348d4f2
---
7000.2: noop filter   295.32(269.61+14.36)   7.92(0.85+0.72) -97.3%

This is useful if you have a lot of tests to show, but few
revisions; you're effectively comparing the two items on
each line. But sometimes you have the opposite: few tests,
but a large number of revisions. In this case, the lines
get very long, and it's hard to compare values.

This patch introduces a "tall" format that shows the same
data in a more vertical manner:

$ ./aggregate.perl --tall \
348d4f2^ 348d4f2 \
jk/filter-branch-empty^ \
jk/filter-branch-empty \
p7000-filter-branch.sh
Test: p7000-filter-branch.2
348d4f2^  295.32(269.61+14.36)
348d4f27.92(0.85+0.72) -97.3%
jk/filter-branch-empty^9.37(0.87+0.80) -96.8%
jk/filter-branch-empty 7.71(0.92+0.62) -97.4%

Signed-off-by: Jeff King 
---
 t/perf/aggregate.perl | 124 ++
 1 file changed, 88 insertions(+), 36 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index e401208488..d108a02ccd 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -17,29 +17,41 @@ sub get_times {
return ($rt, $4, $5);
 }
 
-sub format_times {
+sub format_times_list {
my ($r, $u, $s, $firstr) = @_;
if (!defined $r) {
return "";
}
my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s;
+   my $pct;
if (defined $firstr) {
if ($firstr > 0) {
-   $out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr;
+   $pct = sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr;
} elsif ($r == 0) {
-   $out .= " =";
+   $pct = "=";
} else {
-   $out .= " +inf";
+   $pct = "+inf";
}
}
-   return $out;
+   return ($out, $pct);
+}
+
+sub format_times {
+   my ($times, $pct) = format_times_list(@_);
+   return defined $pct ? "$times $pct" : $times;
 }
 
 my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests);
+my ($tall_format);
 while (scalar @ARGV) {
my $arg = $ARGV[0];
my $dir;
last if -f $arg or $arg eq "--";
+   if ($arg eq "--tall") {
+   $tall_format = 1;
+   shift @ARGV;
+   next;
+   }
if (! -d $arg) {
my $rev = Git::command_oneline(qw(rev-parse --verify), $arg);
$dir = "build/".$rev;
@@ -122,6 +134,11 @@ sub have_slash {
return 0;
 }
 
+sub printable_dir {
+   my ($d) = @_;
+   return exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d};
+}
+
 my %newdirabbrevs = %dirabbrevs;
 while (!have_duplicate(values %newdirabbrevs)) {
%dirabbrevs = %newdirabbrevs;
@@ -132,44 +149,79 @@ sub have_slash {
}
 }
 
-my %times;
-my @colwidth = ((0)x@dirs);
-for my $i (0..$#dirs) {
-   my $d = $dirs[$i];
-   my $w = length (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : 
$dirnames{$d});
-   $colwidth[$i] = $w if $w > $colwidth[$i];
-}
-for my $t (@subtests) {
-   my $firstr;
+binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+
+if (!$tall_format) {
+   my %times;
+   my @colwidth = ((0)x@dirs);
for my $i (0..$#dirs) {
my $d = $dirs[$i];
-   $times{$prefixes{$d}.$t} = 
[get_times("$resultsdir/$prefixes{$d}$t.times")];
-   my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-   my $w = length format_times($r,$u,$s,$firstr);
+   my $w = length(printable_dir($d));
$colwidth[$i] = $w if $w > $colwidth[$i];
-   $firstr = $r unless defined $firstr;
}
-}
-my

Re: [PATCH] t/lib-rebase.sh: support explicit 'pick' commands in 'fake_editor.sh'

SZEDER Gábor  writes:

> diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
> index 25a77ee5cb..592865d019 100644
> --- a/t/lib-rebase.sh
> +++ b/t/lib-rebase.sh
> @@ -47,7 +47,7 @@ set_fake_editor () {
>   action=pick
>   for line in $FAKE_LINES; do
>   case $line in
> - squash|fixup|edit|reword|drop)
> + pick|squash|fixup|edit|reword|drop)
>   action="$line";;
>   exec*)
>   echo "$line" | sed 's/_/ /g' >> "$1";;

There is a large comment before the function that says

...
#   The following word combinations are possible:
#
#   "" -- add a "pick" line with the SHA1 taken from the
#   specified line.
#
#   " " -- add a line with the specified command
#   ("squash", "fixup", "edit", "reword" or "drop") and the SHA1 taken
#   from the specified line.
#
#   "exec_cmd_with_args" -- add an "exec cmd with args" line.
...

which probably needs an update, too.

Re: [ANNOUNCE] Git v2.19.0-rc0

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> Around the time that my proposed approaches were getting vetoed for
> alignment issues, I figured I was out of my depth here. I reached out to
> Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of
> posts of word-based approaches to different problems, so I thought he might
> know something off the top of his head that would be applicable. His
> conclusion (after looking only a short time) was to take a 'hasheq' approach
> [2] like Peff suggested [3]. Since that requires auditing all callers of
> hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19
> but (in my opinion) should be evaluated as part of the 2.20 cycle.

I think that audit isn't actually too bad (but definitely not something
we should do for v2.19). The cocci patch I showed earlier hits most of
them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
sure if there's a way to ask coccinelle only for oidcmp()
used in a boolean context.

Just skimming the grep output, it's basically every call except the ones
we use for binary search.

-Peff

Re: Questions about the hash function transition

Ævar Arnfjörð Bjarmason  writes:

> On Thu, Aug 23 2018, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  writes:
>>
 - The trailer consists of the following:
   - A copy of the 20-byte SHA-256 checksum at the end of the
 corresponding packfile.

   - 20-byte SHA-256 checksum of all of the above.
>>>
>>> We need to update both of these to 32 byte, right? Or are we planning to
>>> truncate the checksums?
>>
>> https://public-inbox.org/git/ca+55afwc7uq61ebnj36pfu_abcxgya4jut-tvppj21hkhre...@mail.gmail.com/
>
> Thanks.
>
> Yeah for this checksum purpose even 10 or 5 characters would do, but
> since we'll need a new pack format anyway for SHA-256 why not just use
> the full length of the SHA-256 here? We're using the full length of the
> SHA-1.
>
> I don't see it mattering for security / corruption detection purposes,
> but just to avoid confusion. We'll have this one place left where
> something looks like a SHA-1, but is actually a trunctated SHA-256.

I would prefer to see us at least explore if the gain in throughput
is sufficiently big if we switch to weaker checksum, like crc32.  If
does not give us sufficient gain, I'd agree with you that consistently
using full hash everywhere would conceptually be cleaner.

Re: [PATCH v2 1/2] tests: fix and add lint for non-portable head -c N

On Thu, Aug 23, 2018 at 03:25:01PM +, Ævar Arnfjörð Bjarmason wrote:

> The "head -c BYTES" option is non-portable (not in POSIX[1]). Change
> such invocations to use the test_copy_bytes wrapper added in
> 48860819e8 ("t9300: factor out portable "head -c" replacement",
> 2016-06-30).
> 
> This fixes a test added in 9d2e330b17 ("ewah_read_mmap: bounds-check
> mmap reads", 2018-06-14), which has been breaking
> t5310-pack-bitmaps.sh on OpenBSD since 2.18.0. The OpenBSD ports
> already have a similar workaround after their upgrade to 2.18.0[2].

Heh, I even considered using this when writing that test. But the reason
I introduced test_copy_bytes is not because the target platform did not
have "head -c" at all, but because some tests need very specific
buffering guarantees when reading from a shared pipe.

That said, if OpenBSD's "head" doesn't have "-c" at all, I'm fine with
this as a fix (and it sounds like we know that IRIX lacks it, too).

> Also, change a valgrind-specific codepath in test-lib.sh to use this
> wrapper. Given where valgrind runs I don't think this would ever
> become a portability issue in practice, but it's easier to just use
> the wrapper than introduce some exception for the "make test-lint"
> check being added here.

When working on 9d2e330b17, I recall finding these other "head -c"
invocations in the test suite when I did 9d2e330b17 and took them as
evidence that it was OK to use in vanilla cases. So even if these sites
don't affect any platforms in practice, I think it's worth it to ban
"head -c" completely.

> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  t/check-non-portable-shell.pl | 1 +
>  t/t5310-pack-bitmaps.sh   | 2 +-
>  t/test-lib.sh | 4 ++--
>  3 files changed, 4 insertions(+), 3 deletions(-)

Patch itself looks good to me. Thanks.

-Peff

Re: [PATCH v2 2/2] tests: fix and add lint for non-portable seq

Ævar Arnfjörð Bjarmason   writes:

> GNU seq is not a POSIX command, and doesn't exist on

s/GNU //; the command did not even originate there, but came from V8
and/or Plan9 IIRC.

> e.g. OpenBSD. We've had the test_seq wrapper since d17cf5f3a3 ("tests:
> Introduce test_seq", 2012-08-04), but use of it keeps coming back,
> e.g. in the recently added "fetch negotiator" tests being added here.
>
> So let's also add a check to "make test-lint". The regex is aiming to
> capture the likes of $(seq ..) and "seq" as a stand-alone command,
> without capturing some existing cases where we e.g. have files called
> "seq".
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>
> Now with a fix & check in v2 for the seq issue mentioned in
> https://public-inbox.org/git/87a7pdfltp@evledraar.gmail.com/

Thanks.

>  t/check-non-portable-shell.pl|  1 +
>  t/t5552-skipping-fetch-negotiator.sh | 12 ++--
>  t/t5703-upload-pack-ref-in-want.sh   |  4 ++--
>  3 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
> index c8f10d40a1..75f38298d7 100755
> --- a/t/check-non-portable-shell.pl
> +++ b/t/check-non-portable-shell.pl
> @@ -42,6 +42,7 @@ sub err {
>   /\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
>   /\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
> test_line_count)';
>   /\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
> BYTES out)';
> + /(?:\$\(seq|^\s*seq\b)/ and err 'seq is not portable (use test_seq)';

Looking at $(wc -l) thing a few lines above, I am not sure if this
deviation is a good idea.  Wouldn't /\bseq\s/ be sufficient?

Re: Test failures on OpenBSD

Ævar Arnfjörð Bjarmason  writes:

> + echo Error (-1) reading configuration file a-directory.
> + > expect 
> + mkdir a-directory
> + test_expect_code 2 test-tool config configset_get_value foo.bar 
> a-directory
> + 2> output 
> Value not found for "foo.bar"
> test_expect_code: command exited with 1, we wanted 2 test-tool config 
> configset_get_value foo.bar a-directory
> error: last command exited with $?=1
> not ok 23 - proper error on directory "files"

A blind guess.  is this "BSD allows opening a FILE* on a directory
and reading bytes from it" aka FREAD_READS_DIRECTORIES?

I wonder why FreeBSD sets it but not OpenBSD in config.mak.uname.

> + make_dir extract
> + tar xf subtree-all.tar -C extract
> tar: Cannot identify format. Searching...
> tar: End of archive volume 1 reached
> tar: Sorry, unable to determine archive format.
> error: last command exited with $?=1

Perhaps modern tar elements like extended headers are not supported
on the platform-supplied "tar"?  "make TAR=gnutar"?

> + seq 11
> ./t5552-skipping-fetch-negotiator.sh: seq: not found

This must use test_seq.

 t/t5552-skipping-fetch-negotiator.sh | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t5552-skipping-fetch-negotiator.sh 
b/t/t5552-skipping-fetch-negotiator.sh
index 5ad5bece55..30857b84a8 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -46,7 +46,7 @@ test_expect_success 'commits with no parents are sent 
regardless of skip distanc
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 7)
+   for i in $(test_seq 7)
do
test_commit -C client c$i
done &&
@@ -89,7 +89,7 @@ test_expect_success 'when two skips collide, favor the larger 
one' '
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 11)
+   for i in $(test_seq 11)
do
test_commit -C client c$i
done &&
@@ -168,14 +168,14 @@ test_expect_success 'do not send "have" with ancestors of 
commits that server AC
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 8)
+   for i in $(test_seq 8)
do
git -C client checkout --orphan b$i &&
test_commit -C client b$i.c0
done &&
-   for j in $(seq 19)
+   for j in $(test_seq 19)
do
-   for i in $(seq 8)
+   for i in $(test_seq 8)
do
git -C client checkout b$i &&
test_commit -C client b$i.c$j
@@ -205,7 +205,7 @@ test_expect_success 'do not send "have" with ancestors of 
commits that server AC
 
# fetch-pack should thus not send any more commits in the b1 branch, but
# should still send the others (in this test, just check b2).
-   for i in $(seq 0 8)
+   for i in $(test_seq 0 8)
do
have_not_sent b1.c$i
done &&

[PATCH v2] config: fix commit-graph related config docs

2018-08-23 Thread Derrick Stolee

The core.commitGraph config setting was accidentally removed from
the config documentation. In that same patch, the config setting
that writes a commit-graph during garbage collection was incorrectly
written to the doc as "gc.commitGraph" instead of "gc.writeCommitGraph".

Reported-by: Szeder Gábor 
Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..6ee1890984 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -917,12 +917,10 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
-gc.commitGraph::
-   If true, then gc will rewrite the commit-graph file when
-   linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
-   '--auto' the commit-graph will be updated if housekeeping is
-   required. Default is false. See linkgit:git-commit-graph[1]
-   for details.
+core.commitGraph::
+   If true, then git will read the commit-graph file (if it exists)
+   to parse the graph structure of commits. Defaults to false. See
+   linkgit:git-commit-graph[1] for more information.
 
 core.useReplaceRefs::
If set to `false`, behave as if the `--no-replace-objects`
@@ -1763,6 +1761,13 @@ this configuration variable is ignored, all packs except 
the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 
+gc.writeCommitGraph::
+   If true, then gc will rewrite the commit-graph file when
+   linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+   '--auto' the commit-graph will be updated if housekeeping is
+   required. Default is false. See linkgit:git-commit-graph[1]
+   for details.
+
 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run
unless that file is more than 'gc.logExpiry' old.  Default is
-- 
2.19.0.rc0

[PATCH v1] read-cache: speed up index load through parallelization

2018-08-23 Thread Ben Peart

This patch helps address the CPU cost of loading the index by creating
multiple threads to divide the work of loading and converting the cache
entries across all available CPU cores.

It accomplishes this by having the primary thread loop across the index file
tracking the offset and (for V4 indexes) expanding the name. It creates a
thread to process each block of entries as it comes to them. Once the
threads are complete and the cache entries are loaded, the rest of the
extensions can be loaded and processed normally on the primary thread.

Performance impact:

read cache .git/index times on a synthetic repo with:

100,000 entries
FALSE   TRUESavings %Savings
0.014798767 0.009580433 0.005218333 35.26%

1,000,000 entries
FALSE   TRUESavings %Savings
0.240896533 0.1751243   0.065772233 27.30%

read cache .git/index times on an actual repo with:

~3M entries
FALSE   TRUESavings %Savings
0.59898098  0.4513169   0.14766408  24.65%

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/67a700419b
Checkout: git fetch https://github.com/benpeart/git 
read-index-multithread-v1 && git checkout 67a700419b

 Documentation/config.txt |   8 ++
 config.c |  13 +++
 config.h |   1 +
 read-cache.c | 218 ++-
 4 files changed, 216 insertions(+), 24 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..3344685cc4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -899,6 +899,14 @@ relatively high IO latencies.  When enabled, Git will do 
the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.  Defaults to true.
 
+core.fastIndex::
+   Enable parallel index loading
++
+This can speed up operations like 'git diff' and 'git status' especially
+when the index is very large.  When enabled, Git will do the index
+loading from the on disk format to the in-memory format in parallel.
+Defaults to true.
+
 core.createObject::
You can set this to 'link', in which case a hardlink followed by
a delete of the source are used to make sure that object creation
diff --git a/config.c b/config.c
index 9a0b10d4bc..883092fdd3 100644
--- a/config.c
+++ b/config.c
@@ -2289,6 +2289,19 @@ int git_config_get_fsmonitor(void)
return 0;
 }
 
+int git_config_get_fast_index(void)
+{
+   int val;
+
+   if (!git_config_get_maybe_bool("core.fastindex", ))
+   return val;
+
+   if (getenv("GIT_FASTINDEX_TEST"))
+   return 1;
+
+   return -1; /* default value */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index ab46e0165d..74ca4e7db5 100644
--- a/config.h
+++ b/config.h
@@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
+extern int git_config_get_fast_index(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..0fa7e1a04c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -24,6 +24,10 @@
 #include "utf8.h"
 #include "fsmonitor.h"
 
+#ifndef min
+#define min(a,b) (((a) < (b)) ? (a) : (b))
+#endif
+
 /* Mask for the name length in ce_flags in the on-disk index */
 
 #define CE_NAMEMASK  (0x0fff)
@@ -1889,16 +1893,203 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
+static unsigned long load_cache_entry_block(struct index_state *istate, struct 
mem_pool *ce_mem_pool, int offset, int nr, void *mmap, unsigned long 
start_offset, struct strbuf *previous_name)
+{
+   int i;
+   unsigned long src_offset = start_offset;
+
+   for (i = offset; i < offset + nr; i++) {
+   struct ondisk_cache_entry *disk_ce;
+   struct cache_entry *ce;
+   unsigned long consumed;
+
+   disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
src_offset);
+   ce = create_from_disk(ce_mem_pool, disk_ce, , 
previous_name);
+   set_index_entry(istate, i, ce);
+
+   src_offset += consumed;
+   }
+   return src_offset - start_offset;
+}
+
+static unsigned long load_all_cache_entries(struct index_state *istate, void 
*mmap, size_t mmap_size, unsigned long src_offset)
+{
+   struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+   unsigned long consumed;
+
+   if (istate->version == 4) {
+   previous_name = _name_buf;
+   mem_pool_init(>ce_mem_pool,
+

Re: [PATCH] tests: fix and add lint for non-portable head -c N

Ævar Arnfjörð Bjarmason   writes:

> Junio: Even though this isn't a 2.19.0-rc0 regression I think it makes
> sense for 2.19.0. The fix is trivial, and it'll unbreak (at least some
> of) the tests on stock git on OpenBSD.

I would have been a bit more sympathetic if this were recent
regression (say, 2.18 or so), even if it is not new in this cycle,
and the patch applied to the target track cleanly (say, maint or
maint-2.17), but seeing that some of these are quite old, dating
back to 2009, I am not sure I am enthused.

The changes certainly look trivial.  Let's apply.

Thanks.

>
> OpenBSD guys: If you CC the git mailing list when you find you need to
> apply patches like these, we're happy to fix this more pro-actively. I
> just happened to be testing the upcoming 2.19.0 on OpenBSD and spotted
> this.
>
>  t/check-non-portable-shell.pl | 1 +
>  t/t5310-pack-bitmaps.sh   | 2 +-
>  t/test-lib.sh | 4 ++--
>  3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
> index d5823f71d8..94a7e6165e 100755
> --- a/t/check-non-portable-shell.pl
> +++ b/t/check-non-portable-shell.pl
> @@ -35,6 +35,7 @@ sub err {
>   chomp;
>   }
>  
> + /\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
> BYTES out)';
>   /\bsed\s+-i/ and err 'sed -i is not portable';
>   /\becho\s+-[neE]/ and err 'echo with option is not portable (use 
> printf)';
>   /^\s*declare\s+/ and err 'arrays/declare not portable';
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 557bd0d0c0..7bff7923f2 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -335,7 +335,7 @@ test_expect_success 'truncated bitmap fails gracefully' '
>   git rev-list --use-bitmap-index --count --all >expect &&
>   bitmap=$(ls .git/objects/pack/*.bitmap) &&
>   test_when_finished "rm -f $bitmap" &&
> - head -c 512 <$bitmap >$bitmap.tmp &&
> + test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
>   mv -f $bitmap.tmp $bitmap &&
>   git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
>   test_cmp expect actual &&
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 8bb0f4348e..44288cbb59 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -867,7 +867,7 @@ then
>   # handle only executables, unless they are shell libraries that
>   # need to be in the exec-path.
>   test -x "$1" ||
> - test "# " = "$(head -c 2 <"$1")" ||
> + test "# " = "$(test_copy_bytes 2 <"$1")" ||
>   return;
>  
>   base=$(basename "$1")
> @@ -882,7 +882,7 @@ then
>   # do not override scripts
>   if test -x "$symlink_target" &&
>   test ! -d "$symlink_target" &&
> - test "#!" != "$(head -c 2 < "$symlink_target")"
> + test "#!" != "$(test_copy_bytes 2 <"$symlink_target")"
>   then
>   symlink_target=../valgrind.sh
>   fi

Re: [PATCH/RFC] commit: new option to abort -a something is already staged

Duy Nguyen  writes:

> On Thu, Aug 23, 2018 at 4:15 AM Jonathan Nieder  wrote:
>> > The remaining question becomes scripts.  A script might do
>> >
>> >   ... modify old-file and create new-file ...
>> >   git add new-file
>> >   git commit -m "some great message"
>> >
>> > which would trip this error.  For that matter, humans might do that,
>> > too.  Could the check detect this case (where the only changes in the
>> > index are additions of new files) and treat it as non-destructive?
>>
>> (where the only changes in the index are additions of new files *that
>> match the worktree*)
>
> I don't think my change would affect this case. If "git commit" does
> not change the index by itself, there's no point in stopping it.

I think the above example forgets "-a" on the final "git commit"
step.  With it added, I can understand the concern (and I am sure
you would, too).

The user is trying to add everything done in the working tree, and
"commit -a" would catch all changes to paths that were already
tracked, but a separate "add" is necessary for newly created paths.
But adding a new path means the index no longer matches HEAD, and
the "commit -a" at the final step sweeps the changes to already
tracked paths---failing that because there "already is something
staged" will break the workflow.

[PATCH v2 1/2] tests: fix and add lint for non-portable head -c N

The "head -c BYTES" option is non-portable (not in POSIX[1]). Change
such invocations to use the test_copy_bytes wrapper added in
48860819e8 ("t9300: factor out portable "head -c" replacement",
2016-06-30).

This fixes a test added in 9d2e330b17 ("ewah_read_mmap: bounds-check
mmap reads", 2018-06-14), which has been breaking
t5310-pack-bitmaps.sh on OpenBSD since 2.18.0. The OpenBSD ports
already have a similar workaround after their upgrade to 2.18.0[2].

I have not tested this on IRIX, but according to 4de0bbd898 ("t9300:
use perl "head -c" clone in place of "dd bs=1 count=16000" kluge",
2010-12-13) this invocation would have broken things there too.

Also, change a valgrind-specific codepath in test-lib.sh to use this
wrapper. Given where valgrind runs I don't think this would ever
become a portability issue in practice, but it's easier to just use
the wrapper than introduce some exception for the "make test-lint"
check being added here.

1. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
2. 
https://github.com/openbsd/ports/commit/08d5d82eaefe5cf2f125ecc0c6a57df9cf91350c#diff-f7d3c4fabeed1691620d608f1534f5e5

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/check-non-portable-shell.pl | 1 +
 t/t5310-pack-bitmaps.sh   | 2 +-
 t/test-lib.sh | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index d5823f71d8..c8f10d40a1 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -41,6 +41,7 @@ sub err {
/^\s*[^#]\s*which\s/ and err 'which is not portable (use type)';
/\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
/\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
test_line_count)';
+   /\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
BYTES out)';
/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable 
(use FOO=bar && export FOO)';
/^\s*([A-Z0-9_]+=(\w+|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
err '"FOO=bar shell_func" assignment extends beyond 
"shell_func"';
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 557bd0d0c0..7bff7923f2 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -335,7 +335,7 @@ test_expect_success 'truncated bitmap fails gracefully' '
git rev-list --use-bitmap-index --count --all >expect &&
bitmap=$(ls .git/objects/pack/*.bitmap) &&
test_when_finished "rm -f $bitmap" &&
-   head -c 512 <$bitmap >$bitmap.tmp &&
+   test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
mv -f $bitmap.tmp $bitmap &&
git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
test_cmp expect actual &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 8bb0f4348e..44288cbb59 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -867,7 +867,7 @@ then
# handle only executables, unless they are shell libraries that
# need to be in the exec-path.
test -x "$1" ||
-   test "# " = "$(head -c 2 <"$1")" ||
+   test "# " = "$(test_copy_bytes 2 <"$1")" ||
return;
 
base=$(basename "$1")
@@ -882,7 +882,7 @@ then
# do not override scripts
if test -x "$symlink_target" &&
test ! -d "$symlink_target" &&
-   test "#!" != "$(head -c 2 < "$symlink_target")"
+   test "#!" != "$(test_copy_bytes 2 <"$symlink_target")"
then
symlink_target=../valgrind.sh
fi
-- 
2.18.0.865.gffc8e1a3cd6

[PATCH v2 2/2] tests: fix and add lint for non-portable seq

GNU seq is not a POSIX command, and doesn't exist on
e.g. OpenBSD. We've had the test_seq wrapper since d17cf5f3a3 ("tests:
Introduce test_seq", 2012-08-04), but use of it keeps coming back,
e.g. in the recently added "fetch negotiator" tests being added here.

So let's also add a check to "make test-lint". The regex is aiming to
capture the likes of $(seq ..) and "seq" as a stand-alone command,
without capturing some existing cases where we e.g. have files called
"seq".

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

Now with a fix & check in v2 for the seq issue mentioned in
https://public-inbox.org/git/87a7pdfltp@evledraar.gmail.com/

 t/check-non-portable-shell.pl|  1 +
 t/t5552-skipping-fetch-negotiator.sh | 12 ++--
 t/t5703-upload-pack-ref-in-want.sh   |  4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index c8f10d40a1..75f38298d7 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -42,6 +42,7 @@ sub err {
/\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
/\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use 
test_line_count)';
/\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
BYTES out)';
+   /(?:\$\(seq|^\s*seq\b)/ and err 'seq is not portable (use test_seq)';
/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable 
(use FOO=bar && export FOO)';
/^\s*([A-Z0-9_]+=(\w+|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
err '"FOO=bar shell_func" assignment extends beyond 
"shell_func"';
diff --git a/t/t5552-skipping-fetch-negotiator.sh 
b/t/t5552-skipping-fetch-negotiator.sh
index 5ad5bece55..30857b84a8 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -46,7 +46,7 @@ test_expect_success 'commits with no parents are sent 
regardless of skip distanc
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 7)
+   for i in $(test_seq 7)
do
test_commit -C client c$i
done &&
@@ -89,7 +89,7 @@ test_expect_success 'when two skips collide, favor the larger 
one' '
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 11)
+   for i in $(test_seq 11)
do
test_commit -C client c$i
done &&
@@ -168,14 +168,14 @@ test_expect_success 'do not send "have" with ancestors of 
commits that server AC
test_commit -C server to_fetch &&
 
git init client &&
-   for i in $(seq 8)
+   for i in $(test_seq 8)
do
git -C client checkout --orphan b$i &&
test_commit -C client b$i.c0
done &&
-   for j in $(seq 19)
+   for j in $(test_seq 19)
do
-   for i in $(seq 8)
+   for i in $(test_seq 8)
do
git -C client checkout b$i &&
test_commit -C client b$i.c$j
@@ -205,7 +205,7 @@ test_expect_success 'do not send "have" with ancestors of 
commits that server AC
 
# fetch-pack should thus not send any more commits in the b1 branch, but
# should still send the others (in this test, just check b2).
-   for i in $(seq 0 8)
+   for i in $(test_seq 0 8)
do
have_not_sent b1.c$i
done &&
diff --git a/t/t5703-upload-pack-ref-in-want.sh 
b/t/t5703-upload-pack-ref-in-want.sh
index a73c55a47e..d1ccc22331 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -176,7 +176,7 @@ test_expect_success 'setup repos for 
change-while-negotiating test' '
git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo; 
"$LOCAL_PRISTINE" &&
cd "$LOCAL_PRISTINE" &&
git checkout -b side &&
-   for i in $(seq 1 33); do test_commit s$i; done &&
+   for i in $(test_seq 1 33); do test_commit s$i; done &&
 
# Add novel commits to upstream
git checkout master &&
@@ -289,7 +289,7 @@ test_expect_success 'setup repos for fetching with 
ref-in-want tests' '
git clone "file://$REPO" "$LOCAL_PRISTINE" &&
cd "$LOCAL_PRISTINE" &&
git checkout -b side &&
-   for i in $(seq 1 33); do test_commit s$i; done &&
+   for i in $(test_seq 1 33); do test_commit s$i; done &&
 
# Add novel commits to upstream
git checkout master &&
-- 
2.18.0.865.gffc8e1a3cd6

Re: Questions about the hash function transition

On Thu, Aug 23 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>>> - The trailer consists of the following:
>>>   - A copy of the 20-byte SHA-256 checksum at the end of the
>>> corresponding packfile.
>>>
>>>   - 20-byte SHA-256 checksum of all of the above.
>>
>> We need to update both of these to 32 byte, right? Or are we planning to
>> truncate the checksums?
>
> https://public-inbox.org/git/ca+55afwc7uq61ebnj36pfu_abcxgya4jut-tvppj21hkhre...@mail.gmail.com/

Thanks.

Yeah for this checksum purpose even 10 or 5 characters would do, but
since we'll need a new pack format anyway for SHA-256 why not just use
the full length of the SHA-256 here? We're using the full length of the
SHA-1.

I don't see it mattering for security / corruption detection purposes,
but just to avoid confusion. We'll have this one place left where
something looks like a SHA-1, but is actually a trunctated SHA-256.

Re: [PATCH v2 1/1] t2024: mark a `checkout -p` test as requiring Perl



On Thu, Aug 23 2018, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin 
>
> A recently-added test case tries to verify that the output of `checkout
> -p` contains a certain piece of advice.
>
> But if Git was built without Perl and therefore lacks support for `git
> add -i`, the error output contains the hint that `-p` is not even
> available instead.
>
> Let's just skip that test case altogether if Git was built with NO_PERL.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  t/t2024-checkout-dwim.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh
> index 26dc3f1fc0..29e1e25300 100755
> --- a/t/t2024-checkout-dwim.sh
> +++ b/t/t2024-checkout-dwim.sh
> @@ -76,7 +76,8 @@ test_expect_success 'checkout of branch from multiple 
> remotes fails #1' '
>   test_branch master
>  '
>
> -test_expect_success 'checkout of branch from multiple remotes fails with 
> advice' '
> +test_expect_success NO_PERL \
> + 'checkout of branch from multiple remotes fails with advice' '
>   git checkout -B master &&
>   test_might_fail git branch -D foo &&
>   test_must_fail git checkout foo 2>stderr &&

This issue is already fixed in master as 3338e9950e ("t2024: mark test
using "checkout -p" with PERL prerequisite", 2018-08-18).

Re: [PATCH/RFC] commit: new option to abort -a something is already staged

2018-08-23 Thread Duy Nguyen

On Thu, Aug 23, 2018 at 4:15 AM Jonathan Nieder  wrote:
> > The remaining question becomes scripts.  A script might do
> >
> >   ... modify old-file and create new-file ...
> >   git add new-file
> >   git commit -m "some great message"
> >
> > which would trip this error.  For that matter, humans might do that,
> > too.  Could the check detect this case (where the only changes in the
> > index are additions of new files) and treat it as non-destructive?
>
> (where the only changes in the index are additions of new files *that
> match the worktree*)

I don't think my change would affect this case. If "git commit" does
not change the index by itself, there's no point in stopping it.
-- 
Duy

[PATCH v2 0/1] Make t2024 NO_PERL-safe

2018-08-23 Thread Johannes Schindelin via GitGitGadget

While trying to run the build & test with NO_PERL, I noticed that t2024 had
a failing test case. This patch works around that failing test case by
skipping it when we know that the error message looks different than that
test case would expect.

Changes since v1 (which did not make it to the list due to
https://github.com/gitgitgadget/gitgitgadget/issues/29):

* reworded the commit message slightly.

Johannes Schindelin (1):
t2024: mark a `checkout -p` test as requiring Perl

t/t2024-checkout-dwim.sh | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

base-commit: 8d7b558baebe3abbbad4973ce1e1f87a7da17f47
Published-As:
https://github.com/gitgitgadget/git/releases/tags/pr-20%2Fdscho%2Fcheckout-default-remote-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git
pr-20/dscho/checkout-default-remote-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/20

Range-diff vs v1:

1: 619d7bcc31 ! 1: 8d46b31f5a t2024: mark a `checkout -p` test as requiring
Perl
@@ -3,9 +3,11 @@
t2024: mark a `checkout -p` test as requiring Perl

A recently-added test case tries to verify that the output of
`checkout
--p` contains a certain piece of advice. But if Git was built without
-Perl and therefore lacks support for `git add -i`, the error output
-contains the hint that `-p` is not even available instead.
+-p` contains a certain piece of advice.
+
+But if Git was built without Perl and therefore lacks support for `git
+add -i`, the error output contains the hint that `-p` is not even
+available instead.

Let's just skip that test case altogether if Git was built with
NO_PERL.

--
gitgitgadget

[PATCH v2 1/1] t2024: mark a `checkout -p` test as requiring Perl

2018-08-23 Thread Johannes Schindelin via GitGitGadget

From: Johannes Schindelin 

A recently-added test case tries to verify that the output of `checkout
-p` contains a certain piece of advice.

But if Git was built without Perl and therefore lacks support for `git
add -i`, the error output contains the hint that `-p` is not even
available instead.

Let's just skip that test case altogether if Git was built with NO_PERL.

Signed-off-by: Johannes Schindelin 
---
 t/t2024-checkout-dwim.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh
index 26dc3f1fc0..29e1e25300 100755
--- a/t/t2024-checkout-dwim.sh
+++ b/t/t2024-checkout-dwim.sh
@@ -76,7 +76,8 @@ test_expect_success 'checkout of branch from multiple remotes 
fails #1' '
test_branch master
 '
 
-test_expect_success 'checkout of branch from multiple remotes fails with 
advice' '
+test_expect_success NO_PERL \
+   'checkout of branch from multiple remotes fails with advice' '
git checkout -B master &&
test_might_fail git branch -D foo &&
test_must_fail git checkout foo 2>stderr &&
-- 
gitgitgadget

Re: [PATCH] range-diff: update stale summary of --no-dual-color

Hi Jonathan,

On Wed, 22 Aug 2018, Jonathan Nieder wrote:

>   OPT_INTEGER(0, "creation-factor", _factor,
>   N_("Percentage by which creation is weighted")),
> - OPT_BOOL(0, "no-dual-color", _color,
> - N_("color both diff and diff-between-diffs")),
> + OPT_BOOL(0, "dual-color", _color,
> + N_("color both diff and diff-between-diffs 
> (default)")),

There is one very good reason *not* to do that. And that reason is the
output of `git range-diff -h`. If anybody read that the option
`--dual-color` exists, they are prone to believe that the default is *not*
dual color. In contrast, when reading `--no-dual-color`, it is clear that
dual color mode is the default.

Ciao,
Dscho

>   OPT_END()
>   };
>   int i, j, res = 0;
> @@ -63,8 +63,8 @@ int cmd_range_diff(int argc, const char **argv, const char 
> *prefix)
>options + ARRAY_SIZE(options) - 1, /* OPT_END */
>builtin_range_diff_usage, 0);
>  
> - if (simple_color < 1) {
> - if (!simple_color)
> + if (dual_color != 0) {
> + if (dual_color > 0)
>   /* force color when --dual-color was used */
>   diffopt.use_color = 1;
>   diffopt.flags.dual_color_diffed_diffs = 1;
>

Re: What's cooking in git.git (Aug 2018, #05; Mon, 20)

Hi Junio,

On Wed, 22 Aug 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > FWIW I am a lot more bold about these builtins, and want to get them
> > into Git for Windows v2.19.0, either as full replacements, or like I
> > did with the difftool: by offering them as experimental options in the
> > installer.
> >
> > Maybe this will remain a dream, but maybe, just maybe, I will manage
> > to do this, with your help.
> 
> Heh, I do not think you need much help from me to cut GfW releases.

I know. And that's why I asked for your help with the three massive patch
sets in flight that convert scripted things to builtins instead.

> In any case, thanks for a quick "last mile" patch, which unblocked
> 'pu' that has been in sorry state for about a week or so.  I've
> inserted a new topic branch between -5 and -6 that (1) merges the
> other track into -5 and (2) applies your "call it in a new way"
> patch, and then rebased -6 on top of the result.  If you have a
> chance, it is appreciated if you can give a quick eyeballing near
> the tip of 'pu' today.

It looks good. Thanks.
Dscho

Re: Questions about the hash function transition

Ævar Arnfjörð Bjarmason  writes:

>> - The trailer consists of the following:
>>   - A copy of the 20-byte SHA-256 checksum at the end of the
>> corresponding packfile.
>>
>>   - 20-byte SHA-256 checksum of all of the above.
>
> We need to update both of these to 32 byte, right? Or are we planning to
> truncate the checksums?

https://public-inbox.org/git/ca+55afwc7uq61ebnj36pfu_abcxgya4jut-tvppj21hkhre...@mail.gmail.com/

Re: [PATCH] diff.c: pass sign_index to emit_line_ws_markup

Hi Stefan,

On Wed, 22 Aug 2018, Stefan Beller wrote:

> Instead of passing the sign directly to emit_line_ws_markup, pass only the
> index to lookup the sign in diff_options->output_indicators.
> 
> Signed-off-by: Stefan Beller 

Looks good to me!

> ---
>  diff.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
>  
> So something like this on top of sb/range-diff-colors ?  If a resend is
> needed I'll squash this in (or carry it as a cleanup patch early in the
> series), otherwise we could put this on top.

I'd leave this as a separate commit.

Ciao,
Dscho

> 
> Thanks,
> Stefan
>  
> 
> diff --git a/diff.c b/diff.c
> index 03486c35b75..17e33d506a1 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1199,10 +1199,11 @@ static void dim_moved_lines(struct diff_options *o)
>  static void emit_line_ws_markup(struct diff_options *o,
>   const char *set_sign, const char *set,
>   const char *reset,
> - char sign, const char *line, int len,
> + int sign_index, const char *line, int len,
>   unsigned ws_rule, int blank_at_eof)
>  {
>   const char *ws = NULL;
> + int sign = o->output_indicators[sign_index];
>  
>   if (o->ws_error_highlight & ws_rule) {
>   ws = diff_get_color_opt(o, DIFF_WHITESPACE);
> @@ -1282,8 +1283,7 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
>   set = diff_get_color_opt(o, DIFF_FILE_OLD);
>   }
>   emit_line_ws_markup(o, set_sign, set, reset,
> - 
> o->output_indicators[OUTPUT_INDICATOR_CONTEXT],
> - line, len,
> + OUTPUT_INDICATOR_CONTEXT, line, len,
>   flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0);
>   break;
>   case DIFF_SYMBOL_PLUS:
> @@ -1327,8 +1327,7 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
>   flags &= ~DIFF_SYMBOL_CONTENT_WS_MASK;
>   }
>   emit_line_ws_markup(o, set_sign, set, reset,
> - o->output_indicators[OUTPUT_INDICATOR_NEW],
> - line, len,
> + OUTPUT_INDICATOR_NEW, line, len,
>   flags & DIFF_SYMBOL_CONTENT_WS_MASK,
>   flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF);
>   break;
> @@ -1372,8 +1371,7 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
>   set = diff_get_color_opt(o, DIFF_CONTEXT_DIM);
>   }
>   emit_line_ws_markup(o, set_sign, set, reset,
> - o->output_indicators[OUTPUT_INDICATOR_OLD],
> - line, len,
> + OUTPUT_INDICATOR_OLD, line, len,
>   flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0);
>   break;
>   case DIFF_SYMBOL_WORDS_PORCELAIN:
> -- 
> 2.18.0.265.g16de1b435c9.dirty
> 
>

Questions about the hash function transition

I wanted to send another series to clarify things in
hash-function-transition.txt, but for some of the issues I don't know
the answer, and I had some questions after giving this another read.

So let's discuss that here first. Quoting from the document (available
at
https://github.com/git/git/blob/v2.19.0-rc0/Documentation/technical/hash-function-transition.txt)

> Git hash function transition
> 
>
> Objective
> -
> Migrate Git from SHA-1 to a stronger hash function.

Should way say "Migrate Git from SHA-1 to SHA-256" here instead?

Maybe it's overly specific, i.e. really we're also describnig how /any/
hash function transition might happen, but having just read this now
from start to finish it takes us a really long time to mention (and at
first, only offhand) that SHA-256 is the new hash.

> [...]
> Goals
> -
> 1. The transition to SHA-256 can be done one local repository at a time.
>a. Requiring no action by any other party.
>b. A SHA-256 repository can communicate with SHA-1 Git servers
>   (push/fetch).
>c. Users can use SHA-1 and SHA-256 identifiers for objects
>   interchangeably (see "Object names on the command line", below).
>d. New signed objects make use of a stronger hash function than
>   SHA-1 for their security guarantees.
> 2. Allow a complete transition away from SHA-1.
>a. Local metadata for SHA-1 compatibility can be removed from a
>   repository if compatibility with SHA-1 is no longer needed.
> 3. Maintainability throughout the process.
>a. The object format is kept simple and consistent.
>b. Creation of a generalized repository conversion tool.
>
> Non-Goals
> -
> 1. Add SHA-256 support to Git protocol. This is valuable and the
>logical next step but it is out of scope for this initial design.

This is a non-goal according to the docs, but now that we have protocol
v2 in git, perhaps we could start specifying or describing how this
protocol extension will work?

> [...]
> 3. Intermixing objects using multiple hash functions in a single
>repository.

But isn't that the goal now per "Translation table" & writing both SHA-1
and SHA-256 versions of objects?

> [...]
> Pack index
> ~~
> Pack index (.idx) files use a new v3 format that supports multiple
> hash functions. They have the following format (all integers are in
> network byte order):
>
> - A header appears at the beginning and consists of the following:
>   - The 4-byte pack index signature: '\377t0c'
>   - 4-byte version number: 3
>   - 4-byte length of the header section, including the signature and
> version number
>   - 4-byte number of objects contained in the pack
>   - 4-byte number of object formats in this pack index: 2
>   - For each object format:
> - 4-byte format identifier (e.g., 'sha1' for SHA-1)

So, given that we have 4-byte limit and have decided on SHA-256 are we
just going to call this 'sha2'? That might be confusingly ambiguous
since SHA2 is a standard with more than just SHA-256, maybe 's256', or
maybe we should give this 8 bytes with trailing \0s so we can have
"SHA-1\0\0\0" and "SHA-256\0"?

> [...]
> - The trailer consists of the following:
>   - A copy of the 20-byte SHA-256 checksum at the end of the
> corresponding packfile.
>
>   - 20-byte SHA-256 checksum of all of the above.

We need to update both of these to 32 byte, right? Or are we planning to
truncate the checksums?

This seems like just a mistake when we did s/NewHash/SHA-256/g, but then
again it was originally "20-byte NewHash checksum" ever since 752414ae43
("technical doc: add a design doc for hash function transition",
2017-09-27), so what do we mean here?

> Loose object index
> ~~
> A new file $GIT_OBJECT_DIR/loose-object-idx contains information about
> all loose objects. Its format is
>
>   # loose-object-idx
>   (sha256-name SP sha1-name LF)*
>
> where the object names are in hexadecimal format. The file is not
> sorted.
>
> The loose object index is protected against concurrent writes by a
> lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
> object:
>
> 1. Write the loose object to a temporary file, like today.
> 2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
> 3. Rename the loose object into place.
> 4. Open loose-object-idx with O_APPEND and write the new object
> 5. Unlink loose-object-idx.lock to release the lock.
>
> To remove entries (e.g. in "git pack-refs" or "git-prune"):
>
> 1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
>lock.
> 2. Write the new content to loose-object-idx.lock.
> 3. Unlink any loose objects being removed.
> 4. Rename to replace loose-object-idx, releasing the lock.

Do we expect multiple concurrent writers to poll the lock if they can't
aquire it right away? I.e. concurrent "git commit" would block? Has this
overall approach been benchmarked somewhere?

I wonder if some lock-less variant of this would

Re: [ANNOUNCE] Git v2.19.0-rc0

Jeff King  writes:

> Here are numbers for p0001.2 run against linux.git on a few
> versions. This is using -O2 with gcc 8.2.0.
>
>   Test v2.18.0 v2.19.0-rc0   HEAD
>   
> --
>   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) 
> -1.0%

I see what you did to the formatting here, which is a topic of
another thread ;-).

Thanks, as Derrick also noted, I agree this is an appropriate
workaround for the upcoming release and we may want to explore
hasheq() and other solutions as part of the effort during the next
cycle (which can start now if people are bored---after all working
on the codebase is the best way to hunt for recent bugs).

Thanks, will queue.

> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..4d014541ab 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> + /*
> +  * This is a temporary optimization hack. By asserting the size here,
> +  * we let the compiler know that it's always going to be 20, which lets
> +  * it turn this fixed-size memcmp into a few inline instructions.
> +  *
> +  * This will need to be extended or ripped out when we learn about
> +  * hashes of different sizes.
> +  */
> + if (the_hash_algo->rawsz != 20)
> + BUG("hash size not yet supported by hashcmp");
>   return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }

Re: [ANNOUNCE] Git v2.19.0-rc0

Derrick Stolee  writes:

> I was thinking that having a mitigation for 2.19 is best, and then we
> can focus as part of the 2.20 cycle how we can properly avoid this
> cost, especially when 32 is a valid option.
> ...
> ... to
> take a 'hasheq' approach [2] like Peff suggested [3]. Since that
> requires auditing all callers of hashcmp to see if hasheq is
> appropriate, it is not a good solution for 2.19 but (in my opinion)
> should be evaluated as part of the 2.20 cycle.

Thanks for thoughtful comments.  I think it makes sense to go with
the "tell compiler hashcmp() currently is only about 20-byte array"
for now, as it is trivial to see why it cannot break things.

During 2.20 cycle, we may find out that hasheq() abstraction
performs better than hashcmp() with multiple lengths, and end up
doing that "audit and replace to use hasheq()".  But as you said,
it probably isnot a good idea to rush it in this cycle.

[PATCH v2] range-diff: update stale summary of --no-dual-color

2018-08-23 Thread Kyle Meyer

275267937b (range-diff: make dual-color the default mode, 2018-08-13)
replaced --dual-color with --no-dual-color but left the option's
summary untouched.  Rewrite the summary to describe --no-dual-color
rather than dual-color.

Helped-by: Jonathan Nieder 
Signed-off-by: Kyle Meyer 
---
 builtin/range-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index f52d45d9d6..057afdbf46 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -25,7 +25,7 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
OPT_INTEGER(0, "creation-factor", _factor,
N_("Percentage by which creation is weighted")),
OPT_BOOL(0, "no-dual-color", _color,
-   N_("color both diff and diff-between-diffs")),
+   N_("color only based on the diff-between-diffs")),
OPT_END()
};
int i, j, res = 0;
-- 
2.18.0

Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out

2018-08-23 Thread Antonio Ospite

On Wed, 22 Aug 2018 08:29:25 -0700
Junio C Hamano  wrote:

> Antonio Ospite  writes:
> 
[...]
> >> > > +  else if (get_oid(GITMODULES_HEAD, ) >= 0)
> >> > > +  config_source.blob = GITMODULES_HEAD;
> >> > 
> >> Would using ":.gitmodules" instead of "HEAD:.gitmodules" be enough?
> 
> Yeah, either "instead of", or "in addition" (i.e. "try the index
> version in addition, before falling further back to the HEAD
> version"), would be more consistent with the remainder of the system
> (or, at least where the remainder of the system wants to go).
>

OK, I now tested with both "rm .gitmodules" and "git rm .gitmodules"
and I see why one would want to try _both_ ":.gitmodules" and
"HEAD:.gitmodules".

I'll go with "in addition" then, adding tests for both the scenarios.

> >> If so, what name should I use instead of GITMODULES_HEAD?
> >> GITMODULES_BLOB is already taken for something different, maybe
> >> GITMODULES_REF or GITMODULES_OBJECT?
> 
> I do not know why you want to refrain from spelling them out as
> "HEAD:.gitmodules" and ":.gitmodules"; at least to me the extra
> layer of names do not look like they are making the code easier
> to understand that much.
> 

This is in the spirit of commit 4c0eeafe47 (cache.h: add
GITMODULES_FILE macro, 2017-08-02), IIRC this was done mainly to get
help from the preprocessor to spot typos: I caught myself writing
".gitmdoules" several times; GITMDOULES_FILE would not compile.

If this makes sense I'll use GITMODULES_INDEX and GITMODULES_HEAD.

Thanks,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?

Re: [feature] how to output absolute paths in git diff? => --show-abs-path

2018-08-23 Thread Timothee Cour

> Wanting such a feature seems sensible

happy to hear that!

> So this seems to work for --no-index, or if it doesn't what situations 
> doesn't it work in?

cases where path arguments are already absolute (as I showed earlier)


> Is this a mistake, or would you only like --show-abs-paths to implicitly 
> supply --src-prefix, but not --dst-prefix? If so, why?

notice I didn't use `--show-abs-paths` in that example; I'm showing
what `git diff` currently outputs (the `could show` meant depending on
your use case; eg when `get_file1` returns an absolute path and
`get_file2` returns a relative one)

> Ah, so it's about supplying both the prefix *and* absolute paths, whereas I 
> see without --no-index we seem to handle this sort of thing just fine:

indeed, without `--no-index` things work just fine as I noted in
https://stackoverflow.com/questions/22698505/how-to-show-full-paths-in-git-diff.
The problem is with `--no-index`
I tried messing around with `--src-prefix` and `--dst-prefix` to
remedy this but as I showed, it can't work currently.

> without --no-index we seem to handle this sort of thing just fine:

you also need `--relative` in case you're not at repo root in your
snippet (that' what I'm using, without `--no-index`)


> This is because the default prefixes are a/ and b/, respectively

that seems buggy:
with default options I get:
--- a/Users/timothee/help0.txt
+++ b/help1.txt

with `--src-prefix=FOO ` and `--dst-prefix=FOO ` I get:
--- FOOUsers/timothee/help0.txt
+++ FOOhelp1.txt

this seems buggy because there's not good option for FOO:
when FOO = /, relative paths become a broken absolute path (/help1.txt)
when FOO = ./, absolute paths become a broken relative path
(./Users/timothee/help0.txt)

I propose instead to show:
with `--src-prefix=FOO ` and `--dst-prefix=FOO ` I get:
--- join(FOO,path1)
+++ join(FOO,path2)

where join(prefix, path) simply appends prefix to path, taking care of
avoiding a double `//` in case prefix ends in / and path starts with
/,

that way, the defauls (with a/, b/) are unchanged and we can have:
with `--src-prefix=` and `--dst-prefix=` (empty FOO):
--- /Users/timothee/help0.txt
+++ help1.txt
=> the paths are not broken

## summary:

* `--show-abs-paths` would be useful
* `--src-prefix=FOO ` and `--dst-prefix=FOO ` could use join(FOO,path)
instead of the currently used join(FOO,path1.removeLeadingSlash)
On Thu, Aug 23, 2018 at 2:42 AM Ævar Arnfjörð Bjarmason
 wrote:
>
> On Thu, Aug 23, 2018 at 11:16 AM Timothee Cour  
> wrote:
> >
> > This has all the context:
> > https://stackoverflow.com/questions/22698505/how-to-show-full-paths-in-git-diff
>
> It's helpful to copy it anyway, so we can discuss it here:
>
> QUOTE
>
> How do I show full paths in git diff? One can use '--dst-prefix=$PWD'
> and '--src-prefix=$PWD' but this is fragile as it won't work in many
> cases, eg with --no-index, or when running the commond from a
> subdirectory without using --relative=realpath_to_cwd
>
> END QUOTE
>
> Wanting such a feature seems sensible. But I'm unclear on the details.
>
> You say that --{src,dst}-prefix is fragile and doesn't work for
> --no-index. But if I do this:
>
> (
> cd /tmp &&
> echo foo >a &&
> echo bar >b &&
> git --no-pager diff --src-prefix=$PWD/ --dst-prefix=$PWD/ a b
> )
>
> I get this diff:
>
> diff --git /tmp/a /tmp/b
> new file mode 100644
> index 257cc56..5716ca5 100644
> --- /tmp/a
> +++ /tmp/b
> @@ -1 +1 @@
> -foo
> +bar
>
> So this seems to work for --no-index, or if it doesn't what situations
> doesn't it work in?
>
> > I'd like `--show-abs-path` to show absolute paths in:
> > git diff --show-abs-path args...
> >
> > eg:
> > git diff --no-index `get_file1` `get_file2`
> > could show:
> > --- a/Users/timothee/temp/ripgrep/help0.txt
> > +++ b/help1.txt
>
> Is this a mistake, or would you only like --show-abs-paths to
> implicitly supply --src-prefix, but not --dst-prefix? If so, why?
>
> > * passing '--dst-prefix=$PWD' and '--src-prefix=$PWD' doesn't help
> > because path arguments could be absolute, so it'll create
> > $PWD/Users/timothee/temp/ripgrep/help0.txt (wrong)
>
> Ah, so it's about supplying both the prefix *and* absolute paths,
> whereas I see without --no-index we seem to handle this sort of thing
> just fine:
>
> git diff --src-prefix=$PWD/ --dst-prefix=$PWD HEAD~.. $PWD/some-file
>
> > * passing '--dst-prefix=.' will behave weirdly, replacing leading `/`
> > by `.` (seems wrong)
> > diff --git .Users/timothee/temp/ripgrep/help0.txt b/help1.txt
>
> This is because the default prefixes are a/ and b/, respectively, and
> the option allows you to entirely replace them. E.g. imagine needing
> "../some-relative-path/"
>
> > NOTE: I'm invoking the `git diff` command via a more complicated case
> > (with multiple arguments including git diff flags and git diff files),
> > so it's awkward for me to parse which arguments correspond to a file
> > vs a flag (ie prevents easily converting

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-23 Thread Derrick Stolee

On 8/23/2018 1:04 AM, Jeff King wrote:

On Thu, Aug 23, 2018 at 03:47:07AM +, brian m. carlson wrote:

I expect that's going to be the case as well. I have patches that
wire up actual SHA-256 support in my hash-impl branch.

However, having said that, I'm happy to defer to whatever everyone else
thinks is best for 2.19. The assert solution would be fine with me in
this situation, and if we need to pull it out in the future, that's okay
with me.

I don't really have a strong opinion on this either way, so if someone
else does, please say so. I have somewhat more limited availability
over the next couple days, as I'm travelling on business, but I'm happy
to review a patch (and it seems like Peff has one minus the actual
commit message).

I just posted the patch elsewhere in the thread.

Thank you for that!

I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).
My numbers may be more dramatic because my Linux environment is a
virtual machine.

I was thinking that having a mitigation for 2.19 is best, and then we
can focus as part of the 2.20 cycle how we can properly avoid this cost,
especially when 32 is a valid option.

Around the time that my proposed approaches were getting vetoed for
alignment issues, I figured I was out of my depth here. I reached out to
Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of
posts of word-based approaches to different problems, so I thought he
might know something off the top of his head that would be applicable.
His conclusion (after looking only a short time) was to take a 'hasheq'
approach [2] like Peff suggested [3]. Since that requires auditing all
callers of hashcmp to see if hasheq is appropriate, it is not a good
solution for 2.19 but (in my opinion) should be evaluated as part of the
2.20 cycle.

Of course, if someone with knowledge of word-alignment issues across the
platforms we support knows how to enforce an alignment for object_id,
then something word-based like [4] could be reconsidered.

Thanks, everyone!
-Stolee

[1] https://twitter.com/stolee/status/1032312965754748930

[2]
https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/

[3]
https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/

[4]
https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8...@gmail.com/

[PATCH] t/lib-rebase.sh: support explicit 'pick' commands in 'fake_editor.sh'

2018-08-23 Thread SZEDER Gábor

The verbose output of the test 'reword without issues functions as
intended' in 't3423-rebase-reword.sh', added in a9279c6785 (sequencer:
do not squash 'reword' commits when we hit conflicts, 2018-06-19),
contains the following error output:

  sed: -e expression #1, char 2: extra characters after command

This error comes from within the 'fake-editor.sh' script created by
'lib-rebase.sh's set_fake_editor() function, and the root cause is the
FAKE_LINES="pick 1 reword 2" variable in the test in question, in
particular the "pick" word.  'fake-editor.sh' assumes 'pick' to be the
default rebase command and doesn't support an explicit 'pick' command
in FAKE_LINES.  As a result, 'pick' will be used instead of a line
number when assembling the following 'sed' script:

  sed -n picks/^pick/pick/p

which triggers the aforementioned error.

Luckily, this didn't affect the test's correctness: the erroring 'sed'
command doesn't write anything to the todo script, and processing the
rest of FAKE_LINES generates the desired todo script, as if that
'pick' command were not there at all.

The minimal fix would be to remove the 'pick' word from FAKE_LINES,
but that would leave us susceptible to similar issues in the future.

Instead, teach the fake-editor script to recognize an explicit 'pick'
command, which is still a fairly trivial change.

In the future we might want to consider reinforcing this fake editor
script with an &&-chain and stricter parsing of the FAKE_LINES
variable (e.g. to error out when encountering unknown rebase commands
or commands and line numbers in the wrong order).

Signed-off-by: SZEDER Gábor 
---
 t/lib-rebase.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
index 25a77ee5cb..592865d019 100644
--- a/t/lib-rebase.sh
+++ b/t/lib-rebase.sh
@@ -47,7 +47,7 @@ set_fake_editor () {
action=pick
for line in $FAKE_LINES; do
case $line in
-   squash|fixup|edit|reword|drop)
+   pick|squash|fixup|edit|reword|drop)
action="$line";;
exec*)
echo "$line" | sed 's/_/ /g' >> "$1";;
-- 
2.19.0.rc0.136.gd2dd172e64

Test failures on OpenBSD

This is on OpenBSD 6.2 amd64 with my "tests: fix and add lint for
non-portable head -c N" patch (which fixes one failure).

$ for t in t1305-config-include.sh t1308-config-set.sh 
t5004-archive-corner-cases.sh t5552-skipping-fetch-negotiator.sh; do ./$t 
--no-color -v -x 2>&1 | grep -B10 "^not ok"; done

+ cd bar
+ echo [includeIf "gitdir:bar/"]path=bar7
+ >> .git/config 
+ echo [test]seven=7
+ > .git/bar7 
+ echo 7
+ > expect 
+ git config test.seven
+ > actual 
error: last command exited with $?=1
not ok 27 - conditional include, gitdir matching symlink
--
+ cd bar
+ echo [includeIf "gitdir/i:BAR/"]path=bar8
+ >> .git/config 
+ echo [test]eight=8
+ > .git/bar8 
+ echo 8
+ > expect 
+ git config test.eight
+ > actual 
error: last command exited with $?=1
not ok 28 - conditional include, gitdir matching symlink, icase
test_cmp expect actual

+ echo Error (-1) reading configuration file a-directory.
+ > expect 
+ mkdir a-directory
+ test_expect_code 2 test-tool config configset_get_value foo.bar 
a-directory
+ 2> output 
Value not found for "foo.bar"
test_expect_code: command exited with 1, we wanted 2 test-tool config 
configset_get_value foo.bar a-directory
error: last command exited with $?=1
not ok 23 - proper error on directory "files"
+ make_dir extract
+ tar xf subtree-all.tar -C extract
tar: Cannot identify format. Searching...
tar: End of archive volume 1 reached
tar: Sorry, unable to determine archive format.
error: last command exited with $?=1
+ rm -rf extract
+ exit 1
+ eval_ret=1
+ :
not ok 9 - archive empty subtree with no pathspec
--
+ make_dir extract
+ tar xf subtree-path.tar -C extract
tar: Cannot identify format. Searching...
tar: End of archive volume 1 reached
tar: Sorry, unable to determine archive format.
error: last command exited with $?=1
+ rm -rf extract
+ exit 1
+ eval_ret=1
+ :
not ok 10 - archive empty subtree by direct pathspec
'git  [...] -- [...]'
fatal: ambiguous argument 'c7': unknown revision or path not in the working 
tree.
Use '--' to separate paths from revisions, like this:
'git  [...] -- [...]'
No have c7 (c7)
error: last command exited with $?=1
+ test_unconfig -C client fetch.negotiationalgorithm
+ exit 1
+ eval_ret=1
+ :
not ok 1 - commits with no parents are sent regardless of skip distance
--
 Author: A U Thor 
 1 file changed, 1 insertion(+)
 create mode 100644 to_fetch.t
+ git init client
Initialized empty Git repository in /home/avar/g/git/t/trash 
directory.t5552-skipping-fetch-negotiator/client/.git/
+ seq 11
./t5552-skipping-fetch-negotiator.sh: seq: not found
+ git -C client checkout c5
error: pathspec 'c5' did not match any file(s) known to git
error: last command exited with $?=1
not ok 3 - when two skips collide, favor the larger one
--
+ git init client
Initialized empty Git repository in /home/avar/g/git/t/trash 
directory.t5552-skipping-fetch-negotiator/client/.git/
+ seq 8
./t5552-skipping-fetch-negotiator.sh: seq: not found
+ seq 19
./t5552-skipping-fetch-negotiator.sh: seq: not found
+ pwd
+ git -C server fetch --no-tags /home/avar/g/git/t/trash 
directory.t5552-skipping-fetch-negotiator/client b1:refs/heads/b1
fatal: Couldn't find remote ref b1
error: last command exited with $?=128
not ok 6 - do not send "have" with ancestors of commits that server ACKed

Full output at https://gitlab.com/snippets/1747801

Some of this, like the t5552-skipping-fetch-negotiator.sh failures are
new in 2.19.0 (those go away with s/seq/test_seq). But some are
older. E.g. the archive corner cases failure is becuse that test wants
unzip & GNU tar, which OpenBSD upstream has patched already:
https://github.com/openbsd/ports/blob/master/devel/git/patches/patch-t_test-lib_sh

Re: [feature] how to output absolute paths in git diff? => --show-abs-path

On Thu, Aug 23, 2018 at 11:16 AM Timothee Cour  wrote:
>
> This has all the context:
> https://stackoverflow.com/questions/22698505/how-to-show-full-paths-in-git-diff

It's helpful to copy it anyway, so we can discuss it here:

QUOTE

How do I show full paths in git diff? One can use '--dst-prefix=$PWD'
and '--src-prefix=$PWD' but this is fragile as it won't work in many
cases, eg with --no-index, or when running the commond from a
subdirectory without using --relative=realpath_to_cwd

END QUOTE

Wanting such a feature seems sensible. But I'm unclear on the details.

You say that --{src,dst}-prefix is fragile and doesn't work for
--no-index. But if I do this:

(
cd /tmp &&
echo foo >a &&
echo bar >b &&
git --no-pager diff --src-prefix=$PWD/ --dst-prefix=$PWD/ a b
)

I get this diff:

diff --git /tmp/a /tmp/b
new file mode 100644
index 257cc56..5716ca5 100644
--- /tmp/a
+++ /tmp/b
@@ -1 +1 @@
-foo
+bar

So this seems to work for --no-index, or if it doesn't what situations
doesn't it work in?

> I'd like `--show-abs-path` to show absolute paths in:
> git diff --show-abs-path args...
>
> eg:
> git diff --no-index `get_file1` `get_file2`
> could show:
> --- a/Users/timothee/temp/ripgrep/help0.txt
> +++ b/help1.txt

Is this a mistake, or would you only like --show-abs-paths to
implicitly supply --src-prefix, but not --dst-prefix? If so, why?

> * passing '--dst-prefix=$PWD' and '--src-prefix=$PWD' doesn't help
> because path arguments could be absolute, so it'll create
> $PWD/Users/timothee/temp/ripgrep/help0.txt (wrong)

Ah, so it's about supplying both the prefix *and* absolute paths,
whereas I see without --no-index we seem to handle this sort of thing
just fine:

git diff --src-prefix=$PWD/ --dst-prefix=$PWD HEAD~.. $PWD/some-file

> * passing '--dst-prefix=.' will behave weirdly, replacing leading `/`
> by `.` (seems wrong)
> diff --git .Users/timothee/temp/ripgrep/help0.txt b/help1.txt

This is because the default prefixes are a/ and b/, respectively, and
the option allows you to entirely replace them. E.g. imagine needing
"../some-relative-path/"

> NOTE: I'm invoking the `git diff` command via a more complicated case
> (with multiple arguments including git diff flags and git diff files),
> so it's awkward for me to parse which arguments correspond to a file
> vs a flag (ie prevents easily converting input file arguments to
> absolute paths), but `git` could do it easily via a flag, eg
> `--show-abs-path`

[feature] how to output absolute paths in git diff? => --show-abs-path

2018-08-23 Thread Timothee Cour

This has all the context:
https://stackoverflow.com/questions/22698505/how-to-show-full-paths-in-git-diff

I'd like `--show-abs-path` to show absolute paths in:
git diff --show-abs-path args...

eg:
git diff --no-index `get_file1` `get_file2`
could show:
--- a/Users/timothee/temp/ripgrep/help0.txt
+++ b/help1.txt

* passing '--dst-prefix=$PWD' and '--src-prefix=$PWD' doesn't help
because path arguments could be absolute, so it'll create
$PWD/Users/timothee/temp/ripgrep/help0.txt (wrong)

* passing '--dst-prefix=.' will behave weirdly, replacing leading `/`
by `.` (seems wrong)
diff --git .Users/timothee/temp/ripgrep/help0.txt b/help1.txt

NOTE: I'm invoking the `git diff` command via a more complicated case
(with multiple arguments including git diff flags and git diff files),
so it's awkward for me to parse which arguments correspond to a file
vs a flag (ie prevents easily converting input file arguments to
absolute paths), but `git` could do it easily via a flag, eg
`--show-abs-path`

[PATCH] tests: fix and add lint for non-portable head -c N

The "head -c BYTES" option is non-portable (not in POSIX[1]). Change
such invocations to use the test_copy_bytes wrapper added in
48860819e8 ("t9300: factor out portable "head -c" replacement",
2016-06-30).

This fixes a test added in 9d2e330b17 ("ewah_read_mmap: bounds-check
mmap reads", 2018-06-14), which has been breaking
t5310-pack-bitmaps.sh on OpenBSD since 2.18.0. The OpenBSD ports
already have a similar workaround after their upgrade to 2.18.0[2].

I have not tested this on IRIX, but according to 4de0bbd898 ("t9300:
use perl "head -c" clone in place of "dd bs=1 count=16000" kluge",
2010-12-13) this invocation would have broken things there too.

Also, change a valgrind-specific codepath in test-lib.sh to use this
wrapper. Given where valgrind runs I don't think this would ever
become a portability issue in practice, but it's easier to just use
the wrapper than introduce some exception for the "make test-lint"
check being added here.

1. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
2. 
https://github.com/openbsd/ports/commit/08d5d82eaefe5cf2f125ecc0c6a57df9cf91350c#diff-f7d3c4fabeed1691620d608f1534f5e5

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

Junio: Even though this isn't a 2.19.0-rc0 regression I think it makes
sense for 2.19.0. The fix is trivial, and it'll unbreak (at least some
of) the tests on stock git on OpenBSD.

OpenBSD guys: If you CC the git mailing list when you find you need to
apply patches like these, we're happy to fix this more pro-actively. I
just happened to be testing the upcoming 2.19.0 on OpenBSD and spotted
this.

 t/check-non-portable-shell.pl | 1 +
 t/t5310-pack-bitmaps.sh   | 2 +-
 t/test-lib.sh | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index d5823f71d8..94a7e6165e 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -35,6 +35,7 @@ sub err {
chomp;
}
 
+   /\bhead\s+-c\b/ and err 'head -c is not portable (use test_copy_bytes 
BYTES out)';
/\bsed\s+-i/ and err 'sed -i is not portable';
/\becho\s+-[neE]/ and err 'echo with option is not portable (use 
printf)';
/^\s*declare\s+/ and err 'arrays/declare not portable';
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 557bd0d0c0..7bff7923f2 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -335,7 +335,7 @@ test_expect_success 'truncated bitmap fails gracefully' '
git rev-list --use-bitmap-index --count --all >expect &&
bitmap=$(ls .git/objects/pack/*.bitmap) &&
test_when_finished "rm -f $bitmap" &&
-   head -c 512 <$bitmap >$bitmap.tmp &&
+   test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
mv -f $bitmap.tmp $bitmap &&
git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
test_cmp expect actual &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 8bb0f4348e..44288cbb59 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -867,7 +867,7 @@ then
# handle only executables, unless they are shell libraries that
# need to be in the exec-path.
test -x "$1" ||
-   test "# " = "$(head -c 2 <"$1")" ||
+   test "# " = "$(test_copy_bytes 2 <"$1")" ||
return;
 
base=$(basename "$1")
@@ -882,7 +882,7 @@ then
# do not override scripts
if test -x "$symlink_target" &&
test ! -d "$symlink_target" &&
-   test "#!" != "$(head -c 2 < "$symlink_target")"
+   test "#!" != "$(test_copy_bytes 2 <"$symlink_target")"
then
symlink_target=../valgrind.sh
fi
-- 
2.18.0.865.gffc8e1a3cd6

how to output absolute paths in git diff?

2018-08-23 Thread Timothee Cour

This has all the context:
https://stackoverflow.com/questions/22698505/how-to-show-full-paths-in-git-diff

I'd like `--show-abs-path` to show absolute paths in:
git diff --show-abs-path args...

eg:
git diff --no-index `get_file1` `get_file2`
could show:
--- a/Users/timothee/temp/ripgrep/help0.txt
+++ b/help1.txt

* passing '--dst-prefix=$PWD' and '--src-prefix=$PWD' doesn't help
because path arguments could be absolute, so it'll create
$PWD/Users/timothee/temp/ripgrep/help0.txt (wrong)

* passing '--dst-prefix=.' will behave weirdly, replacing leading `/`
by `.` (seems wrong)
diff --git .Users/timothee/temp/ripgrep/help0.txt b/help1.txt

NOTE: I'm invoking the `git diff` command via a more complicated case
(with multiple arguments including git diff flags and git diff files),
so it's awkward for me to parse which arguments correspond to a file
vs a flag (ie prevents easily converting input file arguments to
absolute paths), but `git` could do it easily via a flag, eg
`--show-abs-path`

Re: [PATCH] range-diff: update stale summary of --no-dual-color