[PATCH] gitk: fix MacOS 10.14 "Mojave" crash on launch

2018-07-23 Thread Eric Sunshine
On MacOS, a "wish" application started from the terminal opens in the
background, thus doesn't match user expectation that a newly-launched
application ought to be placed in the foreground. To address this
shortcoming, both gitk and git-gui use Apple Events to send a message to
"System Events" instructing it to foreground the "wish" application by
PID.

Unfortunately, MacOS 10.14 tightens restrictions on Apple Events,
requiring explicit granting of permission to control applications in
this fashion, and apparently such granting for "Automation" is not
allowed at all[1]. As a consequence gitk crashes outright at launch time
with a "Not authorized to send Apple events to System Events" error,
thus is entirely unusable on "Mojave".

In contrast, git-gui does not crash since it deliberately[2] catches and
ignores Apple Events errors. This does mean that git-gui will not
automatically become the foreground application on "Mojave", which is a
minor inconvenience but far better than crashing outright as gitk does.

Update gitk to catch and ignore Apple Events errors, mirroring git-gui's
behavior, to avoid this crash.

(Finding and implementing an alternate approach to foregrounding the
"wish" application on "Mojave" may be desirable but is outside the scope
of this crash fix.)

[1]: https://public-inbox.org/git/d295145e-7596-4409-9681-d8adbb9eb...@me.com/
[2]: 
https://public-inbox.org/git/CABNJ2G+h3zh+=wLA0KHjUn8TsfhqUK1Kn-1_=6hnxvrjuph...@mail.gmail.com/

Reported-by: Evgeny Cherpak 
Signed-off-by: Eric Sunshine 
---

Lack of 'catch' in the 'osascript' invocation in gitk was recognized as
a potential problem as far back as June 2013 [3,4]; now it's biting
"Mojave" users, making gitk unusable.

[3]: https://public-inbox.org/git/7vk3m7yukc@alter.siamese.dyndns.org/
[4]: 
https://public-inbox.org/git/1l424u5.uk987q18u3oxfm%25li...@haller-berlin.de/

 gitk-git/gitk | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gitk-git/gitk b/gitk-git/gitk
index a14d7a16b2..f13d1807bb 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -12210,11 +12210,13 @@ if {[catch {package require Tk 8.4} err]} {
 
 # on OSX bring the current Wish process window to front
 if {[tk windowingsystem] eq "aqua"} {
-exec osascript -e [format {
-tell application "System Events"
-set frontmost of processes whose unix id is %d to true
-end tell
-} [pid] ]
+catch {
+   exec osascript -e [format {
+   tell application "System Events"
+   set frontmost of processes whose unix id is %d to true
+   end tell
+   } [pid] ]
+}
 }
 
 # Unset GIT_TRACE var if set
-- 
2.18.0.345.g5c9ce644c3



Re: [PATCH v2] Makefile: add a DEVOPTS flag to get pedantic compilation

2018-07-23 Thread Beat Bolli
On 23.07.18 20:53, Junio C Hamano wrote:
> Beat Bolli  writes:
> 
>> In the interest of code hygiene, make it easier to compile Git with the
>> flag -pedantic.
>>
>> Pure pedantic compilation with GCC 7.3 results in one warning per use of
>> the translation macro `N_`:
>>
>> warning: array initialized from parenthesized string constant 
>> [-Wpedantic]
>>
>> Therefore also disable the parenthesising of i18n strings with
>> -DUSE_PARENS_AROUND_GETTEXT_N=no.
>>
>> Signed-off-by: Beat Bolli 
>> ---
>>
>> This is the convenience knob for all developers that led to the series
>> bb/pedantic[1]. It does not depend on this series, though.
> 
> Yup, but "make DEVELOPER=Yes" build won't pass unless this patch is
> queued after those clean-up ;-)

Then there's a bug in this patch. It should only have an effect if we
"make DEVELOPER=Yes DEVOPTS=pedantic". Did you try this?

> Remind me if I forget to tweak =no back to =0 before pushing the
> result out.

No problem, I can send a v3 with this change reverted.

Beat

> 
> Thanks.
> 
>> [1] 
>> https://public-inbox.org/git/20180708144342.11922-1-dev+...@drbeat.li/T/#u
>>
>>  Makefile   | 6 ++
>>  config.mak.dev | 5 +
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/Makefile b/Makefile
>> index 0cb6590f24..2bfc051652 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -484,6 +484,12 @@ all::
>>  #The DEVELOPER mode enables -Wextra with a few exceptions. By
>>  #setting this flag the exceptions are removed, and all of
>>  #-Wextra is used.
>> +#
>> +#pedantic:
>> +#
>> +#Enable -pedantic compilation. This also disables
>> +#USE_PARENS_AROUND_GETTEXT_N to produce only relevant warnings.
>>  
>>  GIT-VERSION-FILE: FORCE
>>  @$(SHELL_PATH) ./GIT-VERSION-GEN
>> diff --git a/config.mak.dev b/config.mak.dev
>> index 2d244ca470..e11dd94741 100644
>> --- a/config.mak.dev
>> +++ b/config.mak.dev
>> @@ -1,6 +1,11 @@
>>  ifeq ($(filter no-error,$(DEVOPTS)),)
>>  CFLAGS += -Werror
>>  endif
>> +ifneq ($(filter pedantic,$(DEVOPTS)),)
>> +CFLAGS += -pedantic
>> +# don't warn for each N_ use
>> +CFLAGS += -DUSE_PARENS_AROUND_GETTEXT_N=no
>> +endif
>>  CFLAGS += -Wdeclaration-after-statement
>>  CFLAGS += -Wno-format-zero-length
>>  CFLAGS += -Wold-style-definition



Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

2018-07-23 Thread Junio C Hamano
Ben Peart  writes:

>> Hmm.. this means cache-tree is fully valid, unless you have changes in
>> index. We're quite aggressive in repairing cache-tree since aecf567cbf
>> (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
>> have very good cache-tree records and still spend 33s on
>> traverse_trees, maybe there's something else.
>>
>
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps
> speed up tree object generation for a new commit."  In this particular
> case, no new commit is being created so I don't know that the
> cache-tree would help.

cache-tree is an index extension that records tree object names for
subdirectories you see in the index.  Every time you write the
contents of the index as a tree object, we need to collect the
object name for each top-level paths and write a new top-level tree
object out, after doing the same recursively for any modified
subdirectory.  Whenever you add, remove or modify a path in the
index, the cache-tree entry for enclosing directories are
invalidated, so a cache-tree entry that is still valid means that
all the paths in the index under that directory match the contents
of the tree object that the cache-tree entry holds.

And that property is used by "diff-index --cached $TREE" that is run
internally.  When we find that the subdirectory "D"'s cache-tree
entry is valid in the index, and the tree object recorded in the
cache-tree for that subdirectory matches the subtree D in the tree
object $TREE, then "diff-index --cached" ignores the entire
subdirectory D (which saves relatively little in the index as it
only needs to scan what is already in the memory forward, but on the
$TREE traversal side, it does not have to even open a subtree, that
can save a lot), and with a well-populated cache-tree, it can save a
significant processing.

I think that is what Duy meant to refer to while looking at the
numbers.

> After a quick look at the code, the only place I can find that tries
> to use cache_tree_matches_traversal() is in unpack_callback() and that
> only happens if n == 1 and in the "git checkout" case, n == 2. Am I
> missing something?


Re: [PATCH] pack-protocol: mention and point to docs for protocol v2

2018-07-23 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> --- a/Documentation/technical/pack-protocol.txt
> +++ b/Documentation/technical/pack-protocol.txt
> @@ -50,7 +50,8 @@ Each Extra Parameter takes the form of `=` or 
> ``.
>  
>  Servers that receive any such Extra Parameters MUST ignore all
>  unrecognized keys. Currently, the only Extra Parameter recognized is
> -"version=1".
> +"version" with a vlue of '1' or '2'.  See protocol-v2.txt for more

value?

> +information on protocol version 2.

Thanks.  Some thoughts on other parts of this document that may need
updating:

- the whole document assumes that 0 and 1 are the only protocol
  versions.  E.g. the discussion of the version number line in the
  response when "version=1" is sent as an Extra Paramter should probably
  apply to version 2, too.

- because the document was written before protocol v2, it describes the
  more complicated v1 that many readers shouldn't have to care about

- there is no one document that describes v2 in a self contained way,
  since protocol-v2.txt makes reference to protocol v1.

- the description of pkt-line format in protocol-common.txt is missing
  a discussion of delim-pkt.

Not about this patch, but I wonder if an organization along the
following lines would make sense?

 1. Rename pack-protocol.txt to protocol-v1.txt.  Rename
protocol-v2.txt to pack-protocol.txt.

 2. Make pack-protocol.txt self-contained, and remove any redundant
sections from protocol-v1.txt.

 3. Add a new protocol-v2.txt that briefly describes the benefits and
highlights of protocol v2, referring to pack-protocol.txt for
details.

That way, newcomers of the future could read pack-protocol.txt and
quickly glean the main protocol in (then) current use.

What do you think?

Thanks,
Jonathan


Re: [PATCH] Documentation/diff-options: explain different diff algorithms

2018-07-23 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> As a user I wondered what the diff algorithms are about. Offer at least
> a basic explanation on the differences of the diff algorithms.

Interesting.  Let's see.

[...]
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -94,16 +94,34 @@ diff" algorithm internally.
>   Choose a diff algorithm. The variants are as follows:
>  +
>  --
> -`default`, `myers`;;
> - The basic greedy diff algorithm. Currently, this is the default.
>  `minimal`;;
> + A diff as produced by the basic greedy algorithm described in

Why this reordering?

> + link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm 
> and its Variations]

This URL doesn't seem likely to stay stable.  Are appearances
deceiving?  (Maybe so, since that's the same host as libxdiff's
homepage .)  It might be
worth mentioning the author and date just in case.

> - Spend extra time to make sure the smallest possible diff is
> - produced.
> +`default`, `myers`;;
> + The same algorithm as `minimal`, extended with a heuristic to
> + reduce extensive searches. Currently, this is the default.

Amusing --- this means that the Myers diff is spelled as "minimal"
instead of "myers".

>  `patience`;;
> - Use "patience diff" algorithm when generating patches.
> + Use "patience diff" algorithm when generating patches. This
> + matches the longest common subsequence of unique lines on
> + both sides, recursively. It obtained its name by the way the
> + longest subsequence is found, as that is a byproduct of the
> + patience sorting algorithm. If there are no unique lines left
> + it falls back to `myers`. Empirically this algorithm produces
> + a more readable output for code, but it does not garantuee
> + the shortest output.

Probably also worth mentioning that this was invented by Bram Cohen
for the bazaar vcs.

>  `histogram`;;
> - This algorithm extends the patience algorithm to "support
> - low-occurrence common elements".
> + This algorithm re-implements the `patience` algorithm with

What does reimplements mean here?  Do you mean that it is a variation
on patience?

> + "support of low-occurrence common elements" and only picks
> + one element of the LCS for the recursion. It is often the

Does LCS mean longest common subsequence or longest common substring
here?  Probably best to spell it out to avoid confusion.

> + fastest, but in cornercases (when there are many longest

s/cornercases/corner cases/

> + common subsequences of the same length) it produces bad
> + results as seen in:
> +
> + seq 1 100 >one
> + echo 99 > two
> + seq 1 2 98 >>two
> + git diff --no-index --histogram one two

I wonder if these details (about all the algorithms) should go in a
separate Diff Algorithms section and this section could focus on
what use cases warrant using each, referring to that section for
details.  What do you think?

Thanks and hope that helps,
Jonathan


Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

2018-07-23 Thread Jeff King
On Mon, Jul 23, 2018 at 07:03:16PM +0200, Duy Nguyen wrote:

> > > Try bumping core.deltaBaseCacheLimit to see if that has any impact. It's
> > > 96MB by default.
> > >
> > > There may also be some possible work in making it more aggressive about
> > > storing the intermediate results. I seem to recall from past
> > > explorations that it doesn't keep everything, and I don't know if its
> > > heuristics have ever been proven sane.
> 
> Could we be a bit more flexible about cache size? Say if we know
> there's 8 GB memory still available, we should be able to use like 1
> GB at least (and that's done automatically without tinkering with
> config).

I have mixed feelings on that kind of auto-scaling for caches. Git isn't
always the only program running (or maybe you even have several git
operations running at once). So in many cases you'd want a more holistic
view of the system, and what resources are available.

The OS already does OK scheduling CPU and the block cache for our mmap'd
files. I don't know if there's a way to communicate with it about this
kind of cache. I guess asking "what memory is free" is one way to do
that. But it's not always the best answer (because we might be happy
trade off some block cache, etc). On the other hand, that would always
give us a conservative value, so if we picked min(96MB, free_mem /
nr_cpu) or something, that might be an OK rule of thumb.

-Peff


Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

2018-07-23 Thread Jeff King
On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:

> > Hmm.. this means cache-tree is fully valid, unless you have changes in
> > index. We're quite aggressive in repairing cache-tree since aecf567cbf
> > (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
> > have very good cache-tree records and still spend 33s on
> > traverse_trees, maybe there's something else.
> 
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps speed
> up tree object generation for a new commit."  In this particular case, no
> new commit is being created so I don't know that the cache-tree would help.

It's basically an index extension that mirrors the tree structure within
the index, telling you the sha1 of the three that _would_ be generated
from any particular path. So any time you're walking a tree alongside
the index, in theory you should be able to say "the cache-tree for this
subset of the index matches the tree" and skip over a bunch of entries.

At least that's my view of it. unpack_trees() has always been a
terrifying beast that I've avoided looking too closely at.

> After a quick look at the code, the only place I can find that tries to use
> cache_tree_matches_traversal() is in unpack_callback() and that only happens
> if n == 1 and in the "git checkout" case, n == 2. Am I missing something?

Looks like it's trying to special-case "diff-index --cached". Which
kind-of makes sense. In the non-cached case, we're thinking not only
about the relationship between the index and the tree, but also whether
the on-disk files are up to date.

And that would be the same for checkout. We want to know not only
whether there are changes to make to the index, but also whether the
on-disk files need to be updated from the index.

But I assume in your case that we've just refreshed the index quickly
using fsmonitor. So I think in the long run what you want is:

  1. fsmonitor tells us which index entries are not clean

  2. based on the unclean list, we invalidate cache-tree entries for
 those paths

  3. if we have a valid cache-tree entry, we should be able to skip
 digging into that tree; if not, then we walk the index and tree as
 normal, adding/deleting index entries and updating (or complaining
 about) modified on-disk files

I think the "n" adds an extra layer of complexity. n==2 means we're
doing a "2-way" merge. Moving from tree X to tree Y, and dealing with
the index as we go. Naively I _think_ we'd be OK to just extend the rule
to "if both subtrees match each other _and_ match the valid cache-tree,
then we can skip".

Again, I'm a little out of my area of expertise here, but cargo-culting
like this:

diff --git a/sha1-file.c b/sha1-file.c
index de4839e634..c105af70ce 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1375,6 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum 
object_type *type,
 
if (oid_object_info_extended(the_repository, &oid, &oi, 0) < 0)
return NULL;
+   trace_printf("reading %s %s", type_name(*type), sha1_to_hex(sha1));
return content;
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..cfdad4133d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1075,6 +1075,23 @@ static int unpack_callback(int n, unsigned long mask, 
unsigned long dirmask, str
o->cache_bottom += matches;
return mask;
}
+   } else if (n == 2 && S_ISDIR(names->mode) &&
+  names[0].mode == names[1].mode &&
+  !strcmp(names[0].path, names[1].path) &&
+  !oidcmp(names[0].oid, names[1].oid)
+  /* && somehow account for modified on-disk files */) 
{
+   int matches;
+
+   /*
+* we know that the two trees have the same oid, so we
+* only need to look at one of them
+*/
+   matches = 
cache_tree_matches_traversal(o->src_index->cache_tree,
+  names, info);
+   if (matches) {
+   o->cache_bottom += matches;
+   return mask;
+   }
}
 
if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,

seems to avoid the tree reads when running "GIT_TRACE=1 git checkout".
It also totally empties the index. ;) So clearly we have to do a bit
more there. Probably rather than just bumping o->cache_bottom forward,
we'd need to actually move those entries into the new index. Or maybe
it's something else entirely (I did say cargo-culting, right?).

-Peff


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-23 Thread Junio C Hamano
Jeff King  writes:

> If I understand the situation correctly, Junio is saying that he will
> continue to produce the amlog mapping, and that it contains sufficient
> information to produce the reverse mapping (which, as an aside, I did
> not even know existed -- I mostly want to go the other way, from digging
> in history to a mailing list conversation).

Yes, the reverse mapping in amlog was an experiment that did not
work well in the end.

When I use "git am" to make a commit out of a message, a
post-applypatch hook picks up the "Message-Id:" from the original
message and adds a git note to the resulting commit.  This is in
line with how the notes are meant to be used.  We have a commit
object, and a piece of information that we want to associate with
the commit object, which is not recorded as a part of the commit
object.  So we say "git notes add -m 'that piece of info' $commit"
(the message-id happens to be that piece of info in this example).

And with notes.rewriteRef, "git commit --amend" etc. would copy the
piece of info about the original commit to the rewritten commit.

Side Note: there are a few workflow elements I do want to
keep using but they currently *lose* the mapping info.  An
obvious one is

  $ git checkout -b to/pic master &&
  ... review in MUA and then ...
  $ git am -s mbox &&
  ... review in tree, attempt to build, tweak, etc.
  $ git format-patch --stdout master..to/pic >P &&
  $ edit P &&
  $ git reset --hard master &&
  $ git am P

which is far more versatile and efficient when doing certain
transformations on the series than running "rebase -i" and
reopening and editing the target files of the patches one by
one in each step.  But because format-patch does not
generate Message-Id header of the original one out of the
commit, the post-applypatch hook run by "am" at the end of
the steps would not have a chance to record that for the
newly created commit.

For this one, I think I can use "format-patch --notes=amlog"
to produce the patch file and then teach post-applypatch
script to pay attention to the Notes annotation without
changing anything else to record the message id of the
original.  Other workflow elements that lose the notes need
to be identified and either a fix implemented or a
workaround found for each of them.  For example, I suspect
there is no workaround for "cherry-pick" and it would take a
real fix.

A reverse mapping entry used to get created by post-applypatch to
map the blob that represents the notes text added to the $commit to
another text blob that contains the 40-hex of the commit object.
This is the experiment that did not work well.  As none of the later
integrator's work e.g. "commit --amend", "rebase", "cherry-pick",
etc. is about rewriting that blob, notes.rewriteRef mechanism would
not kick in, and that is understandasble.

And these (incomplete) reverse mapping entries get in the way to
maintain and correct the forward mapping.  When a commit that got
unreachable gets expired, I want "git notes prune" to remove notes
on them, and I do not want to even think about what should happen to
the entries in the notes tree that abuse the mechanism to map blobs
that are otherwise *not* even reachable from the main history.

A much more important task is to make sure that the forward mapping
that annotates invidual commits reachable from 'pu' and/or 'master' 
is maintained correctly by various tools.  From a correctly maintained
forward mapping, it should be straight forward to get a reverse mapping
if needed.

> Though personally, I do not know if there is much point in pushing it
> out, given that receivers can reverse the mapping themselves.

Before this thread, I was planning to construct and publish the
reverse mapping at the end of the day, but do so on a separate notes
ref (see above---the hacky abuse gets in the way of maintaining and
debugging the forward mapping, but a separate notes-ref that only
contains hacks is less worrysome).  But I have changed my mind and
decided not to generate or publish one.  It is sort of similar to
the way the pack .idx is constructed only by the receiver [*1*].

> Or is there some argument that there is information in the reverse map
> that _cannot_ be generated from the forward map?

I know there is no information loss (after all I was the only one
who ran that experimental hack), but there is one objection that is
still possible, even though I admit that is a weak argument.

If a plumbing "diff-{files,tree,index}" family had a sibling
"diff-notes" to compare two notes-shaped trees while pretending that
the object-name fan-out did not exist (i.e. instead, the trees being
compared is without a subtree and full of 40-hex filenames), then it
would be less cumbersome to incrementally update the revers

Re: [PATCH v4 16/21] range-diff --dual-color: fix bogus white-space warning

2018-07-23 Thread Junio C Hamano
Junio C Hamano  writes:

> It is pleasing to see that with a surprisingly clean and small
> change like this we can exempt the initial space byte from
> SP-before-HT check and from Indent-with-non-tab at the same time.
>
> Very nice.
>
> One reason why a surprisingly small special case is required is
> perhaps because we are blessed with the original code being clean
> [*1*], and the fact that a line[0] that is not ' ' will not trigger
> any indentation related whitespace errors without this special case,
> I guess.

Having said good things about the patch, I unfortunately realized
that we weren't that lucky.  As we do want to see whitespace errors
on lines with line[0] != ' ' to be flagged.  So "... will not
trigger" in the above is not a blessing, but something that further
needs to be fixed.  The special case should also be made for line[0]
that is '+' and possibly '-' (and I also suspect that the changes in
this patch may mostly be reusable with little tweaks if any).

Imagine we start from this commit that "git show" shows us like so:

 int main(int ac, char **av)
 {
   printf("Hello");
+  putchar(',');
+  putchar(' ');
   printf("World\n");
   return 0;
 }

I've drawn a horizontal-tab as long underscore to clarify in the
above picture.  If you have "core.whitespace=indent-with-non-tab"
"git show" would paint the line that adds " " as violating (as it
types 10 SPs, when it could have been a tab and 2 SPs), but it does
not highlight the line with "World" or "return", which is sensible
as they are pre-existing violations.

Then imagine we did "git commit --amend" and "git show" would give
this instead:

 int main(int ac, char **av)
 {
   printf("Hello");
+  putchar(',');
+  putchar(' ');
   printf("World\n");
   return 0;
 }

That is, relative to the previous attempt, we stopped introducing
new indent-with-non-tab violation to the line that adds " ", but
added a new violation to the line that adds ",".

After such "git commit --amend", what do we want to see in the
output from "git range-diff @{1}..."?

My quick test of the current code does not show any whitespace
breakage for either versions.  I *think* what we want to actually
see is

 - just like WS_IGNORE_FIRST_SPACE logic shifted the column by
   incrementing written (and final condition) for the loop in your
   patch for line[0]==' ', detect the overlong run of SP correctly
   by ignoring the first column that is '+', and complaining that
   the new commit is typing 10 SPs before putchar(',').

 - Earlier I thought that lines with '-' in the outer diff should
   become exempt from whitespace-error highlighting, but I think
   that is a mistake.  If a line in diff-of-diff that begins with
   "-+" has whitespace violation (e.g "-+" followed by 10 SPs), that
   is "the old version of the patch used to introduce whitespace
   violation", which is a useful piece of information when we look
   at the corresponding line in the same diff-of-diff that begins
   with "++".  We could either say "and you cleaned that mistake up
   in your newer patch", or "and you still have that mistake in your
   newer patch" (another possibility is there is no whitespace error
   on a "-+" line, and the corresponding "++" line has one---"you
   got it right in the previous round, but you somehow made it
   worse").

Here is the reproduction of the sample data I based on the above
thought experiment on.

diff --git a/script.sh b/script.sh
new file mode 100755
index 000..0090661
--- /dev/null
+++ b/script.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+git init
+git config core.whitespace indent-with-non-tab
+
+tr _ '\011' <<\EOF >hello.c
+int main(int ac, char **av)
+{
+_  printf("Hello");
+  printf("World\n");
+  return 0;
+}
+EOF
+
+git add hello.c && git commit -m initial
+
+tr _ '\011' <<\EOF >hello.c
+int main(int ac, char **av)
+{
+_  printf("Hello");
+  putchar(',');
+_  putchar(' ');
+  printf("World\n");
+  return 0;
+}
+EOF
+
+git commit -a -m second
+
+tr _ '\011' <<\EOF >hello.c
+int main(int ac, char **av)
+{
+_  printf("Hello");
+_  putchar(',');
+  putchar(' ');
+  printf("World\n");
+  return 0;
+}
+EOF
+
+git commit -a --amend -m third
+
+
+git show @{1}
+git show HEAD
+
+git range-diff ..@{1} @{1}..
+
-- 
2.18.0-232-gb7bd9486b0





[PATCH] Documentation/diff-options: explain different diff algorithms

2018-07-23 Thread Stefan Beller
As a user I wondered what the diff algorithms are about. Offer at least
a basic explanation on the differences of the diff algorithms.

Signed-off-by: Stefan Beller 
---
 Documentation/diff-options.txt | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972f..0d765482027 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -94,16 +94,34 @@ diff" algorithm internally.
Choose a diff algorithm. The variants are as follows:
 +
 --
-`default`, `myers`;;
-   The basic greedy diff algorithm. Currently, this is the default.
 `minimal`;;
-   Spend extra time to make sure the smallest possible diff is
-   produced.
+   A diff as produced by the basic greedy algorithm described in
+   link:http://www.xmailserver.org/diff2.pdf[An O(ND) Difference Algorithm 
and its Variations]
+`default`, `myers`;;
+   The same algorithm as `minimal`, extended with a heuristic to
+   reduce extensive searches. Currently, this is the default.
 `patience`;;
-   Use "patience diff" algorithm when generating patches.
+   Use "patience diff" algorithm when generating patches. This
+   matches the longest common subsequence of unique lines on
+   both sides, recursively. It obtained its name by the way the
+   longest subsequence is found, as that is a byproduct of the
+   patience sorting algorithm. If there are no unique lines left
+   it falls back to `myers`. Empirically this algorithm produces
+   a more readable output for code, but it does not garantuee
+   the shortest output.
 `histogram`;;
-   This algorithm extends the patience algorithm to "support
-   low-occurrence common elements".
+   This algorithm re-implements the `patience` algorithm with
+   "support of low-occurrence common elements" and only picks
+   one element of the LCS for the recursion. It is often the
+   fastest, but in cornercases (when there are many longest
+   common subsequences of the same length) it produces bad
+   results as seen in:
+
+   seq 1 100 >one
+   echo 99 > two
+   seq 1 2 98 >>two
+   git diff --no-index --histogram one two
+
 --
 +
 For instance, if you configured diff.algorithm variable to a
-- 
2.18.0.345.g5c9ce644c3-goog



Re: [PATCH 07/14] range-diff: respect diff_option.file rather than assuming 'stdout'

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 6:59 PM Stefan Beller  wrote:
> On Sun, Jul 22, 2018 at 2:58 AM Eric Sunshine  wrote:
> > The actual diffs output by range-diff respect diff_option.file, which
> > range-diff passes down the call-chain, thus are destination-agnostic.
> > However, output_pair_header() is hard-coded to emit to 'stdout'. Fix
> > this by making output_pair_header() respect diff_option.file, as well.
> >
> > Signed-off-by: Eric Sunshine 
>
> Depending how much the range diff series has progressed
> already, it might make sense to squash it there?

It could be done that way, though it would be nice for Dscho's
range-diff series to land without another re-roll, and it looks like
I'll probably be re-rolling this series, anyhow, to move
show_interdiff() to diff-lib.c and to fix its signature. Junio could
manage it as a "squash", but that's probably more work for him than
for me to retain it in this series. *And*, this change _is_ required
for this series, whereas it doesn't really matter for Dscho's series,
even though it's an obvious "fix".

Anyhow, whichever way Junio and Dscho would like to play it is fine with me.


Re: [PATCH] fetch-pack: mark die strings for translation

2018-07-23 Thread Brandon Williams
On 07/23, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > Brandon Williams  writes:
> >
> >> Signed-off-by: Brandon Williams 
> >> ---
> >>  fetch-pack.c | 16 
> >>  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > I do think this is a good idea, but what release does this target?
> > It does not seem to apply to master@{4.hours.ago}, and I do not
> > think a change as trivial like this wants to be taken of a hostage
> > of _all_ topics that are in 'next'.
> >
> 
> Answering myself, this would come on top of (and would become part
> of) bw/ref-in-want topic, I would think.  And I am perfectly OK for
> this change to be hostage of that topic ;-)

haha yes sorry for not mention it, but yes I built it on top of
bw/ref-in-want because it was in response to this topic :)
-- 
Brandon Williams


Re: [PATCH 07/14] range-diff: respect diff_option.file rather than assuming 'stdout'

2018-07-23 Thread Stefan Beller
On Sun, Jul 22, 2018 at 2:58 AM Eric Sunshine  wrote:
>
> The actual diffs output by range-diff respect diff_option.file, which
> range-diff passes down the call-chain, thus are destination-agnostic.
> However, output_pair_header() is hard-coded to emit to 'stdout'. Fix
> this by making output_pair_header() respect diff_option.file, as well.
>
> Signed-off-by: Eric Sunshine 

Depending how much the range diff series has progressed
already, it might make sense to squash it there?

Thanks,
Stefan

> ---
>  range-diff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/range-diff.c b/range-diff.c
> index 347b4a79f2..76e053c2b2 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -328,7 +328,7 @@ static void output_pair_header(struct diff_options 
> *diffopt,
> }
> strbuf_addf(buf, "%s\n", color_reset);
>
> -   fwrite(buf->buf, buf->len, 1, stdout);
> +   fwrite(buf->buf, buf->len, 1, diffopt->file);
>  }
>
>  static struct userdiff_driver no_func_name = {
> --
> 2.18.0.345.g5c9ce644c3
>


Re: [PATCH] fetch-pack: mark die strings for translation

2018-07-23 Thread Junio C Hamano
Junio C Hamano  writes:

> Brandon Williams  writes:
>
>> Signed-off-by: Brandon Williams 
>> ---
>>  fetch-pack.c | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> I do think this is a good idea, but what release does this target?
> It does not seem to apply to master@{4.hours.ago}, and I do not
> think a change as trivial like this wants to be taken of a hostage
> of _all_ topics that are in 'next'.
>

Answering myself, this would come on top of (and would become part
of) bw/ref-in-want topic, I would think.  And I am perfectly OK for
this change to be hostage of that topic ;-)


Re: [PATCH] fetch-pack: mark die strings for translation

2018-07-23 Thread Junio C Hamano
Brandon Williams  writes:

> Signed-off-by: Brandon Williams 
> ---
>  fetch-pack.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

I do think this is a good idea, but what release does this target?
It does not seem to apply to master@{4.hours.ago}, and I do not
think a change as trivial like this wants to be taken of a hostage
of _all_ topics that are in 'next'.


> diff --git a/fetch-pack.c b/fetch-pack.c
> index 0b4a9f288f..51abee6181 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1245,13 +1245,13 @@ static int process_section_header(struct 
> packet_reader *reader,
>   int ret;
>  
>   if (packet_reader_peek(reader) != PACKET_READ_NORMAL)
> - die("error reading section header '%s'", section);
> + die(_("error reading section header '%s'"), section);
>  
>   ret = !strcmp(reader->line, section);
>  
>   if (!peek) {
>   if (!ret)
> - die("expected '%s', received '%s'",
> + die(_("expected '%s', received '%s'"),
>   section, reader->line);
>   packet_reader_read(reader);
>   }
> @@ -1289,12 +1289,12 @@ static int process_acks(struct packet_reader *reader, 
> struct oidset *common)
>   continue;
>   }
>  
> - die("unexpected acknowledgment line: '%s'", reader->line);
> + die(_("unexpected acknowledgment line: '%s'"), reader->line);
>   }
>  
>   if (reader->status != PACKET_READ_FLUSH &&
>   reader->status != PACKET_READ_DELIM)
> - die("error processing acks: %d", reader->status);
> + die(_("error processing acks: %d"), reader->status);
>  
>   /* return 0 if no common, 1 if there are common, or 2 if ready */
>   return received_ready ? 2 : (received_ack ? 1 : 0);
> @@ -1331,7 +1331,7 @@ static void receive_shallow_info(struct fetch_pack_args 
> *args,
>  
>   if (reader->status != PACKET_READ_FLUSH &&
>   reader->status != PACKET_READ_DELIM)
> - die("error processing shallow info: %d", reader->status);
> + die(_("error processing shallow info: %d"), reader->status);
>  
>   setup_alternate_shallow(&shallow_lock, &alternate_shallow_file, NULL);
>   args->deepen = 1;
> @@ -1346,7 +1346,7 @@ static void receive_wanted_refs(struct packet_reader 
> *reader, struct ref *refs)
>   struct ref *r = NULL;
>  
>   if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ')
> - die("expected wanted-ref, got '%s'", reader->line);
> + die(_("expected wanted-ref, got '%s'"), reader->line);
>  
>   for (r = refs; r; r = r->next) {
>   if (!strcmp(end, r->name)) {
> @@ -1356,11 +1356,11 @@ static void receive_wanted_refs(struct packet_reader 
> *reader, struct ref *refs)
>   }
>  
>   if (!r)
> - die("unexpected wanted-ref: '%s'", reader->line);
> + die(_("unexpected wanted-ref: '%s'"), reader->line);
>   }
>  
>   if (reader->status != PACKET_READ_DELIM)
> - die("error processing wanted refs: %d", reader->status);
> + die(_("error processing wanted refs: %d"), reader->status);
>  }
>  
>  enum fetch_state {


Re: [PATCH v4 14/21] diff: add an internal option to dual-color diffs of diffs

2018-07-23 Thread Stefan Beller
> > - fputs(diff_line_prefix(o), file);
> > + if (first)
> > + fputs(diff_line_prefix(o), file);
> > + else if (!len)
> > + return;
>
> Can you explain this hunk in the log message?  I am not sure how the
> description in the log message relates to this change.  Is the idea
> of this change essentially "all the existing callers that aren't
> doing the diff-of-diffs send a non-NUL first character, and for them
> this change is a no-op.  New callers share most of the remainder of
> emit_line_0() logic but do not want to show the prefix, so the
> support for it is piggy-backing by a special case where first could
> be NUL"?

All but two caller have 'reverse' set to 0, using the arguments as before.

The other two callers are using the function twice to get the prefix
and set sign going, and then the second call to get the rest of the
line going (which needs to omit the prefix as that was done in the
first call) :

+   /* Emit just the prefix, then the rest. */
+   emit_line_0(o, set_sign ? set_sign : set, !!set_sign, reset,
+   sign, "", 0);
+   emit_line_0(o, set, 0, reset, 0, line, len);

I attempted to clean it up on top, but likely got it wrong as we have
no tests for colored range diffs, yet.
https://public-inbox.org/git/20180710174552.30123-3-sbel...@google.com/
My suggestion would be to first clarify emit_line_0 and have its arguments
and its execution map better to each other, (and as a result only needing to
have one call of emit_line_0 instead of two)

That is my understanding of the situation.

Thanks,
Stefan


Re: [PATCH v4 16/21] range-diff --dual-color: fix bogus white-space warning

2018-07-23 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> @@ -177,8 +178,16 @@ static unsigned ws_check_emit_1(const char *line, int 
> len, unsigned ws_rule,
>   if (trailing_whitespace == -1)
>   trailing_whitespace = len;
>  
> + if ((ws_rule & WS_IGNORE_FIRST_SPACE) && len && line[0] == ' ') {
> + if (stream)
> + fwrite(line, 1, 1, stream);
> + written++;
> + if (!trailing_whitespace)
> + trailing_whitespace++;
> + }
> +
>   /* Check indentation */
> - for (i = 0; i < trailing_whitespace; i++) {
> + for (i = written; i < trailing_whitespace; i++) {

It is pleasing to see that with a surprisingly clean and small
change like this we can exempt the initial space byte from
SP-before-HT check and from Indent-with-non-tab at the same time.

Very nice.

One reason why a surprisingly small special case is required is
perhaps because we are blessed with the original code being clean
[*1*], and the fact that a line[0] that is not ' ' will not trigger
any indentation related whitespace errors without this special case,
I guess.

>   if (line[i] == ' ')
>   continue;
>   if (line[i] != '\t')


[Footnote]

*1* ws.c used to be almost all my code long time ago, but most of
the shape of the current whitespace_error checking code comes from
c1795bb08aa which is not mine, and I can say good things about it
without feeling embarrassingly boasty ;-)


Re: [PATCH v4 14/21] diff: add an internal option to dual-color diffs of diffs

2018-07-23 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> From: Johannes Schindelin 
>
> When diffing diffs, it can be quite daunting to figure out what the heck
> is going on, as there are nested +/- signs.
>
> Let's make this easier by adding a flag in diff_options that allows
> color-coding the outer diff sign with inverted colors, so that the
> preimage and postimage is colored like the diff it is.
>
> Of course, this really only makes sense when the preimage and postimage
> *are* diffs. So let's not expose this flag via a command-line option for
> now.
>
> This is a feature that was invented by git-tbdiff, and it will be used
> by `git range-diff` in the next commit, by offering it via a new option:
> `--dual-color`.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  diff.c | 83 +++---
>  diff.h |  1 +
>  2 files changed, 69 insertions(+), 15 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index a94a8214f..e163bc8a3 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -563,14 +563,18 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t 
> *mf2,
>   ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>  }
>  
> -static void emit_line_0(struct diff_options *o, const char *set, const char 
> *reset,
> +static void emit_line_0(struct diff_options *o,
> + const char *set, unsigned reverse, const char *reset,
>   int first, const char *line, int len)
>  {
>   int has_trailing_newline, has_trailing_carriage_return;
>   int nofirst;
>   FILE *file = o->file;
>  
> - fputs(diff_line_prefix(o), file);
> + if (first)
> + fputs(diff_line_prefix(o), file);
> + else if (!len)
> + return;

Can you explain this hunk in the log message?  I am not sure how the
description in the log message relates to this change.  Is the idea
of this change essentially "all the existing callers that aren't
doing the diff-of-diffs send a non-NUL first character, and for them
this change is a no-op.  New callers share most of the remainder of
emit_line_0() logic but do not want to show the prefix, so the
support for it is piggy-backing by a special case where first could
be NUL"?


Re: [PATCH v4 16/21] range-diff --dual-color: fix bogus white-space warning

2018-07-23 Thread Stefan Beller
On Sat, Jul 21, 2018 at 3:05 PM Johannes Schindelin via GitGitGadget
 wrote:
>
> From: Johannes Schindelin 
>
> When displaying a diff of diffs, it is possible that there is an outer
> `+` before a context line. That happens when the context changed between
> old and new commit. When that context line starts with a tab (after the
> space that marks it as context line), our diff machinery spits out a
> white-space error (space before tab), but in this case, that is
> incorrect.
>
> Fix this by adding a specific whitespace flag that simply ignores the
> first space in the output.

That sounds like a simple (not easy) solution, which sounds acceptable
to me here.

I guess you dropped all ideas that I originally proposed for the cleanup
regarding ws. that is fine, I can roll the cleanup on top of your patches
here.

> Of course, this flag is *really* specific to the "diff of diffs" use
> case. The original idea was to simply skip the space from the output,
> but that workaround was rejected by the Git maintainer as causing
> headaches.

By that you mean
https://public-inbox.org/git/xmqqr2kb9jk2@gitster-ct.c.googlers.com/
?

> Note: as the original code did not leave any space in the bit mask
> before the WSEH_* bits, the diff of this commit looks unnecessarily
> involved: the diff is dominated by making room for one more bit to be
> used by the whitespace rules.

It took me some minutes, but I am reasonably convinced this patch
is correct (and doesn't collide with other series in flight, sb/diff-color-more
adds another flag to move detection in another bit field at (1<<23))

Thanks for writing this patch instead of the other, though I'll leave
it to Junio to weigh in if this approach is the best design.

Stefan

>
> Signed-off-by: Johannes Schindelin 
> ---
>  cache.h |  3 ++-
>  diff.c  | 15 ---
>  diff.h  |  6 +++---
>  ws.c| 11 ++-
>  4 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 8b447652a..8abfbeb73 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1681,11 +1681,12 @@ void shift_tree_by(const struct object_id *, const 
> struct object_id *, struct ob
>  #define WS_CR_AT_EOL   01000
>  #define WS_BLANK_AT_EOF02000
>  #define WS_TAB_IN_INDENT   04000
> +#define WS_IGNORE_FIRST_SPACE 01
>  #define WS_TRAILING_SPACE  (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
>  #define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
>  #define WS_TAB_WIDTH_MASK077
>  /* All WS_* -- when extended, adapt diff.c emit_symbol */
> -#define WS_RULE_MASK   0
> +#define WS_RULE_MASK   01
>  extern unsigned whitespace_rule_cfg;
>  extern unsigned whitespace_rule(const char *);
>  extern unsigned parse_whitespace_rule(const char *);
> diff --git a/diff.c b/diff.c
> index e163bc8a3..03ed235c7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -650,14 +650,14 @@ enum diff_symbol {
>  };
>  /*
>   * Flags for content lines:
> - * 0..12 are whitespace rules
> - * 13-15 are WSEH_NEW | WSEH_OLD | WSEH_CONTEXT
> - * 16 is marking if the line is blank at EOF
> + * 0..14 are whitespace rules
> + * 14-16 are WSEH_NEW | WSEH_OLD | WSEH_CONTEXT
> + * 17 is marking if the line is blank at EOF
>   */
> -#define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<16)
> -#define DIFF_SYMBOL_MOVED_LINE (1<<17)
> -#define DIFF_SYMBOL_MOVED_LINE_ALT (1<<18)
> -#define DIFF_SYMBOL_MOVED_LINE_UNINTERESTING   (1<<19)
> +#define DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF (1<<17)
> +#define DIFF_SYMBOL_MOVED_LINE (1<<18)
> +#define DIFF_SYMBOL_MOVED_LINE_ALT (1<<19)
> +#define DIFF_SYMBOL_MOVED_LINE_UNINTERESTING   (1<<20)
>  #define DIFF_SYMBOL_CONTENT_WS_MASK (WSEH_NEW | WSEH_OLD | WSEH_CONTEXT | 
> WS_RULE_MASK)
>
>  /*
> @@ -1094,6 +1094,7 @@ static void emit_diff_symbol_from_struct(struct 
> diff_options *o,
> set = diff_get_color_opt(o, DIFF_FRAGINFO);
> else if (c != '+')
> set = diff_get_color_opt(o, DIFF_CONTEXT);
> +   flags |= WS_IGNORE_FIRST_SPACE;
> }
> emit_line_ws_markup(o, set, reset, line, len, set_sign, '+',
> flags & DIFF_SYMBOL_CONTENT_WS_MASK,
> diff --git a/diff.h b/diff.h
> index 79beb6eea..892416a14 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -160,9 +160,9 @@ struct diff_options {
> int abbrev;
> int ita_invisible_in_index;
>  /* white-space error highlighting */
> -#define WSEH_NEW (1<<12)
> -#define WSEH_CONTEXT (1<<13)
> -#define WSEH_OLD (1<<14)
> +#define WSEH_NEW (1<<13)
> +#define WSEH_CONTEXT (1<<14)
> +#define WSEH_OLD (1<<15)
> unsigned ws_error_highlight;
> const char *prefix;
> int prefix_length;
> diff --git a/ws.c b/ws.c
> index a07caedd5..e02365a6a 100644
> --- a/ws.c
> +++ b/ws.c
> @@ -20,6 +20,7 @@ static struct whitespace_rule {
> { "bla

Re: [PATCH v4 00/21] Add `range-diff`, a `tbdiff` lookalike

2018-07-23 Thread Junio C Hamano
Stefan Beller  writes:

> On Sat, Jul 21, 2018 at 3:04 PM Johannes Schindelin via GitGitGadget
>  wrote:
>
>> Side note: I work on implementing range-diff not only to make life easier 
>> for reviewers who have to suffer through v2, v3, ... of my patch series, but 
>> also to verify my changes before submitting a new iteration. And also, maybe 
>> even more importantly, I plan to use it to verify my merging-rebases of Git 
>> for
>> Windows (for which I previously used to redirect the pre-rebase/post-rebase 
>> diffs vs upstream and then compare them using `git diff --no-index`). And of 
>> course any interested person can see what changes were necessary e.g. in the 
>> merging-rebase of Git for Windows onto v2.17.0 by running a command like:
>
> Thanks for making tools that makes the life of a Git developer easier!
> (Just filed https://github.com/gitgitgadget/gitgitgadget/issues/26
> which asks to break lines for this cover letter)

Thanks.  These cover letters are unreadable without W Q
(gnus-article-fill-long-lines)


Re: [PATCH 1/9] contrib: add a script to initialize VS Code configuration

2018-07-23 Thread Junio C Hamano
Junio C Hamano  writes:

> "Johannes Schindelin via GitGitGadget" 
> writes:
>
>> diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
>> new file mode 100755
>> index 0..3cc93243f
>> --- /dev/null
>> +++ b/contrib/vscode/init.sh
>> @@ -0,0 +1,165 @@
>> +#!/bin/sh
>
> This is caued by our usual "git apply --whitespace=warn" as it

"caught"; sorry for a typo.


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Jeff King
On Sat, Jul 21, 2018 at 06:23:32AM +0200, Duy Nguyen wrote:

> > I'm not sure I completely agree with this. While 4GB deltas should be
> > pretty rare, the nice thing about 64-bit is that you never have to even
> > think about whether the variable is large enough. I think the 2^32
> > limitations on Windows are something we should be fixing in the long
> > term (though there it is even worse because it is not just deltas, but
> > entire objects).
> 
> I guess that means we stick to uint64_t then? It does increase more
> memory on 32-bit architecture (and probably processing cost as well
> because 64-bit uses up 2 registers).

Yes, but if we switch away from an array to a hash, then we get the best
of both worlds: we are using 64-bits to store the size, but we only need
an entry for deltas that are actually big.

Then the common small-delta case remains efficient in both CPU and
memory, and we pay the costs only in proportion to the number of large
deltas (though the hash is a slightly higher cost for those large deltas
than an array).

> > This is new in this iteration. I guess this is to cover the case where
> > we are built with pthread support, but --threads=1?
> 
> If you mean the "lock_initialized" variable, not really. the other
> _lock() macros in builtin/ call pthread_mutex_lock() unconditionally,
> which is fine. But I feel a bit uncomfortable doing the same in
> pack-objects.h which technically is library code (but yes practically
> is a long arm of builtin/pack-objects.c), so lock_initialized is there
> to make sure we don't touch uninitialized locks if somebody forgets to
> init them first.

I think the ones in builtin/ check threads_active to avoid actually
locking. And that's set in init_thread(), which we do not call if we are
using a single thread. So I think this is basically doing the same
thing, but with a separate flag (since the library code does not know
about threads_active).

> > Your original patch had to copy the oe_* helpers into the file to handle
> > that. But I think we're essentially just locking the whole functions:
> 
> I'll try to avoid this lock when deltas are small and see if it helps
> the linux.git case on Elijah's machine. So we may end up locking just
> a part of these functions.

Yeah, I think my suggestion doesn't work once we start doing more
complex locking logic. Let's just forget it. I think the
"lock_initialized" thing is probably the right approach.

It might be worth getting rid of builtin/pack-objects.c's local
threads_active variable, and just using to_pack.threads_active. The two
flag would always want to match.

-Peff


Re: [PATCH 02/14] format-patch: add --interdiff option to embed diff in cover letter

2018-07-23 Thread Junio C Hamano
Eric Sunshine  writes:

> On Mon, Jul 23, 2018 at 12:03 PM Duy Nguyen  wrote:
>> On Sun, Jul 22, 2018 at 11:57 AM Eric Sunshine  
>> wrote:
>> > @@ -0,0 +1,17 @@
>> > +void show_interdiff(struct rev_info *rev)
>> > +{
>> > +   struct diff_options opts;
>> > +
>> > +   memcpy(&opts, &rev->diffopt, sizeof(opts));
>> > +   opts.output_format = DIFF_FORMAT_PATCH;
>> > +   diff_setup_done(&opts);
>> > +
>> > +   diff_tree_oid(rev->idiff_oid1, rev->idiff_oid2, "", &opts);
>> > +   diffcore_std(&opts);
>> > +   diff_flush(&opts);
>> > +}
>>
>> Is it worth adding a new file just for a single function? I haven't
>> read the rest of the series, but the cover letter's diffstat suggests
>> this is it. Is interdiff intended to become a lot more complicated in
>> the future? If not maybe just add this function in diff-lib.c
>
> Good question. The functionality originally lived in builtin/log.c but
> moved to log-tree.c when I added the ability to embed an interdiff in
> a single patch. However, it didn't "feel" right in log-tree.c, so I
> moved it to its own file to mirror how the range-diff engine resides
> in its own file.

> And, the function actually did several more things as originally
> implemented. For instance, it took care of not clobbering global
> diff-queue state, and consulting 'reroll_count' and printing the
> "Interdiff:" header, but those bits eventually moved to live at more
> "correct" locations, leaving this relatively minimal function behind.
> It does get a bit more complex in a later patch, but not significantly
> so.
>
> I wasn't aware of diff-lib.c, but it does seem like show_interdiff()
> could be at home there.

Yeah, diff-lib.c is meant to be a home for the implementation of low
level "diff-{files,tree,index}" plumbing that can be called from
other routines; if you are adding "take two things, compare them"
that can be reused from places, then it is a good match.


Re: [PATCH 1/1] t7406: avoid failures solely due to timing issues

2018-07-23 Thread Junio C Hamano
Stefan Beller  writes:

> This test was last touched in c66410ed32a (submodule init:
> redirect stdout to stderr, 2016-05-02), merged as f2c96ceb57a
> (Merge branch 'sb/submodule-init', 2016-05-17)
>
> The parallelizing effort was made before that
> (2335b870fa7 (submodule update: expose parallelism to the
> user, 2016-02-29) via bdebbeb3346 (Merge branch
> 'sb/submodule-parallel-update', 2016-04-06)), but did not
> review existing tests for the newly introduced raciness.

Ah, that explains what I was scratching my head about.  I somehow
thought that the parallel update broke an otherwise working test
written back in the serial days, but it seems it was the other way
around.  The test was simply careless.

Nice to have the two-year old issue fixed.  Thanks, all.


Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Jeff King
On Mon, Jul 23, 2018 at 08:49:59PM +0200, Duy Nguyen wrote:

> On Mon, Jul 23, 2018 at 8:38 PM Duy Nguyen  wrote:
> > I will have to study the thread dispatch code more to have a better
> > answer, unfortunately.
> 
> Well.. I thought I would need this weekend for this, but a quick look
> and ll_find_deltas() suggests that what we're doing is safe. At least
> you and Jeff are way to familiar with the delta window concept in
> pack-objects. So in multithread mode, we have a big list of all
> objects, then the list is cut in N sublists and each is owned entirely
> by one thread. Each thread then can slide a window over this sublist
> to search for the best delta.
> 
> Because of this partitioning, if trg_entry is being processed now, it
> will not be accessed by any other thread. It's owned by this thread
> and will be accessed again as src_entry when the next entry becomes
> the delta target in the same thread. As long as we don't access
> objects of another thread (and my v1 violates this) we should be ok.

Yes, that matches my knowledge of how this all works. And if it didn't,
I think the code would have been racy even _before_ your patches.

The only thing that this pack->delta_size approach is changing is that
managing that array needs to happen under lock, because it touches the
whole list.

And as long as we back-fill it from any arbitrary e->delta_size_, that
means that touching e->delta_size_ needs to be done under lock. That's
why I think keeping the individual valid flag in each entry makes the
most sense. Then whenever we overflow a particular e->delta_size_, we
don't have to care about anybody else's size.

Which I think is what your v2 patch is doing.

-Peff


Re: [PATCH] fetch-pack: mark die strings for translation

2018-07-23 Thread Jonathan Nieder
Brandon Williams wrote:

> Signed-off-by: Brandon Williams 
> ---
>  fetch-pack.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v6 8/8] fetch-pack: implement ref-in-want

2018-07-23 Thread Jonathan Nieder
Hi,

Duy Nguyen wrote:
> On Mon, Jul 23, 2018 at 7:53 PM Brandon Williams  wrote:

>> What criteria is used to determine if something should be translated?
[...]
> Besides drawing the line "benefit from (not) being translated" varies
> from one developer to another. I think it's just easier and more
> consistent to stick to "if it's not machine-readable (or really meant
> for devs, like BUG()), translate it" and leave it to translators to
> decide.

Yup, this is my understanding too: everything user-facing (as opposed
to machine-facing) that isn't a "this can't happen" local assertion
error (BUG) should be translated.

Protocol and format errors aren't BUG (assertion error) because they
reflect something that shouldn't be possible to happen due to someone
*else* behaving badly.  BUG is for errors that shouldn't be possible
due to the version of Git that produces the message behaving badly.

po/README says

 - Don't mark everything for translation, only strings which will be
   read by humans (the porcelain interface) should be translated.

   The output from Git's plumbing utilities will primarily be read by
   programs and would break scripts under non-C locales if it was
   translated. Plumbing strings should not be translated, since
   they're part of Git's API.

I kind of wish we had a rule that was less fuzzy, since then we could
enforce it.  For example, JGit has a rule that *all* strings should be
translatable unless they have a comment explaining why they shouldn't
be (which goes a little too far --- there are those comments all over
the place).

Thanks,
Jonathan


Re: [PATCH v4 11/21] range-diff: add tests

2018-07-23 Thread Stefan Beller
> +test_expect_success 'simple B...C (unmodified)' '
> +   git range-diff --no-color

I wonder if we want to have tests for colors, too, eventually.
(Feel free to push back on it or put it into the left over hashtag.
But given how much time we (or I) spent discussing colors,
this would be a welcome extension for the tests)

Stefan


Re: [PATCH v4 00/21] Add `range-diff`, a `tbdiff` lookalike

2018-07-23 Thread Stefan Beller
On Sat, Jul 21, 2018 at 3:04 PM Johannes Schindelin via GitGitGadget
 wrote:

> Side note: I work on implementing range-diff not only to make life easier for 
> reviewers who have to suffer through v2, v3, ... of my patch series, but also 
> to verify my changes before submitting a new iteration. And also, maybe even 
> more importantly, I plan to use it to verify my merging-rebases of Git for
> Windows (for which I previously used to redirect the pre-rebase/post-rebase 
> diffs vs upstream and then compare them using `git diff --no-index`). And of 
> course any interested person can see what changes were necessary e.g. in the 
> merging-rebase of Git for Windows onto v2.17.0 by running a command like:

Thanks for making tools that makes the life of a Git developer easier!
(Just filed https://github.com/gitgitgadget/gitgitgadget/issues/26
which asks to break lines for this cover letter)

> base-commit: b7bd9486b055c3f967a870311e704e3bb0654e4f
> Published-As: 
> https://github.com/gitgitgadget/git/releases/tags/pr-1%2Fdscho%2Fbranch-diff-v4
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> pr-1/dscho/branch-diff-v4
> Pull-Request: https://github.com/gitgitgadget/git/pull/1

I just realized that if I had ideal memory of the previous review,
I could contain my whole review answer to this email only.

>
> Range-diff vs v3:
>
>   1:  39272eefc !  1:  f7e70689e linear-assignment: a function to solve 
> least-cost assignment problems
>  @@ -223,9 +223,7 @@
>   + BUG("negative j: %d", j);
>   + i = pred[j];
>   + column2row[j] = i;
>  -+ k = j;
>  -+ j = row2column[i];
>  -+ row2column[i] = k;
>  ++ SWAP(j, row2column[i]);

The dual color option (as a default) really helps here. Thanks for that!
Does it have to be renamed though? (It's more than two colors; originally
it was inverting the beginning signs)

Maybe --color=emphasize-later
assuming there will be other modes for coloring, such as "diff-only",
which would correspond with --no-dual-color, or "none" that will correspond
would be --no-color. I imagine there could be more fancy things, hence I would
propose a mode rather than a switch.
(Feel free to push back if discussing naming here feels like bike shedding)


2:  7f15b26d4ea !  82:  88134121d2a Introduce `range-diff` to compare
iterations of a topic branch
[...]
>   diff --git a/Makefile b/Makefile
>   --- a/Makefile
>   +++ b/Makefile

The line starting with --- is red (default removed color) and the line
with +++ is green (default add color).

Ideally these two lines and the third line above starting with "diff --git"
would render in GIT_COLOR_BOLD ("METAINFO").

>   3:  076e1192d !  3:  4e3fb47a1 range-diff: first rudimentary implementation
>  @@ -4,7 +4,7 @@
>
>   At this stage, `git range-diff` can determine corresponding commits
>   of two related commit ranges. This makes use of the recently 
> introduced
>  -implementation of the Hungarian algorithm.
>  +implementation of the linear assignment algorithm.
>
>   The core of this patch is a straight port of the ideas of tbdiff, 
> the
>   apparently dormant project at https://github.com/trast/tbdiff.
>  @@ -51,19 +51,17 @@
>   + int res = 0;
>   + struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT;
>
>  -- argc = parse_options(argc, argv, NULL, options,
>  --  builtin_range_diff_usage, 0);
>  -+ argc = parse_options(argc, argv, NULL, options, 
> builtin_range_diff_usage,
>  -+  0);
>  +  argc = parse_options(argc, argv, NULL, options,
>  +   builtin_range_diff_usage, 0);

This is really nice in colors when viewed locally.

>  16:  dfa7b1e71 <  -:  - range-diff --dual-color: work around bogus 
> white-space warning
>   -:  - > 16:  f4252f2b2 range-diff --dual-color: fix bogus 
> white-space warning

Ah; here my initial assumption of only reviewing the range-diff breaks down now.
I'll dig into patch 16 separately.

Maybe it is worth having an option to expand all "new" patches.
(Given that the range-diff pr-1/dscho/branch-diff-v3...pr-1/dscho/branch-diff-v4
told me you used a different base, this is a hard problem, as I certainly
would want to skip over all new base commits, but this one is interesting
to look at. An easy way out: Maybe an option to expand any new
commits/patches after the first expansion? Asking for opinions rather
than implementing it)

>  19:  144363006 <  -:  - range-diff: left-pad patch numbers
>   -:  - > 19:  07ec215e8 range-diff: left-pad patch numbers

>   -:  - > 21:  d8498fb32 range-diff: use dim/bold cues to improve 
> dual color mode

Those are interesting, I'll look at them separately, too.

Thanks,
Stefan


Re: [PATCH 1/1] t7406: avoid failures solely due to timing issues

2018-07-23 Thread Stefan Beller
On Mon, Jul 23, 2018 at 12:14 PM Junio C Hamano  wrote:
>
> "Johannes Schindelin via GitGitGadget" 
> writes:
>
> > To prevent such false positives from unnecessarily costing time when
> > investigating test failures, let's take the exact order of the lines out
> > of the equation by sorting them before comparing them.
> > ...
> >  cat  > +Cloning into '$pwd/recursivesuper/super/merging'...
> > +Cloning into '$pwd/recursivesuper/super/none'...
> > +Cloning into '$pwd/recursivesuper/super/rebasing'...
> > +Cloning into '$pwd/recursivesuper/super/submodule'...
> >  Submodule 'merging' ($pwd/merging) registered for path '../super/merging'
> >  Submodule 'none' ($pwd/none) registered for path '../super/none'
> >  Submodule 'rebasing' ($pwd/rebasing) registered for path 
> > '../super/rebasing'
> >  Submodule 'submodule' ($pwd/submodule) registered for path 
> > '../super/submodule'
>
> Thanks.  It obviously goes in the right direction to loosen the
> ordering requirement that fundamentally cannot be met, as the log
> message cleanly analizes.
>
> Do we know that the write(2)s that show these lines are atomic, or
> are we merely lucky that we don't see them intermixed in the middle
> of lines (sbeller cc'ed).  I _think_ they are but just wanted to
> double check.

Not just the lines, but the whole message per submodule (i.e.
the "Cloning... " lione and its accompanying "done") is one
atomic unit. These messages are from processes invoked via
run_processes_parallel (in run-command.h), which buffers
all output per process.

This test was last touched in c66410ed32a (submodule init:
redirect stdout to stderr, 2016-05-02), merged as f2c96ceb57a
(Merge branch 'sb/submodule-init', 2016-05-17)

The parallelizing effort was made before that
(2335b870fa7 (submodule update: expose parallelism to the
user, 2016-02-29) via bdebbeb3346 (Merge branch
'sb/submodule-parallel-update', 2016-04-06)), but did not
review existing tests for the newly introduced raciness.

Thanks for the fix!
Stefan


Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

2018-07-23 Thread Ben Peart




On 7/23/2018 1:03 PM, Duy Nguyen wrote:

On Mon, Jul 23, 2018 at 5:50 PM Ben Peart  wrote:

Anyway, on to the actual discussion:


Here is a checkout command with tracing turned on to demonstrate where the
time is spent.  Note, this is somewhat of a �best case� as I�m simply
checking out the current commit:

benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
$ /usr/src/git/git.exe checkout
12:31:50.419016 read-cache.c:2006   performance: 1.180966800 s: read cache 
.git/index
12:31:51.184636 name-hash.c:605 performance: 0.664575200 s: initialize 
name hash
12:31:51.200280 preload-index.c:111 performance: 0.019811600 s: preload 
index
12:31:51.294012 read-cache.c:1543   performance: 0.094515600 s: refresh 
index
12:32:29.731344 unpack-trees.c:1358 performance: 33.889840200 s: 
traverse_trees
12:32:37.512555 read-cache.c:2541   performance: 1.564438300 s: write 
index, changed mask = 28
12:32:44.918730 unpack-trees.c:1358 performance: 7.243155600 s: 
traverse_trees
12:32:44.965611 diff-lib.c:527  performance: 7.374729200 s: diff-index
Waiting for GVFS to parse index and update placeholder files...Succeeded
12:32:46.824986 trace.c:420 performance: 57.715656000 s: git 
command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout


What's the current state of the index before this checkout?


This was after running "git checkout" multiple times so there was really
nothing for git to do.


Hmm.. this means cache-tree is fully valid, unless you have changes in
index. We're quite aggressive in repairing cache-tree since aecf567cbf
(cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
have very good cache-tree records and still spend 33s on
traverse_trees, maybe there's something else.



I'm not at all familiar with the cache-tree and couldn't find any 
documentation on it other than index-format.txt which says "it helps 
speed up tree object generation for a new commit."  In this particular 
case, no new commit is being created so I don't know that the cache-tree 
would help.


After a quick look at the code, the only place I can find that tries to 
use cache_tree_matches_traversal() is in unpack_callback() and that only 
happens if n == 1 and in the "git checkout" case, n == 2. Am I missing 
something?


Re: [PATCH v2 17/18] commit-reach: make can_all_from_reach... linear

2018-07-23 Thread Jonathan Tan
> + if (parse_commit(list[i]) ||
> + list[i]->generation < min_generation)

Here...

> + if (parse_commit(parent->item) ||
> + parent->item->date < 
> min_commit_date ||
> + parent->item->generation < 
> min_generation)

...and here, would parse_commit_or_die() be better? I think that a
function that returns a definitive answer (either the commits are
reachable or not) should die when the commits cannot be parsed.

Other than that, I've compared the commits in this version to v1, and
all my review comments have been addressed, thanks. (With the exception
of the skip_prefix() one, but that is a minor matter - I suggested that
to make it easier to implement my "Ancestor:" and "Descendant:"
suggestion which Stolee disagreed on with reason.)

[1] 
https://public-inbox.org/git/20180716230019.257742-1-jonathanta...@google.com/


Re: [PATCH v2 15/18] test-reach: test commit_contains

2018-07-23 Thread Jonathan Tan
> + } else if (!strcmp(av[1], "commit_contains")) {
> + struct ref_filter filter;
> + struct contains_cache cache;
> + init_contains_cache(&cache);
> +
> + if (ac > 2 && !strcmp(av[2], "--tag"))
> + filter.with_commit_tag_algo = 1;
> + else
> + filter.with_commit_tag_algo = 0;
> +
> + printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, 
> X, &cache));

Should we initialize filter (with {NULL} or some equivalent)?


Re: [PATCH 00/14] format-patch: add --interdiff and --range-diff options

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 12:32 PM Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 11:57 AM Eric Sunshine  
> wrote:
> > When re-submitting a patch series, it is often helpful (for reviewers)
> > to include an interdiff or range-diff against the previous version.
> > Doing so requires manually running git-diff or git-range-diff and
> > copy/pasting the result into the cover letter of the new version.
> >
> > This series automates the process by introducing git-format-patch
> > options --interdiff and --range-diff which insert such a diff into the
> > cover-letter or into the commentary section of the lone patch of a
> > 1-patch series. In the latter case, the interdiff or range-diff is
> > indented to avoid confusing git-am and human readers.
>
> I gave up after 10/14. But what I've seen is nice (yes I have a couple
> comments here and there but you probably won't need to update
> anything).

Thanks for the review comments.


Re: [PATCH 10/14] format-patch: add --range-diff option to embed diff in cover letter

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 12:28 PM Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 11:58 AM Eric Sunshine  
> wrote:
> > diff --git a/Documentation/git-format-patch.txt 
> > b/Documentation/git-format-patch.txt
> > index f8a061794d..e7f404be3d 100644
> > --- a/Documentation/git-format-patch.txt
> > +++ b/Documentation/git-format-patch.txt
> > @@ -24,6 +24,7 @@ SYNOPSIS
> >[--to=] [--cc=]
> >[--[no-]cover-letter] [--quiet] [--notes[=]]
> >[--interdiff=]
> > +  [--range-diff=]
>
> I wonder if people will use both --interdiff= and
> --range-diff= often enough to justify a shortcut
> "--all-kinds-of-diff=" so that we don't have to type 
> twice. But I guess we don't have to worry about this right now.

My original thought was that --interdiff and --range-diff would be
mutually exclusive, however, I quickly realized that some people might
like to use both options together since each format has its strengths
and weaknesses. (I've used both types of diffs together when preparing
rerolls of my own series and found that, together, they provided a
better picture of the reroll than either would have alone.)

Based upon experience on this mailing list, I'd guess that most people
would use only one or the other, though that doesn't speak for other
projects. And, as you note, it's something that can be added later if
warranted (plus, this series is already long and I'd like to avoid
making it longer for something like this whose value is only
speculative).


Re: [PATCH 1/9] contrib: add a script to initialize VS Code configuration

2018-07-23 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
> new file mode 100755
> index 0..3cc93243f
> --- /dev/null
> +++ b/contrib/vscode/init.sh
> @@ -0,0 +1,165 @@
> +#!/bin/sh

This is caued by our usual "git apply --whitespace=warn" as it
contains lines indented by spaces not tabs (perhaps the json
literals?)  Can we have contrib/vscode/.gitattributes to tweak the
whitespace breakage checking rules so that this file is excempt?

I'll just shut my eyes while applying this series for today, though
;-)


Re: [PATCH 06/14] format-patch: allow --interdiff to apply to a lone-patch

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 12:22 PM Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 11:58 AM Eric Sunshine  
> wrote:
> > +   if (cmit_fmt_is_mail(ctx.fmt) && opt->idiff_oid1) {
>
> OK putting idiff stuff in rev_info is probably the right choice. But
> we all three fields prefixed with idiff_, maybe you could just add a
> new struct "idiff_options" to contain them and add a pointer to that
> struct in rev_info. Then "opt->idiff" is enough to know if idiff is
> requested instead of relying on idiff_oid1 (seems too random).

I have mixed feelings about this suggestion for the following reasons:

* 'struct rev_info' already contains a number of specialized fields
which apply in only certain use cases but not others, and those fields
often are grouped textually to show relationship rather than being
bundled in a struct.

* These new fields are very specific to this particular operation and
are unlikely to ever have wider use, so adding the extra level of
indirection/abstraction (whatever you'd call it) feels overkill and,
while nice theoretically, adds complexity without much practical
value.

* Bundling these fields in a struct might incorrectly imply to readers
that these items, collectively, can be used in some general-purpose
fashion, which isn't at all the case; they are very specific to this
operation and that struct would never be used elsewhere or for any
other purpose.

The upside of bundling them in a struct, as you mention, is that
"opt->idiff" would be slightly more obvious than "opt->idiff_oid1" as
a "should we print an interdiff?" conditional. On the other hand, this
case is so specific and narrow that I'm not sure it warrants such
generality.


Re: [PATCH 0/5] Misc Coccinelle-related improvements

2018-07-23 Thread Junio C Hamano
SZEDER Gábor  writes:

> Just a couple of minor Coccinelle-related improvements:
>
>   - The first two patches are small cleanups.
>
>   - The last three could make life perhaps just a tad bit easier for
> devs running 'make coccicheck'.

Thanks.  All 5 patches make sense.

Queued.


Re: [PATCH 2/5] coccinelle: use $(addsuffix) in 'coccicheck' make target

2018-07-23 Thread Junio C Hamano
SZEDER Gábor  writes:

> The dependencies of the 'coccicheck' make target are listed with the
> help of the $(patsubst) make function, which in this case doesn't do
> any pattern substitution, but only adds the '.patch' suffix.
>
> Use the shorter and more idiomatic $(addsuffix) make function instead.

Makes sense.

>
> Signed-off-by: SZEDER Gábor 
> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Makefile b/Makefile
> index 27bfc196dd..8f509576e9 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2683,7 +2683,7 @@ C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
>   then \
>   echo '' SPATCH result: $@; \
>   fi
> -coccicheck: $(patsubst %.cocci,%.cocci.patch,$(wildcard 
> contrib/coccinelle/*.cocci))
> +coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci))
>  
>  .PHONY: coccicheck


Re: [PATCH 1/5] coccinelle: mark the 'coccicheck' make target as .PHONY

2018-07-23 Thread Junio C Hamano
SZEDER Gábor  writes:

> The 'coccicheck' target doesn't create a file with the same name, so
> mark it as .PHONY.
>
> Signed-off-by: SZEDER Gábor 
> ---

Good.  It is customary to do so immediately before the target, not
after a blank line, though.

>  Makefile | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index e4b503d259..27bfc196dd 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2685,6 +2685,8 @@ C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
>   fi
>  coccicheck: $(patsubst %.cocci,%.cocci.patch,$(wildcard 
> contrib/coccinelle/*.cocci))
>  
> +.PHONY: coccicheck
> +
>  ### Installation rules
>  
>  ifneq ($(filter /%,$(firstword $(template_dir))),)


Re: [PATCH 03/14] format-patch: teach --interdiff to respect -v/--reroll-count

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 12:12 PM Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 11:57 AM Eric Sunshine  
> wrote:
> > @@ -215,6 +215,7 @@ struct rev_info {
> > /* interdiff */
> > const struct object_id *idiff_oid1;
> > const struct object_id *idiff_oid2;
> > +   const char *idiff_title;
>
> I feel we're abusing struct rev_info a bit for this since this
> interdiff thing is very builtin/log.c's business and not at all
> related to rev walk. Is it possible (and easy) to just pass
> idfff_title from cmd_format_patch to make_cover_letter()? If it's a
> lot of code, then I guess we can just leave it here.

As originally implemented, this information was passed directly to
make_cover_letter(), however, as you discovered in your review of
patch 6/14[1], which makes it possible to embed an interdiff in the
commentary section of a lone patch, 'struct rev_info' is the only way
to pass this information down that deeply in the patch-generation
process. So, yes, this pretty much needs to use 'struct rev_info',
even if that need doesn't exist at this early step in the series.

[1]: 
https://public-inbox.org/git/CACsJy8Aw6R8-3kDfhCqunXziajCg9O_1WrEYc4rfKa+-=m1...@mail.gmail.com/


Re: [PATCH 02/14] format-patch: add --interdiff option to embed diff in cover letter

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 12:03 PM Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 11:57 AM Eric Sunshine  
> wrote:
> > @@ -0,0 +1,17 @@
> > +void show_interdiff(struct rev_info *rev)
> > +{
> > +   struct diff_options opts;
> > +
> > +   memcpy(&opts, &rev->diffopt, sizeof(opts));
> > +   opts.output_format = DIFF_FORMAT_PATCH;
> > +   diff_setup_done(&opts);
> > +
> > +   diff_tree_oid(rev->idiff_oid1, rev->idiff_oid2, "", &opts);
> > +   diffcore_std(&opts);
> > +   diff_flush(&opts);
> > +}
>
> Is it worth adding a new file just for a single function? I haven't
> read the rest of the series, but the cover letter's diffstat suggests
> this is it. Is interdiff intended to become a lot more complicated in
> the future? If not maybe just add this function in diff-lib.c

Good question. The functionality originally lived in builtin/log.c but
moved to log-tree.c when I added the ability to embed an interdiff in
a single patch. However, it didn't "feel" right in log-tree.c, so I
moved it to its own file to mirror how the range-diff engine resides
in its own file.

And, the function actually did several more things as originally
implemented. For instance, it took care of not clobbering global
diff-queue state, and consulting 'reroll_count' and printing the
"Interdiff:" header, but those bits eventually moved to live at more
"correct" locations, leaving this relatively minimal function behind.
It does get a bit more complex in a later patch, but not significantly
so.

I wasn't aware of diff-lib.c, but it does seem like show_interdiff()
could be at home there.


Re: [PATCH 1/1] t7406: avoid failures solely due to timing issues

2018-07-23 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> To prevent such false positives from unnecessarily costing time when
> investigating test failures, let's take the exact order of the lines out
> of the equation by sorting them before comparing them.
> ...
>  cat  +Cloning into '$pwd/recursivesuper/super/merging'...
> +Cloning into '$pwd/recursivesuper/super/none'...
> +Cloning into '$pwd/recursivesuper/super/rebasing'...
> +Cloning into '$pwd/recursivesuper/super/submodule'...
>  Submodule 'merging' ($pwd/merging) registered for path '../super/merging'
>  Submodule 'none' ($pwd/none) registered for path '../super/none'
>  Submodule 'rebasing' ($pwd/rebasing) registered for path '../super/rebasing'
>  Submodule 'submodule' ($pwd/submodule) registered for path 
> '../super/submodule'

Thanks.  It obviously goes in the right direction to loosen the
ordering requirement that fundamentally cannot be met, as the log
message cleanly analizes.

Do we know that the write(2)s that show these lines are atomic, or
are we merely lucky that we don't see them intermixed in the middle
of lines (sbeller cc'ed).  I _think_ they are but just wanted to
double check.




Re: [PATCH] l10n: de.po: translate 108 new messages

2018-07-23 Thread Matthias Rüster

Acked-by: Matthias Rüster 

Thanks!



Am 19.07.2018 um 19:15 schrieb Ralf Thielow:

Translate 108 new messages came from git.pot update in 9b7388a85 (l10n:
git.pot: v2.18.0 round 1 (108 new, 14 removed)).

Signed-off-by: Ralf Thielow 
---
  po/de.po | 373 +--
  1 file changed, 194 insertions(+), 179 deletions(-)

diff --git a/po/de.po b/po/de.po
index bdc5830e1..47986814c 100644
--- a/po/de.po
+++ b/po/de.po
@@ -10,25 +10,25 @@ msgstr ""
  "POT-Creation-Date: 2018-05-31 23:32+0800\n"
  "PO-Revision-Date: 2016-11-28 18:10+0100\n"
  "Last-Translator: Ralf Thielow \n"
  "Language-Team: German <>\n"
  "Language: de\n"
  "MIME-Version: 1.0\n"
  "Content-Type: text/plain; charset=UTF-8\n"
  "Content-Transfer-Encoding: 8bit\n"
  "Plural-Forms: nplurals=2; plural=(n!=1);\n"
  
  #: advice.c:92

-#, fuzzy, c-format
+#, c-format
  msgid "%shint: %.*s%s\n"
-msgstr "Hinweis: %.*s\n"
+msgstr "%sHinweis: %.*s%s\n"
  
  #: advice.c:137

  msgid "Cherry-picking is not possible because you have unmerged files."
  msgstr ""
  "Cherry-Picken ist nicht möglich, weil Sie nicht zusammengeführte Dateien "
  "haben."
  
  #: advice.c:139

  msgid "Committing is not possible because you have unmerged files."
  msgstr ""
  "Committen ist nicht möglich, weil Sie nicht zusammengeführte Dateien haben."
@@ -1284,44 +1284,52 @@ msgstr "%s %s ist kein Commit!"
  #: commit.c:182
  msgid ""
  "Support for /info/grafts is deprecated\n"
  "and will be removed in a future Git version.\n"
  "\n"
  "Please use \"git replace --convert-graft-file\"\n"
  "to convert the grafts into replace refs.\n"
  "\n"
  "Turn this message off by running\n"
  "\"git config advice.graftFileDeprecated false\""
  msgstr ""
+"Die Unterstützung für /info/grafts ist veraltet\n"
+"und wird in zukünftigen Git Versionen entfernt.\n"
+"\n"
+"Bitte benutzen Sie \"git replace --convert-graft-file\"\n"
+"zum Konvertieren der künstlichen Vorgänger (\"grafts\")\n"
+"in ersetzende Referenzen.<\n"
+"\n"
+"Sie können diese Meldung unterdrücken, indem Sie\n"
+"\"git config advice.graftFileDeprecated false\" ausführen."
  
  #: commit.c:1537

  msgid ""
  "Warning: commit message did not conform to UTF-8.\n"
  "You may want to amend it after fixing the message, or set the config\n"
  "variable i18n.commitencoding to the encoding your project uses.\n"
  msgstr ""
  "Warnung: Die Commit-Beschreibung ist nicht UTF-8 konform.\n"
  "Sie können das nachbessern, nachdem Sie die Beschreibung korrigiert haben,\n"
  "oder Sie setzen die Konfigurationsvariable i18n.commitencoding auf das "
  "Encoding,\n"
  "welches von ihrem Projekt verwendet wird.\n"
  
  #: commit-graph.c:669

  #, c-format
  msgid "the commit graph format cannot write %d commits"
-msgstr ""
+msgstr "Das Commit-Graph Format kann nicht %d Commits schreiben."
  
  #: commit-graph.c:696

-#, fuzzy
  msgid "too many commits to write graph"
-msgstr "nur Commits anzeigen, die sich nicht im ersten Branch befinden"
+msgstr "Zu viele Commits zum Schreiben des Graphen."
  
  #: commit-graph.c:707 builtin/init-db.c:516 builtin/init-db.c:521

  #, c-format
  msgid "cannot mkdir %s"
  msgstr "kann Verzeichnis %s nicht erstellen"
  
  #: compat/obstack.c:405 compat/obstack.c:407

  msgid "memory exhausted"
  msgstr "Speicher verbraucht"
  
  #: config.c:187

@@ -1554,56 +1562,61 @@ msgstr "LF würde in %s durch CRLF ersetzt werden."
  msgid ""
  "LF will be replaced by CRLF in %s.\n"
  "The file will have its original line endings in your working directory."
  msgstr ""
  "LF wird in %s durch CRLF ersetzt.\n"
  "Die Datei wird ihre ursprünglichen Zeilenenden im Arbeitsverzeichnis "
  "behalten."
  
  #: convert.c:279

  #, c-format
  msgid "BOM is prohibited in '%s' if encoded as %s"
-msgstr ""
+msgstr "BOM ist in '%s' unzulässig, wenn als %s codiert."
  
  #: convert.c:286

  #, c-format
  msgid ""
  "The file '%s' contains a byte order mark (BOM). Please use UTF-%s as 
working-"
  "tree-encoding."
  msgstr ""
+"Die Datei '%s' enthält ein Byte-Order-Mark (BOM). Bitte benutzen Sie UTF-%s\n"
+"als Codierung im Arbeitsverzeichnis."
  
  #: convert.c:304

  #, c-format
  msgid "BOM is required in '%s' if encoded as %s"
-msgstr ""
+msgstr "BOM ist erforderlich in '%s', wenn als %s codiert."
  
  #: convert.c:306

  #, c-format
  msgid ""
  "The file '%s' is missing a byte order mark (BOM). Please use UTF-%sBE or 
UTF-"
  "%sLE (depending on the byte order) as working-tree-encoding."
  msgstr ""
+"Der Datei '%s' fehlt ein Byte-Order-Mark (BOM). Bitte benutzen Sie UTF-%sBE\n"
+"oder UTF-%sLE (abhängig von der Byte-Reihenfolge) als Codierung im\n"
+"Arbeitsverzeichnis."
  
  #: convert.c:424

-#, fuzzy, c-format
+#, c-format
  msgid "failed to encode '%s' from %s to %s"
-msgstr "Fehler beim Kopieren der Notizen von '%s' nach '%s'"
+msgstr "Fehler beim Codieren von '%s' von %s nach %s."
  
  #: convert.c:467

  #, c-format
  msgid "encoding '%s' from %s to %s and back is not the same"
-msgst

Re: [PATCH 0/1] t7406: avoid failures solely due to timing issues

2018-07-23 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> This fixes a regression test that produces false positives occasionally: 
> https://git-for-windows.visualstudio.com/git/_build/results?buildId=14035&view=logs
>

[jc: forwrding to Torsten for <208b2ede-4833-f062-16f2-f35b8a8ce...@web.de>]

> Johannes Schindelin (1):
>   t7406: avoid failures solely due to timing issues
>
>  t/t7406-submodule-update.sh | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
>
> base-commit: b7bd9486b055c3f967a870311e704e3bb0654e4f
> Published-As: 
> https://github.com/gitgitgadget/git/releases/tags/pr-12%2Fdscho%2Fmore-robust-t7406-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> pr-12/dscho/more-robust-t7406-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/12


Re: [PATCH 3/5] coccinelle: exclude sha1dc source files from static analysis

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 2:44 PM SZEDER Gábor  wrote:
> On Mon, Jul 23, 2018 at 8:28 PM Eric Sunshine  wrote:
> > On Mon, Jul 23, 2018 at 9:51 AM SZEDER Gábor  wrote:
> > > +ifdef DC_SHA1_SUBMODULE
> > > +COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(C_SOURCES))
> > > +else
> > > +COCCI_SOURCES = $(filter-out sha1dc/%,$(C_SOURCES))
> > > +endif
> >
> > Can't you just filter out both of these unconditionally without
> > worrying about DC_SHA1_SUBMODULE?
>
> I'm not sure what you mean by that.  Like this perhaps?
>
>   COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(filter-out
> sha1dc/%,$(C_SOURCES)))
>
> While it's only a single line, I don't think it's any easier on the
> eyes.

I wasn't worried about readability or one or two lines (indeed, you
could still do the filtering over two statements).

What I meant was that sha1dc/ contains files whether DC_SHA1_SUBMODULE
is defined or not. If the idea of this change is that there's no point
in having Coccinelle check those foreign, imported files (and waste
time in the process), then I was thinking that you'd want to omit
sha1dc/* regardless of whether DC_SHA1_SUBMODULE is defined.

Looking more closely at the Makefile, however, I see that C_SOURCES
holds only one or the other of sha1dc/* or
sha1collisiondetection/lib/*, so my concern is unfounded, which
explains why my question confused you.


Re: [PATCH v2] Makefile: add a DEVOPTS flag to get pedantic compilation

2018-07-23 Thread Junio C Hamano
Beat Bolli  writes:

> In the interest of code hygiene, make it easier to compile Git with the
> flag -pedantic.
>
> Pure pedantic compilation with GCC 7.3 results in one warning per use of
> the translation macro `N_`:
>
> warning: array initialized from parenthesized string constant [-Wpedantic]
>
> Therefore also disable the parenthesising of i18n strings with
> -DUSE_PARENS_AROUND_GETTEXT_N=no.
>
> Signed-off-by: Beat Bolli 
> ---
>
> This is the convenience knob for all developers that led to the series
> bb/pedantic[1]. It does not depend on this series, though.

Yup, but "make DEVELOPER=Yes" build won't pass unless this patch is
queued after those clean-up ;-)

Remind me if I forget to tweak =no back to =0 before pushing the
result out.

Thanks.

> [1] https://public-inbox.org/git/20180708144342.11922-1-dev+...@drbeat.li/T/#u
>
>  Makefile   | 6 ++
>  config.mak.dev | 5 +
>  2 files changed, 11 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 0cb6590f24..2bfc051652 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -484,6 +484,12 @@ all::
>  #The DEVELOPER mode enables -Wextra with a few exceptions. By
>  #setting this flag the exceptions are removed, and all of
>  #-Wextra is used.
> +#
> +#pedantic:
> +#
> +#Enable -pedantic compilation. This also disables
> +#USE_PARENS_AROUND_GETTEXT_N to produce only relevant warnings.
>  
>  GIT-VERSION-FILE: FORCE
>   @$(SHELL_PATH) ./GIT-VERSION-GEN
> diff --git a/config.mak.dev b/config.mak.dev
> index 2d244ca470..e11dd94741 100644
> --- a/config.mak.dev
> +++ b/config.mak.dev
> @@ -1,6 +1,11 @@
>  ifeq ($(filter no-error,$(DEVOPTS)),)
>  CFLAGS += -Werror
>  endif
> +ifneq ($(filter pedantic,$(DEVOPTS)),)
> +CFLAGS += -pedantic
> +# don't warn for each N_ use
> +CFLAGS += -DUSE_PARENS_AROUND_GETTEXT_N=no
> +endif
>  CFLAGS += -Wdeclaration-after-statement
>  CFLAGS += -Wno-format-zero-length
>  CFLAGS += -Wold-style-definition


Re: [RFC PATCH 3/5] pack-objects: add delta-islands support

2018-07-23 Thread Stefan Beller
On Sat, Jul 21, 2018 at 10:49 PM Christian Couder
 wrote:
>
> From: Jeff King 
>
> Implement support for delta islands in git pack-objects
> and document how delta islands work in
> "Documentation/git-pack-objects.txt".
>
> Signed-off-by: Jeff King 
> Signed-off-by: Christian Couder 
> ---
>  Documentation/git-pack-objects.txt |  88 +++
>  builtin/pack-objects.c | 130 -
>  2 files changed, 177 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/git-pack-objects.txt 
> b/Documentation/git-pack-objects.txt
> index d95b472d16..7b7a36056f 100644
> --- a/Documentation/git-pack-objects.txt
> +++ b/Documentation/git-pack-objects.txt
> @@ -289,6 +289,94 @@ Unexpected missing object will raise an error.
>  --unpack-unreachable::
> Keep unreachable objects in loose form. This implies `--revs`.
>
> +--delta-islands::
> +   Restrict delta matches based on "islands". See DELTA ISLANDS
> +   below.
> +
> +
> +DELTA ISLANDS
> +-
> +
> +When possible, `pack-objects` tries to reuse existing on-disk deltas to
> +avoid having to search for new ones on the fly. This is an important
> +optimization for serving fetches, because it means the server can avoid
> +inflating most objects at all and just send the bytes directly from
> +disk. This optimization can't work when an object is stored as a delta
> +against a base which the receiver does not have (and which we are not
> +already sending). In that case the server "breaks" the delta and has to
> +find a new one, which has a high CPU cost. Therefore it's important for
> +performance that the set of objects in on-disk delta relationships match
> +what a client would fetch.
> +
> +In a normal repository, this tends to work automatically. The objects
> +are mostly reachable from the branches and tags, and that's what clients
> +fetch. Any deltas we find on the server are likely to be between objects
> +the client has or will have.
> +
> +But in some repository setups, you may have several related but separate
> +groups of ref tips, with clients tending to fetch those groups
> +independently. For example, imagine that you are hosting several "forks"
> +of a repository in a single shared object store, and letting clients
> +view them as separate repositories through `GIT_NAMESPACE` or separate
> +repos using the alternates mechanism. A naive repack may find that the
> +optimal delta for an object is against a base that is only found in
> +another fork. But when a client fetches, they will not have the base
> +object, and we'll have to find a new delta on the fly.
> +
> +A similar situation may exist if you have many refs outside of
> +`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
> +`refs/pull` or `refs/changes` used by some hosting providers). By
> +default, clients fetch only heads and tags, and deltas against objects
> +found only in those other groups cannot be sent as-is.
> +
> +Delta islands solve this problem by allowing you to group your refs into
> +distinct "islands". Pack-objects computes which objects are reachable
> +from which islands, and refuses to make a delta from an object `A`
> +against a base which is not present in all of `A`'s islands. This
> +results in slightly larger packs (because we miss some delta
> +opportunities), but guarantees that a fetch of one island will not have
> +to recompute deltas on the fly due to crossing island boundaries.
> +
> +Islands are configured via the `pack.island` option, which can be
> +specified multiple times. Each value is a left-anchored regular
> +expressions matching refnames. For example:
> +
> +---
> +[pack]
> +island = refs/heads/
> +island = refs/tags/
> +---
> +
> +puts heads and tags into an island (whose name is the empty string; see
> +below for more on naming). Any refs which do not match those regular
> +expressions (e.g., `refs/pull/123`) is not in any island. Any object
> +which is reachable only from `refs/pull/` (but not heads or tags) is
> +therefore not a candidate to be used as a base for `refs/heads/`.
> +
> +Refs are grouped into islands based on their "names", and two regexes
> +that produce the same name are considered to be in the same island. The
> +names are computed from the regexes by concatenating any capture groups
> +from the regex (and if there are none, then the name is the empty
> +string, as in the above example). This allows you to create arbitrary
> +numbers of islands. For example, imagine you store the refs for each
> +fork in `refs/virtual/ID`, where `ID` is a numeric identifier. You might
> +then configure:
> +
> +---
> +[pack]
> +island = refs/virtual/([0-9]+)/heads/
> +island = refs/virtual/([0-9]+)/tags/
> +island = refs/virtual/([0-9]+)/(pull)/
> +---
> +
> +That puts the heads and tags for each fork in their

Re: [PATCH] Makefile: add a DEVOPTS flag to get pedantic compilation

2018-07-23 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> +CFLAGS += -pedantic
>> +# don't warn for each N_ use
>> +CFLAGS += -DUSE_PARENS_AROUND_GETTEXT_N=0
>> +endif
>
> ...and set this to "no" not "0" since we document that that's the way to
> toggle it off in the Makefile, i.e. let's be consistent.

The Make variable USE_PARENS_AROUND_GETTEXT_N is described as taking
"yes" or "no".

# Define USE_PARENS_AROUND_GETTEXT_N to "yes" if your compiler happily
# compiles the following initialization:
#
#   static const char s[] = ("FOO");
#
# and define it to "no" if you need to remove the parentheses () around the
# constant.  The default is "auto", which means to use parentheses if your
# compiler is detected to support it.

But the knob on the CFLAGS set by these variables take 1 or 0

ifeq (yes,$(USE_PARENS_AROUND_GETTEXT_N))
BASIC_CFLAGS += -DUSE_PARENS_AROUND_GETTEXT_N=1
else
ifeq (no,$(USE_PARENS_AROUND_GETTEXT_N))
BASIC_CFLAGS += -DUSE_PARENS_AROUND_GETTEXT_N=0
endif
endif

And the code that uses the CFLAGS knob 

/* Mark msgid for translation but do not translate it. */
#if !USE_PARENS_AROUND_GETTEXT_N
#define N_(msgid) msgid
#else
...
#define N_(msgid) (msgid)
#endif

pays attention to the truth/false in usual C preprocessor sense.
Your "no" happens to serve as 0 just like "yes" would.

So I think you suggestion is a bad one that makes a misleading
result.

[Footnote]

*1* The following shows all "not X" except for "not one".

#include 

#define ZERO 0
#define ONE 1
#define YES yes
#define NO no
#undef UNDEF

const char *msgs[] = {
#if !ZERO
"not zero",
#endif
#if !ONE
"not one",
#endif
#if !YES
"not yes",
#endif
#if !NO
"not no",
#endif
#if !UNDEF
"not undef",
#endif
NULL
};

int main(int ac, char **av)
{
const char **cp = msgs;

while (*cp) {
printf("%s\n", *cp);
cp++;
}
return 0;
}






Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Duy Nguyen
On Mon, Jul 23, 2018 at 8:38 PM Duy Nguyen  wrote:
> I will have to study the thread dispatch code more to have a better
> answer, unfortunately.

Well.. I thought I would need this weekend for this, but a quick look
and ll_find_deltas() suggests that what we're doing is safe. At least
you and Jeff are way to familiar with the delta window concept in
pack-objects. So in multithread mode, we have a big list of all
objects, then the list is cut in N sublists and each is owned entirely
by one thread. Each thread then can slide a window over this sublist
to search for the best delta.

Because of this partitioning, if trg_entry is being processed now, it
will not be accessed by any other thread. It's owned by this thread
and will be accessed again as src_entry when the next entry becomes
the delta target in the same thread. As long as we don't access
objects of another thread (and my v1 violates this) we should be ok.

At the end of ll_find_deltas() though, we have the "stealing" stuff,
but it's protected by progress_lock() already, outside try_delta() so
we're sure nobody is updating any object_entry when the stealing
happens.

I will double check again this weekend just to be sure.
-- 
Duy


Re: [PATCH 3/5] coccinelle: exclude sha1dc source files from static analysis

2018-07-23 Thread SZEDER Gábor
On Mon, Jul 23, 2018 at 8:28 PM Eric Sunshine  wrote:
>
> On Mon, Jul 23, 2018 at 9:51 AM SZEDER Gábor  wrote:
> > sha1dc is an external library, that we carry in-tree for convenience
> > or grab as a submodule, so there is no use in applying our semantic
> > patches to its source files.
> >
> > Therefore, exclude sha1dc's source files from Coccinelle's static
> > analysis.
> >
> > Signed-off-by: SZEDER Gábor 
> > ---
> > diff --git a/Makefile b/Makefile
> > @@ -2666,10 +2666,16 @@ check: command-list.h
> > +ifdef DC_SHA1_SUBMODULE
> > +COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(C_SOURCES))
> > +else
> > +COCCI_SOURCES = $(filter-out sha1dc/%,$(C_SOURCES))
> > +endif
>
> Can't you just filter out both of these unconditionally without
> worrying about DC_SHA1_SUBMODULE?

I'm not sure what you mean by that.  Like this perhaps?

  COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(filter-out
sha1dc/%,$(C_SOURCES)))

While it's only a single line, I don't think it's any easier on the
eyes.


Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Duy Nguyen
On Mon, Jul 23, 2018 at 8:04 PM Junio C Hamano  wrote:
>
> Nguyễn Thái Ngọc Duy   writes:
>
> > Access to e->delta_size_ (and by extension
> > pack->delta_size[e - pack->objects]) is unprotected as before, the
> > thread scheduler in pack-objects must make sure "e" is never updated
> > by two different threads.
>
> OK.  Do we need to worry about "e" (e.g. "e->delta_size_valid")
> being accessed while/before it is set by another thread?

I left some details out because I think it's getting to a gray area.
We already do this

if (!DELTA(trg_entry)) {
max_size = trg_size/2 - the_hash_algo->rawsz;
ref_depth = 1;
} else {
max_size = DELTA_SIZE(trg_entry);
...
SET_DELTA(trg_entry, src_entry);
SET_DELTA_SIZE(trg_entry, delta_size);

if the bottom half is running on one thread and stopped in the middle
while the top half is running in another thread, we have a problem.
But perhaps max_size is not that big a deal because incorrect max_size
may produce a bad pack but can't corrupt it.

I will have to study the thread dispatch code more to have a better
answer, unfortunately.
--
Duy


Re: Hash algorithm analysis

2018-07-23 Thread Jonathan Nieder
Hi Yves,

demerphq wrote:
> On Sun, 22 Jul 2018 at 01:59, brian m. carlson
>  wrote:

>> I will admit that I don't love making this decision by myself, because
>> right now, whatever I pick, somebody is going to be unhappy.
[...]
> I do not envy you this decision.
>
> Personally I would aim towards pushing this decision out to the git
> user base and facilitating things so we can choose whatever hash
> function (and config) we wish, including ones not invented yet.

There are two separate pieces to this.

One is configurability at compile time.  So far that has definitely
been a goal, because we want to be ready to start the transition to
another hash, and quickly, as soon as the new hash is discovered to be
weak.  This also means that people can experiment with new hashes and
in a controlled environment (where the users can afford to build from
source), some users might prefer some bespoke hash for reasons only
known to them. ;-)

Another piece is configurability at run time.  This is a harder sell
because it has some negative effects in the ecosystem:

 - performance impact from users having to maintain a translation table
   between the different hash functions in use

 - security impact, in the form of downgrade attacks

 - dependency bloat, from Git having to be able to compute all hash
   functions permitted in that run-time configuration

The security impact can be mitigated by keeping the list of supported
hashes small (i.e. two or three instead of 10ish).  Each additional
hash function is a potential liability (just as in SSL), so they have
to earn their keep.

The performance impact is unavoidable if we encourage Git servers to
pick their favorite hash function instead of making a decision in the
project.  This can in turn affect security, since it would increase
the switching cost away from SHA-1, with the likely effect being that
most users stay on SHA-1.  I don't want to go there.

So I would say, support for arbitrary hash functions at compile time
and in file formats is important and I encourage you to hold us to
that (when reviewing patches, etc).  But in the standard Git build
configuration that most people run, I believe it is best to support
only SHA-1 + our chosen replacement hash.

Thanks,
Jonathan


Re: [PATCH 3/5] coccinelle: exclude sha1dc source files from static analysis

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 9:51 AM SZEDER Gábor  wrote:
> sha1dc is an external library, that we carry in-tree for convenience
> or grab as a submodule, so there is no use in applying our semantic
> patches to its source files.
>
> Therefore, exclude sha1dc's source files from Coccinelle's static
> analysis.
>
> Signed-off-by: SZEDER Gábor 
> ---
> diff --git a/Makefile b/Makefile
> @@ -2666,10 +2666,16 @@ check: command-list.h
> +ifdef DC_SHA1_SUBMODULE
> +COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(C_SOURCES))
> +else
> +COCCI_SOURCES = $(filter-out sha1dc/%,$(C_SOURCES))
> +endif

Can't you just filter out both of these unconditionally without
worrying about DC_SHA1_SUBMODULE?


Re: Hash algorithm analysis

2018-07-23 Thread Linus Torvalds
On Mon, Jul 23, 2018 at 5:48 AM Sitaram Chamarty  wrote:
>
> I would suggest (a) hash size of 256 bits and (b) choice of any hash
> function that can produce such a hash.  If people feel strongly that 256
> bits may also turn out to be too small (really?) then a choice of 256 or
> 512, but not arbitrary sizes.

Honestly, what's the expected point of 512-bit hashes?

The _only_ point of a 512-bit hash is that it's going to grow objects
in incompressible ways, and use more memory. Just don't do it.

If somebody can break a 256-bit hash, you have two choices:

 (a) the hash function itself was broken, and 512 bits isn't the
solution to it anyway, even if it can certainly hide the problem

 (b) you had some "new math" kind of unexpected breakthrough, which
means that 512 bits might not be much  better either.

Honestly, the number of particles in the observable universe is on the
order of 2**256. It's a really really big number.

Don't make the code base more complex than it needs to be. Make a
informed technical decision, and say "256 bits is a *lot*".

The difference between engineering and theory is that engineering
makes trade-offs. Good software is well *engineered*, not theorized.

Also, I would suggest that git default to "abbrev-commit=40", so that
nobody actually *sees* the new bits by default. So the perl scripts
etc that use "[0-9a-f]{40}" as a hash pattern would just silently
continue to work.

Because backwards compatibility is important (*)

 Linus

(*) And 2**160 is still a big big number, and hasn't really been a
practical problem, and SHA1DC is likely a good hash for the next
decade or longer.


Re: [PATCH 1/2] t3507: add a testcase showing failure with sparse checkout

2018-07-23 Thread Ben Peart




On 7/23/2018 2:09 PM, Eric Sunshine wrote:

On Mon, Jul 23, 2018 at 9:12 AM Ben Peart  wrote:

On 7/21/2018 3:21 AM, Eric Sunshine wrote:

On Sat, Jul 21, 2018 at 2:34 AM Elijah Newren  wrote:

+   rm .git/info/sparse-checkout


Should this cleanup be done by test_when_finished()?


I think trying to use test_when_finished() for this really degrades the
readability of the test.  See below:

test_expect_success 'failed cherry-pick with sparse-checkout' '
 pristine_detach initial &&
 test_config core.sparsecheckout true &&
 echo /unrelated >.git/info/sparse-checkout &&
 git read-tree --reset -u HEAD &&
 test_when_finished "echo \"/*\" >.git/info/sparse-checkout && git
read-tree --reset -u HEAD && rm .git/info/sparse-checkout" &&
 test_must_fail git cherry-pick -Xours picked>actual &&
 test_i18ngrep ! "Changes not staged for commit:" actual
'

Given it takes multiple commands, I'd prefer to keep the setup and
cleanup of the sparse checkout settings symmetrical.


Some observations:

The test_when_finished() ought to be called before the initial
git-read-tree, otherwise you risk leaving a .git/info/sparse-checkout
sitting around if git-read-tree fails.

The tear-down code could be moved to a function, in which case,
test_when_finished() would simply call that function.

Multi-line quoted strings are valid, so you don't need to string out
all the tear-down steps on a single line like that, and instead spread
them across multiple lines to improve readability.

test_when_finished() doesn't expect just a single quoted string as
argument. In fact, it can take many (unquoted) arguments, which also
allows you to spread the tear-down steps over multiple lines to
improve readability.

Multiple test_when_finished() invocations are allowed, so you could
spread out the tear-down commands that way (though they'd have to be
in reverse order, which would be bad for readability in this case,
thus not recommended).

Correctness ought to trump readability, not the other way around.

So, one possibility, which seems pretty readable to me:

 test_expect_failure 'failed cherry-pick with sparse-checkout' '
pristine_detach initial &&
test_config core.sparseCheckout true &&
test_when_finished "
echo \"/*\" >.git/info/sparse-checkout
git read-tree --reset -u HEAD
rm .git/info/sparse-checkout" &&
echo /unrelated >.git/info/sparse-checkout &&
git read-tree --reset -u HEAD &&
test_must_fail git cherry-pick -Xours picked>actual &&
test_i18ngrep ! "Changes not staged for commit:" actual &&
 '



Minus the trailing && on the last line, that works for me.  Thank you - 
readability and correctness.



Notice that I dropped the internal &&-chain in test_when_finish() to
ensure that the final 'rm' is invoked even if the cleanup
git-read-tree fails (though all bets are probably off, anyhow, if it
does fail).



Re: What's cooking in git.git (Jul 2018, #02; Wed, 18)

2018-07-23 Thread Jonathan Tan
> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.

What do you think about my fixes to protocol v2 tag following [1]? There
was some discussion about correctness vs the drop in performance, but it
seems to me that there is some consensus that the drop in performance is
OK.

[1] https://public-inbox.org/git/cover.1528234587.git.jonathanta...@google.com/


Re: [PATCH] fetch-pack: mark die strings for translation

2018-07-23 Thread Stefan Beller
On Mon, Jul 23, 2018 at 10:56 AM Brandon Williams  wrote:
>

fetch-pack is listed as a plumbing command, which means its prime consumer
is supposedly a machine; But fetch-pack is also used by git-fetch that
is invoked
by a human, who prefers translations.

.. goes reads code...

This translates protocol v2 messages, and p0 messages are translated already,
so this just aligns the new protocol to the old protocol w.r.t. i18n.

Sounds good,

Thanks,
Stefan

[ This message is a gentle hint for better commit messages. ;-) ]

> Signed-off-by: Brandon Williams 
> ---
>  fetch-pack.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 0b4a9f288f..51abee6181 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1245,13 +1245,13 @@ static int process_section_header(struct 
> packet_reader *reader,
> int ret;
>
> if (packet_reader_peek(reader) != PACKET_READ_NORMAL)
> -   die("error reading section header '%s'", section);
> +   die(_("error reading section header '%s'"), section);
>
> ret = !strcmp(reader->line, section);
>
> if (!peek) {
> if (!ret)
> -   die("expected '%s', received '%s'",
> +   die(_("expected '%s', received '%s'"),
> section, reader->line);
> packet_reader_read(reader);
> }
> @@ -1289,12 +1289,12 @@ static int process_acks(struct packet_reader *reader, 
> struct oidset *common)
> continue;
> }
>
> -   die("unexpected acknowledgment line: '%s'", reader->line);
> +   die(_("unexpected acknowledgment line: '%s'"), reader->line);
> }
>
> if (reader->status != PACKET_READ_FLUSH &&
> reader->status != PACKET_READ_DELIM)
> -   die("error processing acks: %d", reader->status);
> +   die(_("error processing acks: %d"), reader->status);
>
> /* return 0 if no common, 1 if there are common, or 2 if ready */
> return received_ready ? 2 : (received_ack ? 1 : 0);
> @@ -1331,7 +1331,7 @@ static void receive_shallow_info(struct fetch_pack_args 
> *args,
>
> if (reader->status != PACKET_READ_FLUSH &&
> reader->status != PACKET_READ_DELIM)
> -   die("error processing shallow info: %d", reader->status);
> +   die(_("error processing shallow info: %d"), reader->status);
>
> setup_alternate_shallow(&shallow_lock, &alternate_shallow_file, NULL);
> args->deepen = 1;
> @@ -1346,7 +1346,7 @@ static void receive_wanted_refs(struct packet_reader 
> *reader, struct ref *refs)
> struct ref *r = NULL;
>
> if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ')
> -   die("expected wanted-ref, got '%s'", reader->line);
> +   die(_("expected wanted-ref, got '%s'"), reader->line);
>
> for (r = refs; r; r = r->next) {
> if (!strcmp(end, r->name)) {
> @@ -1356,11 +1356,11 @@ static void receive_wanted_refs(struct packet_reader 
> *reader, struct ref *refs)
> }
>
> if (!r)
> -   die("unexpected wanted-ref: '%s'", reader->line);
> +   die(_("unexpected wanted-ref: '%s'"), reader->line);
> }
>
> if (reader->status != PACKET_READ_DELIM)
> -   die("error processing wanted refs: %d", reader->status);
> +   die(_("error processing wanted refs: %d"), reader->status);
>  }
>
>  enum fetch_state {
> --
> 2.18.0.233.g985f88cf7e-goog
>


Re: [PATCH v6 8/8] fetch-pack: implement ref-in-want

2018-07-23 Thread Duy Nguyen
On Mon, Jul 23, 2018 at 7:53 PM Brandon Williams  wrote:
>
> On 07/22, Duy Nguyen wrote:
> > On Thu, Jun 28, 2018 at 12:33 AM Brandon Williams  wrote:
> > > +static void receive_wanted_refs(struct packet_reader *reader, struct ref 
> > > *refs)
> > > +{
> > > +   process_section_header(reader, "wanted-refs", 0);
> > > +   while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
> > > +   struct object_id oid;
> > > +   const char *end;
> > > +   struct ref *r = NULL;
> > > +
> > > +   if (parse_oid_hex(reader->line, &oid, &end) || *end++ != 
> > > ' ')
> > > +   die("expected wanted-ref, got '%s'", 
> > > reader->line);
> >
> > Could you do a follow and wrap all these strings in _() since this one
> > is already in 'next'?
>
> What criteria is used to determine if something should be translated?
> To me, this looks like a wire-protocol error which would benefit from
> not being translated because it would be easier to grep for if it
> occurs.  That and if a user sees this sort of error I don't think that
> they could really do anything about it anyway.

Devs are users too and for me, even if I can read English just fine, I
prefer fully translated interface, not a mix of non-English and
English. Users can still google around and find out about wanted-ref
(at least linux users 10 years ago did). If they show up here asking
for support, we can ask them to translate back if needed (or look into
.po files). We have the same problem anyway if their bug reports
contain other non-English strings.

Besides drawing the line "benefit from (not) being translated" varies
from one developer to another. I think it's just easier and more
consistent to stick to "if it's not machine-readable (or really meant
for devs, like BUG()), translate it" and leave it to translators to
decide.
-- 
Duy


Re: [PATCH 1/2] t3507: add a testcase showing failure with sparse checkout

2018-07-23 Thread Junio C Hamano
Ben Peart  writes:

> On 7/21/2018 2:34 AM, Elijah Newren wrote:
>> From: Ben Peart 
>>
>> Signed-off-by: Elijah Newren 
>> ---
>> Testcase provided by Ben, so committing with him as the author.  Just need
>> a sign off from him.
>
> Thanks Elijah, consider it
>
> Signed-off-by: Ben Peart 
>

Thanks; I'll also tweak the in-body From: line while applying to
match the Sign-off.



Re: [PATCH 1/2] t3507: add a testcase showing failure with sparse checkout

2018-07-23 Thread Eric Sunshine
On Mon, Jul 23, 2018 at 9:12 AM Ben Peart  wrote:
> On 7/21/2018 3:21 AM, Eric Sunshine wrote:
> > On Sat, Jul 21, 2018 at 2:34 AM Elijah Newren  wrote:
> >> +   rm .git/info/sparse-checkout
> >
> > Should this cleanup be done by test_when_finished()?
>
> I think trying to use test_when_finished() for this really degrades the
> readability of the test.  See below:
>
> test_expect_success 'failed cherry-pick with sparse-checkout' '
> pristine_detach initial &&
> test_config core.sparsecheckout true &&
> echo /unrelated >.git/info/sparse-checkout &&
> git read-tree --reset -u HEAD &&
> test_when_finished "echo \"/*\" >.git/info/sparse-checkout && git
> read-tree --reset -u HEAD && rm .git/info/sparse-checkout" &&
> test_must_fail git cherry-pick -Xours picked>actual &&
> test_i18ngrep ! "Changes not staged for commit:" actual
> '
>
> Given it takes multiple commands, I'd prefer to keep the setup and
> cleanup of the sparse checkout settings symmetrical.

Some observations:

The test_when_finished() ought to be called before the initial
git-read-tree, otherwise you risk leaving a .git/info/sparse-checkout
sitting around if git-read-tree fails.

The tear-down code could be moved to a function, in which case,
test_when_finished() would simply call that function.

Multi-line quoted strings are valid, so you don't need to string out
all the tear-down steps on a single line like that, and instead spread
them across multiple lines to improve readability.

test_when_finished() doesn't expect just a single quoted string as
argument. In fact, it can take many (unquoted) arguments, which also
allows you to spread the tear-down steps over multiple lines to
improve readability.

Multiple test_when_finished() invocations are allowed, so you could
spread out the tear-down commands that way (though they'd have to be
in reverse order, which would be bad for readability in this case,
thus not recommended).

Correctness ought to trump readability, not the other way around.

So, one possibility, which seems pretty readable to me:

test_expect_failure 'failed cherry-pick with sparse-checkout' '
   pristine_detach initial &&
   test_config core.sparseCheckout true &&
   test_when_finished "
   echo \"/*\" >.git/info/sparse-checkout
   git read-tree --reset -u HEAD
   rm .git/info/sparse-checkout" &&
   echo /unrelated >.git/info/sparse-checkout &&
   git read-tree --reset -u HEAD &&
   test_must_fail git cherry-pick -Xours picked>actual &&
   test_i18ngrep ! "Changes not staged for commit:" actual &&
'

Notice that I dropped the internal &&-chain in test_when_finish() to
ensure that the final 'rm' is invoked even if the cleanup
git-read-tree fails (though all bets are probably off, anyhow, if it
does fail).


Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Access to e->delta_size_ (and by extension
> pack->delta_size[e - pack->objects]) is unprotected as before, the
> thread scheduler in pack-objects must make sure "e" is never updated
> by two different threads.

OK.  Do we need to worry about "e" (e.g. "e->delta_size_valid")
being accessed while/before it is set by another thread?

oe_delta_size() makes unprotected accesses to .delta_size_ and
pack->delta_size[e - pack->objects], so we apparently do not, and
oe_set_delta_size() only protects the allocation call and does not
prevent a reader in oe_delta_size() from first reading the _valid
field, noticing that it is 0 as initialized, and goes on to read the
pack->delta_size[] slot for the entry, while the writer is setting
the size to .delta_size_ field and flipping _valid bit, without ever
storing the size in the pack->delta_size[] array.

> @@ -130,6 +131,7 @@ struct packing_data {
>   uint32_t index_size;
>  
>   unsigned int *in_pack_pos;
> + unsigned long *delta_size;
>  
>   /*
>* Only one of these can be non-NULL and they have different
> @@ -140,10 +142,29 @@ struct packing_data {
>   struct packed_git **in_pack_by_idx;
>   struct packed_git **in_pack;
>  
> +#ifndef NO_PTHREADS
> + pthread_mutex_t lock;

I am wondering if we want the variable to say what data it is
protecting from simultaneous accesses, or leave it as generic so
that any new caller that wants to lock any (new) thing that is
associated with a packing_data structure can grab it for other
purposes.  The design of this patch clearly is the latter, which is
OK for now, I think.

> @@ -332,18 +353,34 @@ static inline unsigned long oe_delta_size(struct 
> packing_data *pack,
>  {
>   if (e->delta_size_valid)
>   return e->delta_size_;
> - return oe_size(pack, e);
> +
> + /*
> +  * pack->detla_size[] can't be NULL because oe_set_delta_size()
> +  * must have been called when a new delta is saved with
> +  * oe_set_delta().
> +  * If oe_delta() returns NULL (i.e. default state, which means
> +  * delta_size_valid is also false), then the caller must never
> +  * call oe_delta_size().
> +  */
> + return pack->delta_size[e - pack->objects];
>  }
>  
>  static inline void oe_set_delta_size(struct packing_data *pack,
>struct object_entry *e,
>unsigned long size)
>  {
> - e->delta_size_ = size;
> - e->delta_size_valid = e->delta_size_ == size;
> - if (!e->delta_size_valid && size != oe_size(pack, e))
> - BUG("this can only happen in check_object() "
> - "where delta size is the same as entry size");
> + if (size < pack->oe_delta_size_limit) {
> + e->delta_size_ = size;
> + e->delta_size_valid = 1;
> + } else {
> + packing_data_lock(pack);
> + if (!pack->delta_size)
> + ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
> + packing_data_unlock(pack);
> +
> + pack->delta_size[e - pack->objects] = size;
> + e->delta_size_valid = 0;
> + }
>  }


Re: Hash algorithm analysis

2018-07-23 Thread Stefan Beller
On Mon, Jul 23, 2018 at 5:41 AM demerphq  wrote:
>
> On Sun, 22 Jul 2018 at 01:59, brian m. carlson
>  wrote:
> > I will admit that I don't love making this decision by myself, because
> > right now, whatever I pick, somebody is going to be unhappy.  I want to
> > state, unambiguously, that I'm trying to make a decision that is in the
> > interests of the Git Project, the community, and our users.
> >
> > I'm happy to wait a few more days to see if a consensus develops; if so,
> > I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> > a decision and write my patches accordingly.  The community is free, as
> > always, to reject my patches if taking them is not in the interest of
> > the project.
>
> Hi Brian.
>
> I do not envy you this decision.
>
> Personally I would aim towards pushing this decision out to the git
> user base and facilitating things so we can choose whatever hash
> function (and config) we wish, including ones not invented yet.

By Git user base you actually mean millions of people?
(And they'll have different opinions and needs)

One of the goals of the hash transition is to pick a hash function
such that git repositories are compatible.
If users were to pick their own hashes, we would need to not just give
a SHA-1 ->  transition plan, but we'd have to make sure the
full matrix of possible hashes is interchangeable as we have no idea
of what the user would think of "safer". For example one server operator
might decide to settle on SHA2 and another would settle on blake2,
whereas a user that uses both servers as remotes settles with k12.

Then there would be a whole lot of conversion going on (you cannot talk
natively to a remote with a different hash; checking pgp signatures is
also harder as you have an abstraction layer in between).

I would rather just have the discussion now and then provide only one
conversion tool which might be easy to adapt, but after the majority
converted, it is rather left to bitrot instead of needing to support ongoing
conversions.

On the other hand, even if we'd provide a "different hashes are fine"
solution, I would think the network effect would make sure that eventually
most people end up with one hash.

One example of using different hashes successfully are transports,
like TLS, SSH. The difference there is that it is a point-to-point communication
whereas a git repository needs to be read by many parties involved; also
a communication over TLS/SSH is ephemeral unlike objects in Git.

> Failing that I would aim towards a hashing strategy which has the most
> flexibility. Keccak for instance has the interesting property that its
> security level is tunable, and that it can produce aribitrarily long
> hashes.  Leaving aside other concerns raised elsewhere in this thread,
> these two features alone seem to make it a superior choice for an
> initial implementation. You can find bugs by selecting unusual hash
> sizes, including very long ones, and you can provide ways to tune the
> function to peoples security and speed preferences.  Someone really
> paranoid can specify an unusually large round count and a very long
> hash.

I do not object to this in theory, but I would rather not want to burden
the need to write code for this on the community.

> I am not a cryptographer.

Same here. My personal preference would be blake2b
as that is the fastest IIRC.

Re-reading brians initial mail, I think we should settle on
SHA-256, as that is a conservative choice for security and
the winner in HW accelerated setups, and not to shabby in
a software implementation; it is also widely available.

Stefan


[PATCH] fetch-pack: mark die strings for translation

2018-07-23 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 fetch-pack.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 0b4a9f288f..51abee6181 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1245,13 +1245,13 @@ static int process_section_header(struct packet_reader 
*reader,
int ret;
 
if (packet_reader_peek(reader) != PACKET_READ_NORMAL)
-   die("error reading section header '%s'", section);
+   die(_("error reading section header '%s'"), section);
 
ret = !strcmp(reader->line, section);
 
if (!peek) {
if (!ret)
-   die("expected '%s', received '%s'",
+   die(_("expected '%s', received '%s'"),
section, reader->line);
packet_reader_read(reader);
}
@@ -1289,12 +1289,12 @@ static int process_acks(struct packet_reader *reader, 
struct oidset *common)
continue;
}
 
-   die("unexpected acknowledgment line: '%s'", reader->line);
+   die(_("unexpected acknowledgment line: '%s'"), reader->line);
}
 
if (reader->status != PACKET_READ_FLUSH &&
reader->status != PACKET_READ_DELIM)
-   die("error processing acks: %d", reader->status);
+   die(_("error processing acks: %d"), reader->status);
 
/* return 0 if no common, 1 if there are common, or 2 if ready */
return received_ready ? 2 : (received_ack ? 1 : 0);
@@ -1331,7 +1331,7 @@ static void receive_shallow_info(struct fetch_pack_args 
*args,
 
if (reader->status != PACKET_READ_FLUSH &&
reader->status != PACKET_READ_DELIM)
-   die("error processing shallow info: %d", reader->status);
+   die(_("error processing shallow info: %d"), reader->status);
 
setup_alternate_shallow(&shallow_lock, &alternate_shallow_file, NULL);
args->deepen = 1;
@@ -1346,7 +1346,7 @@ static void receive_wanted_refs(struct packet_reader 
*reader, struct ref *refs)
struct ref *r = NULL;
 
if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ')
-   die("expected wanted-ref, got '%s'", reader->line);
+   die(_("expected wanted-ref, got '%s'"), reader->line);
 
for (r = refs; r; r = r->next) {
if (!strcmp(end, r->name)) {
@@ -1356,11 +1356,11 @@ static void receive_wanted_refs(struct packet_reader 
*reader, struct ref *refs)
}
 
if (!r)
-   die("unexpected wanted-ref: '%s'", reader->line);
+   die(_("unexpected wanted-ref: '%s'"), reader->line);
}
 
if (reader->status != PACKET_READ_DELIM)
-   die("error processing wanted refs: %d", reader->status);
+   die(_("error processing wanted refs: %d"), reader->status);
 }
 
 enum fetch_state {
-- 
2.18.0.233.g985f88cf7e-goog



Re: [PATCH v6 8/8] fetch-pack: implement ref-in-want

2018-07-23 Thread Brandon Williams
On 07/22, Duy Nguyen wrote:
> On Thu, Jun 28, 2018 at 12:33 AM Brandon Williams  wrote:
> > +static void receive_wanted_refs(struct packet_reader *reader, struct ref 
> > *refs)
> > +{
> > +   process_section_header(reader, "wanted-refs", 0);
> > +   while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
> > +   struct object_id oid;
> > +   const char *end;
> > +   struct ref *r = NULL;
> > +
> > +   if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' 
> > ')
> > +   die("expected wanted-ref, got '%s'", reader->line);
> 
> Could you do a follow and wrap all these strings in _() since this one
> is already in 'next'?

What criteria is used to determine if something should be translated?
To me, this looks like a wire-protocol error which would benefit from
not being translated because it would be easier to grep for if it
occurs.  That and if a user sees this sort of error I don't think that
they could really do anything about it anyway.

Of course it appears as if all other 'die' calls in fetch-pack have been
marked for translation so I guess my though process doesn't hold :)

-- 
Brandon Williams


[PATCH] pack-protocol: mention and point to docs for protocol v2

2018-07-23 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 Documentation/technical/pack-protocol.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-protocol.txt 
b/Documentation/technical/pack-protocol.txt
index 7fee6b780a..25acd9edb1 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -50,7 +50,8 @@ Each Extra Parameter takes the form of `=` or 
``.
 
 Servers that receive any such Extra Parameters MUST ignore all
 unrecognized keys. Currently, the only Extra Parameter recognized is
-"version=1".
+"version" with a vlue of '1' or '2'.  See protocol-v2.txt for more
+information on protocol version 2.
 
 Git Transport
 -
-- 
2.18.0.233.g985f88cf7e-goog



Re: [PATCH 2/9] vscode: hard-code a couple defines

2018-07-23 Thread Jonathan Nieder
Hi,

Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin 
>
> Sadly, we do not get all of the definitions via ALL_CFLAGS. Some defines
> are passed to GCC *only* when compiling specific files, such as git.o.
>
> Let's just hard-code them into the script for the time being.

Could we move these to ALL_CFLAGS?  Is there any downside other than
increased rebuilds when options change?

Thanks,
Jonathan


Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

2018-07-23 Thread Duy Nguyen
On Mon, Jul 23, 2018 at 5:50 PM Ben Peart  wrote:
> > Anyway, on to the actual discussion:
> >
> >> Here is a checkout command with tracing turned on to demonstrate where the
> >> time is spent.  Note, this is somewhat of a �best case� as I�m simply
> >> checking out the current commit:
> >>
> >> benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
> >> $ /usr/src/git/git.exe checkout
> >> 12:31:50.419016 read-cache.c:2006   performance: 1.180966800 s: read 
> >> cache .git/index
> >> 12:31:51.184636 name-hash.c:605 performance: 0.664575200 s: 
> >> initialize name hash
> >> 12:31:51.200280 preload-index.c:111 performance: 0.019811600 s: 
> >> preload index
> >> 12:31:51.294012 read-cache.c:1543   performance: 0.094515600 s: 
> >> refresh index
> >> 12:32:29.731344 unpack-trees.c:1358 performance: 33.889840200 s: 
> >> traverse_trees
> >> 12:32:37.512555 read-cache.c:2541   performance: 1.564438300 s: write 
> >> index, changed mask = 28
> >> 12:32:44.918730 unpack-trees.c:1358 performance: 7.243155600 s: 
> >> traverse_trees
> >> 12:32:44.965611 diff-lib.c:527  performance: 7.374729200 s: 
> >> diff-index
> >> Waiting for GVFS to parse index and update placeholder files...Succeeded
> >> 12:32:46.824986 trace.c:420 performance: 57.715656000 s: git 
> >> command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout
> >
> > What's the current state of the index before this checkout?
>
> This was after running "git checkout" multiple times so there was really
> nothing for git to do.

Hmm.. this means cache-tree is fully valid, unless you have changes in
index. We're quite aggressive in repairing cache-tree since aecf567cbf
(cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
have very good cache-tree records and still spend 33s on
traverse_trees, maybe there's something else.

> >> ODB cache
> >> =
> >> Since traverse_trees() hits the ODB for each tree object (of which there 
> >> are
> >> over 500K in this repo) I wrote and tested having an in-memory ODB cache
> >> that cached all tree objects.  This resulted in a > 50% hit ratio (largely
> >> due to the fact we traverse the tree twice during checkout) but resulted in
> >> only a minimal savings (1.3 seconds).
> >
> > In my experience, one major cost of object access is decompression, both
> > delta and zlib. Trees in particular tend to delta very well across
> > versions. We have a cache to try to reuse intermediate delta results,
> > but the default size is probably woefully undersized for your repository
> > (I know from past tests it's undersized a bit even for the linux
> > kernel).
> >
> > Try bumping core.deltaBaseCacheLimit to see if that has any impact. It's
> > 96MB by default.
> >
> > There may also be some possible work in making it more aggressive about
> > storing the intermediate results. I seem to recall from past
> > explorations that it doesn't keep everything, and I don't know if its
> > heuristics have ever been proven sane.

Could we be a bit more flexible about cache size? Say if we know
there's 8 GB memory still available, we should be able to use like 1
GB at least (and that's done automatically without tinkering with
config).
-- 
Duy


Re: t7406-submodule-update shaky ?

2018-07-23 Thread Stefan Beller
On Sun, Jul 22, 2018 at 2:05 PM Torsten Bögershausen  wrote:
>
> It seems that t7406 is sometimes shaky - when checking stderr in test
> case 4.
>
> The order of the submodules may vary, sorting the stderr output makes it
>
> more reliable (and somewhat funny to read).
>
> Does anybody have a better idea ?

DScho just posted a fix for this, see:
https://public-inbox.org/git/pull.12.git.gitgitgad...@gmail.com/


Re: [PATCH 1/9] contrib: add a script to initialize VS Code configuration

2018-07-23 Thread Jonathan Nieder
Hi,

Thanks for working on this.

Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin 
>
> VS Code is a lightweight but powerful source code editor which runs on
> your desktop and is available for Windows, macOS and Linux. Among other
> languages, it has support for C/C++ via an extension.

This doesn't really seem relevant to the change.  The relevant bits
are that (1) VS Code is a popular source code editor, and that (2)
it's one worth recommending to newcomers.

> To start developing Git with VS Code, simply run the Unix shell script
> contrib/vscode/init.sh, which creates the configuration files in
> .vscode/ that VS Code consumes.
>
> Signed-off-by: Johannes Schindelin 

This doesn't tell me what the patch will actually do.

VSCode is an editor.  Is the idea that this will help configure the
editor to do the right thing with whitespace or something?  Or does it
configure IDE features to build git?

Thanks,
Jonathan


Re: [PATCH 00/14] format-patch: add --interdiff and --range-diff options

2018-07-23 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 11:57 AM Eric Sunshine  wrote:
>
> When re-submitting a patch series, it is often helpful (for reviewers)
> to include an interdiff or range-diff against the previous version.
> Doing so requires manually running git-diff or git-range-diff and
> copy/pasting the result into the cover letter of the new version.
>
> This series automates the process by introducing git-format-patch
> options --interdiff and --range-diff which insert such a diff into the
> cover-letter or into the commentary section of the lone patch of a
> 1-patch series. In the latter case, the interdiff or range-diff is
> indented to avoid confusing git-am and human readers.

I gave up after 10/14. But what I've seen is nice (yes I have a couple
comments here and there but you probably won't need to update
anything).
-- 
Duy


Re: [PATCH 0/5] Misc Coccinelle-related improvements

2018-07-23 Thread René Scharfe
Am 23.07.2018 um 15:50 schrieb SZEDER Gábor:
> Just a couple of minor Coccinelle-related improvements:
> 
>- The first two patches are small cleanups.
> 
>- The last three could make life perhaps just a tad bit easier for
>  devs running 'make coccicheck'.
> 
> 
> SZEDER Gábor (5):
>coccinelle: mark the 'coccicheck' make target as .PHONY
>coccinelle: use $(addsuffix) in 'coccicheck' make target
>coccinelle: exclude sha1dc source files from static analysis
>coccinelle: put sane filenames into output patches
>coccinelle: extract dedicated make target to clean Coccinelle's
>  results

These patches look good to me.  Thanks a lot!

René


Re: [PATCH 10/14] format-patch: add --range-diff option to embed diff in cover letter

2018-07-23 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 11:58 AM Eric Sunshine  wrote:
> diff --git a/Documentation/git-format-patch.txt 
> b/Documentation/git-format-patch.txt
> index f8a061794d..e7f404be3d 100644
> --- a/Documentation/git-format-patch.txt
> +++ b/Documentation/git-format-patch.txt
> @@ -24,6 +24,7 @@ SYNOPSIS
>[--to=] [--cc=]
>[--[no-]cover-letter] [--quiet] [--notes[=]]
>[--interdiff=]
> +  [--range-diff=]

I wonder if people will use both --interdiff= and
--range-diff= often enough to justify a shortcut
"--all-kinds-of-diff=" so that we don't have to type 
twice. But I guess we don't have to worry about this right now.
-- 
Duy


Re: [PATCH 06/14] format-patch: allow --interdiff to apply to a lone-patch

2018-07-23 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 11:58 AM Eric Sunshine  wrote:
> diff --git a/log-tree.c b/log-tree.c
> index 9d38f1cf79..56513fa83d 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -14,6 +14,7 @@
>  #include "sequencer.h"
>  #include "line-log.h"
>  #include "help.h"
> +#include "interdiff.h"
>
>  static struct decoration name_decoration = { "object names" };
>  static int decoration_loaded;
> @@ -736,6 +737,19 @@ void show_log(struct rev_info *opt)
>
> strbuf_release(&msgbuf);
> free(ctx.notes_message);
> +
> +   if (cmit_fmt_is_mail(ctx.fmt) && opt->idiff_oid1) {

OK putting idiff stuff in rev_info is probably the right choice. But
we all three fields prefixed with idiff_, maybe you could just add a
new struct "idiff_options" to contain them and add a pointer to that
struct in rev_info. Then "opt->idiff" is enough to know if idiff is
requested instead of relying on idiff_oid1 (seems too random).

> +   struct diff_queue_struct dq;
> +
> +   memcpy(&dq, &diff_queued_diff, sizeof(diff_queued_diff));
> +   DIFF_QUEUE_CLEAR(&diff_queued_diff);
> +
> +   next_commentary_block(opt, NULL);
> +   fprintf_ln(opt->diffopt.file, "%s", opt->idiff_title);
> +   show_interdiff(opt, 2);
> +
> +   memcpy(&diff_queued_diff, &dq, sizeof(diff_queued_diff));
> +   }
>  }
-- 
Duy


Re: [PATCH 03/14] format-patch: teach --interdiff to respect -v/--reroll-count

2018-07-23 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 11:57 AM Eric Sunshine  wrote:
>
> The --interdiff option introduces the embedded interdiff generically as
> "Interdiff:", however, we can do better when --reroll-count is specified

Oh boy. --reroll-count was added in 2012 and here I am typing
--subject-prefix='PATCH vX' everyday, thinking that somebody should
really do something about it. I've learned --reroll-count today!

> diff --git a/revision.h b/revision.h
> index 61931fbac5..ffeadc261a 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -215,6 +215,7 @@ struct rev_info {
> /* interdiff */
> const struct object_id *idiff_oid1;
> const struct object_id *idiff_oid2;
> +   const char *idiff_title;

I feel we're abusing struct rev_info a bit for this since this
interdiff thing is very builtin/log.c's business and not at all
related to rev walk. Is it possible (and easy) to just pass
idfff_title from cmd_format_patch to make_cover_letter()? If it's a
lot of code, then I guess we can just leave it here.
-- 
Duy


Re: [PATCH 02/14] format-patch: add --interdiff option to embed diff in cover letter

2018-07-23 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 11:57 AM Eric Sunshine  wrote:

> diff --git a/interdiff.c b/interdiff.c
> new file mode 100644
> index 00..d0fac10c7c
> --- /dev/null
> +++ b/interdiff.c
> @@ -0,0 +1,17 @@
> +#include "cache.h"
> +#include "commit.h"
> +#include "revision.h"
> +#include "interdiff.h"
> +
> +void show_interdiff(struct rev_info *rev)
> +{
> +   struct diff_options opts;
> +
> +   memcpy(&opts, &rev->diffopt, sizeof(opts));
> +   opts.output_format = DIFF_FORMAT_PATCH;
> +   diff_setup_done(&opts);
> +
> +   diff_tree_oid(rev->idiff_oid1, rev->idiff_oid2, "", &opts);
> +   diffcore_std(&opts);
> +   diff_flush(&opts);
> +}

Is it worth adding a new file just for a single function? I haven't
read the rest of the series, but the cover letter's diffstat suggests
this is it. Is interdiff intended to become a lot more complicated in
the future? If not maybe just add this function in diff-lib.c
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Duy Nguyen
On Mon, Jul 23, 2018 at 2:34 PM Elijah Newren  wrote:
> This patch provides a stop-gap improvement for maint that increases the
> delta size back up to 32 bits (still an improvement over the 64 bit size
> of the original).

The "still an improvement" is actually not true. Due to padding and
stuff, struct object_entry's size is increased by 8 bytes, even though
you don't use all of it. It's possible to maybe shuffle fields around
a bit to keep this struct smaller. But it's not really worth it since
this is temporary anyway.

> I think Duy's comment that I'm responding to suggests he Acks this
> patch for maint, but there is that little 'if' he put in his statement.

Yep. Ack'd by me. It's up to Junio to decide if it's worth doing this
or not, of course.
-- 
Duy


Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

2018-07-23 Thread Ben Peart




On 7/18/2018 5:34 PM, Jeff King wrote:

On Wed, Jul 18, 2018 at 08:45:14PM +, Ben Peart wrote:


When working directories get big, checkout times start to suffer.  Even with
GVFS virtualization (which limits git to only having to update those files
that have been changed locally) we�re seeing P50 times for checkout of 31
seconds and the P80 time is 43 seconds.


Funny aside: all of your apostrophes look like the unicode question
mark. Looking at raw bytes of your mail, they're actually u+fffd
(unicode "replacement character"). Your headers correctly claim to be
utf8. So presumably they got munged by whatever converted to unicode and
didn't have the original character in its translation table. I wonder if
this was send-email (so really perl's encode module), or if your smtp
server tried to do an on-the-fly conversion (I know many servers will
switch the content-transfer-encoding, but I haven't seen a charset
conversion before).



This was my bad.  I wrote the email in Word so I could get spell 
checking and it has this 'feature' where it converts all straight quotes 
to "smart quotes."  I just forgot to search/replace them back to 
straight quotes before sending the mail.



Anyway, on to the actual discussion:


Here is a checkout command with tracing turned on to demonstrate where the
time is spent.  Note, this is somewhat of a �best case� as I�m simply
checking out the current commit:

benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
$ /usr/src/git/git.exe checkout
12:31:50.419016 read-cache.c:2006   performance: 1.180966800 s: read cache 
.git/index
12:31:51.184636 name-hash.c:605 performance: 0.664575200 s: initialize 
name hash
12:31:51.200280 preload-index.c:111 performance: 0.019811600 s: preload 
index
12:31:51.294012 read-cache.c:1543   performance: 0.094515600 s: refresh 
index
12:32:29.731344 unpack-trees.c:1358 performance: 33.889840200 s: 
traverse_trees
12:32:37.512555 read-cache.c:2541   performance: 1.564438300 s: write 
index, changed mask = 28
12:32:44.918730 unpack-trees.c:1358 performance: 7.243155600 s: 
traverse_trees
12:32:44.965611 diff-lib.c:527  performance: 7.374729200 s: diff-index
Waiting for GVFS to parse index and update placeholder files...Succeeded
12:32:46.824986 trace.c:420 performance: 57.715656000 s: git 
command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout


What's the current state of the index before this checkout? 


This was after running "git checkout" multiple times so there was really 
nothing for git to do.



I don't
recall offhand how aggressively we prune the tree walk based on the diff
between the index and the tree we're loading. If we're starting from > scratch, 
then obviously we do have to walk the whole thing. But in most
cases we should be able to avoid walking into sub-trees where the index
has a matching cache_tree record.

If we're not doing that, it seems like that's going to be the big
obvious win, because it reduces the number of trees we have to consider
in the first place.



I agree this could be a big win.  Especially in large trees, the 
percentage of the tree that changes between two commits is often quite 
small.  Saving 100% of that is a much bigger win than actually doing all 
that work even in parallel. Today, we aren't aggressive at all and do no 
pruning.


This brings up a concern I have with this approach altogether. In an 
earlier patch series, I tried to optimize the "git checkout -b" code 
path to not update every file in the working directory but only to 
create the new branch and switch to it.  The feedback to that patch was 
that people rely on the current behavior of rewriting every file so the 
patch was rejected.  This earlier attempt/failure to optimize checkout 
makes me worried that _any_ effort to prune the tree will be rejected 
for the same reason.


I'd be interested in how we can prune the tree and only do the work 
required without breaking the implied behavior of the current 
implementation. Would it be acceptable to have two code paths 1) the old 
one for back compat that updates every file whether there are changes or 
not and 2) a new/optimized one that only does the minimum work required? 
 Then we could put which code path executes by default behind by a new 
config setting that allows people to opt-in to the new/faster behavior.


Any other ideas or suggestions that don't require coming up with new git 
commands (ie "git fast-checkout") and retraining existing git users?



ODB cache
=
Since traverse_trees() hits the ODB for each tree object (of which there are
over 500K in this repo) I wrote and tested having an in-memory ODB cache
that cached all tree objects.  This resulted in a > 50% hit ratio (largely
due to the fact we traverse the tree twice during checkout) but resulted in
only a minimal savings (1.3 seconds).


In my experience, one major cost of object access is decompression, both
delta and zlib. Trees in particu

Re: [PATCH 0/5] Misc Coccinelle-related improvements

2018-07-23 Thread Duy Nguyen
On Mon, Jul 23, 2018 at 3:51 PM SZEDER Gábor  wrote:
>
> Just a couple of minor Coccinelle-related improvements:
>
>   - The first two patches are small cleanups.
>
>   - The last three could make life perhaps just a tad bit easier for
> devs running 'make coccicheck'.

I'm not a heavy cocci user and probably won't spot any corner case
problems. With that in mind, I've read this though and the series
looks good to me.
-- 
Duy


Re: [PATCH 0/5] Misc Coccinelle-related improvements

2018-07-23 Thread Derrick Stolee

On 7/23/2018 9:50 AM, SZEDER Gábor wrote:

Just a couple of minor Coccinelle-related improvements:

   - The first two patches are small cleanups.

   - The last three could make life perhaps just a tad bit easier for
 devs running 'make coccicheck'.


I appreciate your focus on making 'make coccicheck' easier to use!

Reviewed-by: Derrick Stolee 



Re: [PATCH 1/5] coccinelle: mark the 'coccicheck' make target as .PHONY

2018-07-23 Thread Derrick Stolee

On 7/23/2018 9:50 AM, SZEDER Gábor wrote:

The 'coccicheck' target doesn't create a file with the same name, so
mark it as .PHONY.

Signed-off-by: SZEDER Gábor 
---
  Makefile | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index e4b503d259..27bfc196dd 100644
--- a/Makefile
+++ b/Makefile
@@ -2685,6 +2685,8 @@ C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
fi
  coccicheck: $(patsubst %.cocci,%.cocci.patch,$(wildcard 
contrib/coccinelle/*.cocci))
  
+.PHONY: coccicheck

+
  ### Installation rules
  
  ifneq ($(filter /%,$(firstword $(template_dir))),)


I did not know about phony targets [1] so thanks for teaching me 
something today.


[1] https://www.gnu.org/software/make/manual/html_node/Phony-Targets.html



Re: [PATCH 4/5] coccinelle: put sane filenames into output patches

2018-07-23 Thread Derrick Stolee

On 7/23/2018 9:50 AM, SZEDER Gábor wrote:

Coccinelle outputs its suggested transformations as patches, whose
header looks something like this:

   --- commit.c
   +++ /tmp/cocci-output-19250-7ae78a-commit.c

Note the lack of 'diff --opts  ' line, the differing number
of path components on the --- and +++ lines, and the nonsensical
filename on the +++ line.  'patch -p0' can still apply these patches,
as it takes the filename to be modified from the --- line.  Alas, 'git
apply' can't, because it takes the filename from the +++ line, and
then complains about the nonexisting file.

Pass the '--patch .' options to Coccinelle via the SPATCH_FLAGS 'make'
variable, as it seems to make it generate proper context diff patches,
with the header starting with a 'diff ...' line and containing sane
filenames.  The resulting 'contrib/coccinelle/*.cocci.patch' files
then can be applied both with 'git apply' and 'patch' (even without
'-p0').

Signed-off-by: SZEDER Gábor 
---
  Makefile | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 73e2d16926..72ea29df4e 100644
--- a/Makefile
+++ b/Makefile
@@ -564,7 +564,7 @@ SPATCH = spatch
  export TCL_PATH TCLTK_PATH
  
  SPARSE_FLAGS =

-SPATCH_FLAGS = --all-includes
+SPATCH_FLAGS = --all-includes --patch .


Thank you for this! These garbage filenames cause significant pain when 
trying to apply the suggested patch.




Re: [PATCH 2/2] merge-recursive: preserve skip_worktree bit when necessary

2018-07-23 Thread Ben Peart






  merge-recursive.c   | 16 
  t/t3507-cherry-pick-conflict.sh |  2 +-
  2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 113c1d696..fd74bca17 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3069,10 +3069,26 @@ static int merge_content(struct merge_options *o,
if (mfi.clean &&
was_tracked_and_matches(o, path, &mfi.oid, mfi.mode) &&
!df_conflict_remains) {
+   int pos;
+   struct cache_entry *ce;
+
output(o, 3, _("Skipped %s (merged same as existing)"), path);
if (add_cacheinfo(o, mfi.mode, &mfi.oid, path,
  0, (!o->call_depth && !is_dirty), 0))
return -1;
+   /*
+* However, add_cacheinfo() will delete the old cache entry
+* and add a new one.  We need to copy over any skip_worktree
+* flag to avoid making the file appear as if it were
+* deleted by the user.
+*/


nit - I find it a little odd to start a comment with "However" as if you 
are continuing a conversation.



+   pos = index_name_pos(&o->orig_index, path, strlen(path));
+   ce = o->orig_index.cache[pos];
+   if (ce_skip_worktree(ce)) {
+   pos = index_name_pos(&the_index, path, strlen(path));
+   ce = the_index.cache[pos];
+   ce->ce_flags |= CE_SKIP_WORKTREE;
+   }
return mfi.clean;
}
  
diff --git a/t/t3507-cherry-pick-conflict.sh b/t/t3507-cherry-pick-conflict.sh

index 25fac490d..9b1456a7c 100755
--- a/t/t3507-cherry-pick-conflict.sh
+++ b/t/t3507-cherry-pick-conflict.sh
@@ -392,7 +392,7 @@ test_expect_success 'commit --amend -s places the sign-off 
at the right place' '
test_cmp expect actual
  '
  
-test_expect_failure 'failed cherry-pick with sparse-checkout' '

+test_expect_success 'failed cherry-pick with sparse-checkout' '
 pristine_detach initial &&
 git config core.sparseCheckout true &&
 echo /unrelated >.git/info/sparse-checkout &&



Thanks Elijah, I can verify this fixes the problem with the preferred 
solution (ie it preserves the skip-worktree bit).  As such, this is:


Reviewed-by: Ben Peart 

That said, I would propose the test be updated to include a specific 
test for the skip-worktree bit so that if a future patch reverts to the 
old behavior of clearing the skip-worktree bit and writing out the file 
to the working directory, we do it explicitly instead of by accident.


diff --git a/t/t3507-cherry-pick-conflict.sh 
b/t/t3507-cherry-pick-conflict.sh

index 9b1456a7c3..bc8863ff36 100755
--- a/t/t3507-cherry-pick-conflict.sh
+++ b/t/t3507-cherry-pick-conflict.sh
@@ -392,17 +392,17 @@ test_expect_success 'commit --amend -s places the 
sign-off at the right place' '


test_cmp expect actual
 '

-test_expect_success 'failed cherry-pick with sparse-checkout' '
+test_expect_success 'cherry-pick preserves sparse-checkout' '
pristine_detach initial &&
git config core.sparseCheckout true &&
echo /unrelated >.git/info/sparse-checkout &&
git read-tree --reset -u HEAD &&
test_must_fail git cherry-pick -Xours picked>actual &&
+   test "$(git ls-files -t foo)" = "S foo" &&
test_i18ngrep ! "Changes not staged for commit:" actual &&
echo "/*" >.git/info/sparse-checkout &&
git read-tree --reset -u HEAD &&
git config core.sparseCheckout false &&
rm .git/info/sparse-checkout
 '

 test_done



[PATCH 3/9] cache.h: extract enum declaration from inside a struct declaration

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

While it is technically possible, it is confusing. Not only the user,
but also VS Code's intellisense.

Signed-off-by: Johannes Schindelin 
---
 cache.h | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 89a107a7f..2380136f6 100644
--- a/cache.h
+++ b/cache.h
@@ -1425,18 +1425,20 @@ extern void *read_object_with_reference(const struct 
object_id *oid,
 extern struct object *peel_to_type(const char *name, int namelen,
   struct object *o, enum object_type);
 
+enum date_mode_type {
+   DATE_NORMAL = 0,
+   DATE_RELATIVE,
+   DATE_SHORT,
+   DATE_ISO8601,
+   DATE_ISO8601_STRICT,
+   DATE_RFC2822,
+   DATE_STRFTIME,
+   DATE_RAW,
+   DATE_UNIX
+};
+
 struct date_mode {
-   enum date_mode_type {
-   DATE_NORMAL = 0,
-   DATE_RELATIVE,
-   DATE_SHORT,
-   DATE_ISO8601,
-   DATE_ISO8601_STRICT,
-   DATE_RFC2822,
-   DATE_STRFTIME,
-   DATE_RAW,
-   DATE_UNIX
-   } type;
+   enum date_mode_type type;
const char *strftime_fmt;
int local;
 };
-- 
gitgitgadget



[PATCH 7/9] vscode: use 8-space tabs, no trailing ws, etc for Git's source code

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This adds a couple settings for the .c/.h files so that it is easier to
conform to Git's conventions while editing the source code.

Signed-off-by: Johannes Schindelin 
---
 contrib/vscode/init.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
index face115e8..29f2a729d 100755
--- a/contrib/vscode/init.sh
+++ b/contrib/vscode/init.sh
@@ -21,6 +21,14 @@ cat >.vscode/settings.json.new <<\EOF ||
 "editor.wordWrap": "wordWrapColumn",
 "editor.wordWrapColumn": 72
 },
+"[c]": {
+"editor.detectIndentation": false,
+"editor.insertSpaces": false,
+"editor.tabSize": 8,
+"editor.wordWrap": "wordWrapColumn",
+"editor.wordWrapColumn": 80,
+"files.trimTrailingWhitespace": true
+},
 "files.associations": {
 "*.h": "c",
 "*.c": "c"
-- 
gitgitgadget



[PATCH 5/9] vscode: only overwrite C/C++ settings

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

The C/C++ settings are special, as they are the only generated VS Code
configurations that *will* change over the course of Git's development,
e.g. when a new constant is defined.

Therefore, let's only update the C/C++ settings, also to prevent user
modifications from being overwritten.

Ideally, we would keep user modifications in the C/C++ settings, but
that would require parsing JSON, a task for which a Unix shell script is
distinctly unsuited. So we write out .new files instead, and warn the
user if they may want to reconcile their changes.

Signed-off-by: Johannes Schindelin 
---
 contrib/vscode/init.sh | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
index 494a51ac5..ba9469226 100755
--- a/contrib/vscode/init.sh
+++ b/contrib/vscode/init.sh
@@ -13,7 +13,7 @@ die "Could not create .vscode/"
 
 # General settings
 
-cat >.vscode/settings.json <<\EOF ||
+cat >.vscode/settings.json.new <<\EOF ||
 {
 "C_Cpp.intelliSenseEngine": "Default",
 "C_Cpp.intelliSenseEngineFallback": "Disabled",
@@ -51,7 +51,7 @@ esac
 
 # Default build task
 
-cat >.vscode/tasks.json <.vscode/tasks.json.new .vscode/launch.json <.vscode/launch.json.new <

[PATCH 4/9] mingw: define WIN32 explicitly

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This helps VS Code's intellisense to figure out that we want to include
windows.h, and that we want to define the minimum target Windows version
as Windows Vista/2008R2.

Signed-off-by: Johannes Schindelin 
---
 config.mak.uname | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.mak.uname b/config.mak.uname
index 684fc5bf0..2be2f1981 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -528,7 +528,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
COMPAT_OBJS += compat/mingw.o compat/winansi.o \
compat/win32/pthread.o compat/win32/syslog.o \
compat/win32/dirent.o
-   BASIC_CFLAGS += -DPROTECT_NTFS_DEFAULT=1
+   BASIC_CFLAGS += -DWIN32 -DPROTECT_NTFS_DEFAULT=1
EXTLIBS += -lws2_32
GITLIBS += git.res
PTHREAD_LIBS =
-- 
gitgitgadget



[PATCH 6/9] vscode: wrap commit messages at column 72 by default

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

When configuring VS Code as core.editor (via `code --wait`), we really
want to adhere to the Git conventions of wrapping commit messages.

Signed-off-by: Johannes Schindelin 
---
 contrib/vscode/init.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
index ba9469226..face115e8 100755
--- a/contrib/vscode/init.sh
+++ b/contrib/vscode/init.sh
@@ -17,6 +17,10 @@ cat >.vscode/settings.json.new <<\EOF ||
 {
 "C_Cpp.intelliSenseEngine": "Default",
 "C_Cpp.intelliSenseEngineFallback": "Disabled",
+"[git-commit]": {
+"editor.wordWrap": "wordWrapColumn",
+"editor.wordWrapColumn": 72
+},
 "files.associations": {
 "*.h": "c",
 "*.c": "c"
-- 
gitgitgadget



[PATCH 9/9] vscode: let cSpell work on commit messages, too

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

By default, the cSpell extension ignores all files under .git/. That
includes, unfortunately, COMMIT_EDITMSG, i.e. commit messages. However,
spell checking is *quite* useful when writing commit messages... And
since the user hardly ever opens any file inside .git (apart from commit
messages, the config, and sometimes interactive rebase's todo lists),
there is really not much harm in *not* ignoring .git/.

The default also ignores `node_modules/`, but that does not apply to
Git, so let's skip ignoring that, too.

Signed-off-by: Johannes Schindelin 
---
 contrib/vscode/init.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
index a134cb4c5..27de94994 100755
--- a/contrib/vscode/init.sh
+++ b/contrib/vscode/init.sh
@@ -33,6 +33,8 @@ cat >.vscode/settings.json.new <<\EOF ||
 "*.h": "c",
 "*.c": "c"
 },
+"cSpell.ignorePaths": [
+],
 "cSpell.words": [
 "DATAW",
 "DBCACHED",
-- 
gitgitgadget


[PATCH 8/9] vscode: add a dictionary for cSpell

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

The quite useful cSpell extension allows VS Code to have "squiggly"
lines under spelling mistakes. By default, this would add too much
clutter, though, because so much of Git's source code uses words that
would trigger cSpell.

Let's add a few words to make the spell checking more useful by reducing
the number of false positives.

Signed-off-by: Johannes Schindelin 
---
 contrib/vscode/init.sh | 169 -
 1 file changed, 168 insertions(+), 1 deletion(-)

diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
index 29f2a729d..a134cb4c5 100755
--- a/contrib/vscode/init.sh
+++ b/contrib/vscode/init.sh
@@ -32,7 +32,174 @@ cat >.vscode/settings.json.new <<\EOF ||
 "files.associations": {
 "*.h": "c",
 "*.c": "c"
-}
+},
+"cSpell.words": [
+"DATAW",
+"DBCACHED",
+"DFCHECK",
+"DTYPE",
+"Hamano",
+"HCAST",
+"HEXSZ",
+"HKEY",
+"HKLM",
+"IFGITLINK",
+"IFINVALID",
+"ISBROKEN",
+"ISGITLINK",
+"ISSYMREF",
+"Junio",
+"LPDWORD",
+"LPPROC",
+"LPWSTR",
+"MSVCRT",
+"NOARG",
+"NOCOMPLETE",
+"NOINHERIT",
+"RENORMALIZE",
+"STARTF",
+"STARTUPINFOEXW",
+"Schindelin",
+"UCRT",
+"YESNO",
+"argcp",
+"beginthreadex",
+"committish",
+"contentp",
+"cpath",
+"cpidx",
+"ctim",
+"dequote",
+"envw",
+"ewah",
+"fdata",
+"fherr",
+"fhin",
+"fhout",
+"fragp",
+"fsmonitor",
+"hnsec",
+"idents",
+"includeif",
+"interpr",
+"iprog",
+"isexe",
+"iskeychar",
+"kompare",
+"mksnpath",
+"mktag",
+"mktree",
+"mmblob",
+"mmbuffer",
+"mmfile",
+"noenv",
+"nparents",
+"ntpath",
+"ondisk",
+"ooid",
+"oplen",
+"osdl",
+"pnew",
+"pold",
+"ppinfo",
+"pushf",
+"pushv",
+"rawsz",
+"rebasing",
+"reencode",
+"repo",
+"rerere",
+"scld",
+"sharedrepo",
+"spawnv",
+"spawnve",
+"spawnvpe",
+"strdup'ing",
+"submodule",
+"submodules",
+"topath",
+"topo",
+"tpatch",
+"unexecutable",
+"unhide",
+"unkc",
+"unkv",
+"unmark",
+"unmatch",
+"unsets",
+"unshown",
+"untracked",
+"untrackedcache",
+"unuse",
+"upos",
+"uval",
+"vreportf",
+"wargs",
+"wargv",
+"wbuffer",
+"wcmd",
+"wcsnicmp",
+"wcstoutfdup",
+"wdeltaenv",
+"wdir",
+"wenv",
+"wenvblk",
+"wenvcmp",
+"wenviron",
+"wenvpos",
+"wenvsz",
+"wfile",
+"wfilename",
+"wfopen",
+"wfreopen",
+"wfullpath",
+"which'll",
+"wlink",
+"wmain",
+"wmkdir",
+"wmktemp",
+"wnewpath",
+"wotype",
+"wpath",
+"wpathname",
+"wpgmptr",
+"wpnew",
+"wpointer",
+"wpold",
+"wpos",
+"wputenv",
+"wrmdir",
+"wship",
+"wtarget",
+"wtemplate",
+"wunlink",
+"xcalloc",
+"xgetcwd",
+"xmallocz",
+"xmemdupz",
+"xmmap",
+"xopts",
+"xrealloc",
+"xsnprintf",
+"xutftowcs",
+"xutftowcsn",
+"xwcstoutf"
+],
+"cSpell.ignoreRegExpList": [
+"\\\"(DIRC|FSMN|REUC|UNTR)\\\"",
+"u[0-9a-fA-Fx]{4}\\b",
+"\\b(filfre|frotz|xyzzy)\\b",
+"\\bCMIT_FMT_DEFAULT\\b",
+"\\bde-munge\\b",
+"\\bGET_OID_DISAMBIGUATORS\\b",
+"\\bHASH_RENORMALIZE\\b",
+"\\bTREESAMEness\\b",
+"\\bUSE_STDEV\\b",
+"\\Wchar *\\*\\W*utfs\\W",
+"cURL's",
+"nedmalloc'ed",
+"ntifs\\.h",
+],
 }
 EOF
 die "Could not write settings.json"
-- 
gitgitgadget



[PATCH 0/9] Add support to develop Git in Visual Studio Code

2018-07-23 Thread Johannes Schindelin via GitGitGadget
[Visual Studio Code](https://code.visualstudio.com/) (nickname "VS Code") is a 
light-weight, cross-platform, Open Source development environment, with an 
increasingly powerful extension to support C/C++ development. In particular the 
intellisense support as well as all the other niceties developers might have 
come to expect from Integrated Development Environments will help accelerate 
development.

This topic branch makes it easy to get started using VS Code to develop Git 
itself.

To get started, run the script `./contrib/vscode/init.sh`. This will initialize 
the `.vscode/` directory and some files in that directory. After that, simply 
open Git's top-level directory as "folder" in VS Code.

The files have to be generated because of the curious way Git determines what 
flags to pass to the C compiler, in particular which constants are defined, 
because they change the compile flow in rather dramatic ways (determining, e.g. 
which SHA-1 backend to use).

Johannes Schindelin (9):
  contrib: add a script to initialize VS Code configuration
  vscode: hard-code a couple defines
  cache.h: extract enum declaration from inside a struct declaration
  mingw: define WIN32 explicitly
  vscode: only overwrite C/C++ settings
  vscode: wrap commit messages at column 72 by default
  vscode: use 8-space tabs, no trailing ws, etc for Git's source code
  vscode: add a dictionary for cSpell
  vscode: let cSpell work on commit messages, too

 .gitignore|   1 +
 cache.h   |  24 ++-
 config.mak.uname  |   2 +-
 contrib/vscode/.gitattributes |   1 +
 contrib/vscode/README.md  |  14 ++
 contrib/vscode/init.sh| 375 ++
 6 files changed, 405 insertions(+), 12 deletions(-)
 create mode 100644 contrib/vscode/.gitattributes
 create mode 100644 contrib/vscode/README.md
 create mode 100755 contrib/vscode/init.sh


base-commit: 53f9a3e157dbbc901a02ac2c73346d375e24978c
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-2%2Fdscho%2Fvscode-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2/dscho/vscode-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2
-- 
gitgitgadget


[PATCH 1/9] contrib: add a script to initialize VS Code configuration

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

VS Code is a lightweight but powerful source code editor which runs on
your desktop and is available for Windows, macOS and Linux. Among other
languages, it has support for C/C++ via an extension.

To start developing Git with VS Code, simply run the Unix shell script
contrib/vscode/init.sh, which creates the configuration files in
.vscode/ that VS Code consumes.

Signed-off-by: Johannes Schindelin 
---
 .gitignore|   1 +
 contrib/vscode/.gitattributes |   1 +
 contrib/vscode/README.md  |  14 +++
 contrib/vscode/init.sh| 165 ++
 4 files changed, 181 insertions(+)
 create mode 100644 contrib/vscode/.gitattributes
 create mode 100644 contrib/vscode/README.md
 create mode 100755 contrib/vscode/init.sh

diff --git a/.gitignore b/.gitignore
index 388cc4bee..592e8f879 100644
--- a/.gitignore
+++ b/.gitignore
@@ -206,6 +206,7 @@
 /config.mak.autogen
 /config.mak.append
 /configure
+/.vscode/
 /tags
 /TAGS
 /cscope*
diff --git a/contrib/vscode/.gitattributes b/contrib/vscode/.gitattributes
new file mode 100644
index 0..e89f2236e
--- /dev/null
+++ b/contrib/vscode/.gitattributes
@@ -0,0 +1 @@
+init.sh whitespace=-indent-with-non-tab
diff --git a/contrib/vscode/README.md b/contrib/vscode/README.md
new file mode 100644
index 0..8202d6203
--- /dev/null
+++ b/contrib/vscode/README.md
@@ -0,0 +1,14 @@
+Configuration for VS Code
+=
+
+[VS Code](https://code.visualstudio.com/) is a lightweight but powerful source
+code editor which runs on your desktop and is available for
+[Windows](https://code.visualstudio.com/docs/setup/windows),
+[macOS](https://code.visualstudio.com/docs/setup/mac) and
+[Linux](https://code.visualstudio.com/docs/setup/linux). Among other languages,
+it has [support for C/C++ via an 
extension](https://github.com/Microsoft/vscode-cpptools).
+
+To start developing Git with VS Code, simply run the Unix shell script called
+`init.sh` in this directory, which creates the configuration files in
+`.vscode/` that VS Code consumes. `init.sh` needs access to `make` and `gcc`,
+so run the script in a Git SDK shell if you are using Windows.
diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
new file mode 100755
index 0..3cc93243f
--- /dev/null
+++ b/contrib/vscode/init.sh
@@ -0,0 +1,165 @@
+#!/bin/sh
+
+die () {
+   echo "$*" >&2
+   exit 1
+}
+
+cd "$(dirname "$0")"/../.. ||
+die "Could not cd to top-level directory"
+
+mkdir -p .vscode ||
+die "Could not create .vscode/"
+
+# General settings
+
+cat >.vscode/settings.json <<\EOF ||
+{
+"C_Cpp.intelliSenseEngine": "Default",
+"C_Cpp.intelliSenseEngineFallback": "Disabled",
+"files.associations": {
+"*.h": "c",
+"*.c": "c"
+}
+}
+EOF
+die "Could not write settings.json"
+
+# Infer some setup-specific locations/names
+
+GCCPATH="$(which gcc)"
+GDBPATH="$(which gdb)"
+MAKECOMMAND="make -j5 DEVELOPER=1"
+OSNAME=
+X=
+case "$(uname -s)" in
+MINGW*)
+   GCCPATH="$(cygpath -am "$GCCPATH")"
+   GDBPATH="$(cygpath -am "$GDBPATH")"
+   MAKE_BASH="$(cygpath -am /git-cmd.exe) --command=usrbinbash.exe"
+   MAKECOMMAND="$MAKE_BASH -lc \\\"$MAKECOMMAND\\\""
+   OSNAME=Win32
+   X=.exe
+   ;;
+Linux)
+   OSNAME=Linux
+   ;;
+Darwin)
+   OSNAME=macOS
+   ;;
+esac
+
+# Default build task
+
+cat >.vscode/tasks.json .vscode/launch.json .vscode/c_cpp_properties.json <<\EOF ||
+include Makefile
+
+vscode-init:
+   @mkdir -p .vscode && \
+   incs= && defs= && \
+

[PATCH 2/9] vscode: hard-code a couple defines

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Sadly, we do not get all of the definitions via ALL_CFLAGS. Some defines
are passed to GCC *only* when compiling specific files, such as git.o.

Let's just hard-code them into the script for the time being.

Signed-off-by: Johannes Schindelin 
---
 contrib/vscode/init.sh | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/contrib/vscode/init.sh b/contrib/vscode/init.sh
index 3cc93243f..494a51ac5 100755
--- a/contrib/vscode/init.sh
+++ b/contrib/vscode/init.sh
@@ -115,7 +115,19 @@ include Makefile
 vscode-init:
@mkdir -p .vscode && \
incs= && defs= && \
-   for e in $(ALL_CFLAGS); do \
+   for e in $(ALL_CFLAGS) \
+   '-DGIT_EXEC_PATH="$(gitexecdir_SQ)"' \
+   '-DGIT_LOCALE_PATH="$(localedir_relative_SQ)"' \
+   '-DBINDIR="$(bindir_relative_SQ)"' \
+   '-DFALLBACK_RUNTIME_PREFIX="$(prefix_SQ)"' \
+   '-DDEFAULT_GIT_TEMPLATE_DIR="$(template_dir_SQ)"' \
+   '-DETC_GITCONFIG="$(ETC_GITCONFIG_SQ)"' \
+   '-DETC_GITATTRIBUTES="$(ETC_GITATTRIBUTES_SQ)"' \
+   '-DGIT_LOCALE_PATH="$(localedir_relative_SQ)"' \
+   '-DCURL_DISABLE_TYPECHECK', \
+   '-DGIT_HTML_PATH="$(htmldir_relative_SQ)"' \
+   '-DGIT_MAN_PATH="$(mandir_relative_SQ)"' \
+   '-DGIT_INFO_PATH="$(infodir_relative_SQ)"'; do \
case "$$e" in \
-I.) \
incs="$$(printf '% 16s"$${workspaceRoot}",\n%s' \
-- 
gitgitgadget



[PATCH 4/5] coccinelle: put sane filenames into output patches

2018-07-23 Thread SZEDER Gábor
Coccinelle outputs its suggested transformations as patches, whose
header looks something like this:

  --- commit.c
  +++ /tmp/cocci-output-19250-7ae78a-commit.c

Note the lack of 'diff --opts  ' line, the differing number
of path components on the --- and +++ lines, and the nonsensical
filename on the +++ line.  'patch -p0' can still apply these patches,
as it takes the filename to be modified from the --- line.  Alas, 'git
apply' can't, because it takes the filename from the +++ line, and
then complains about the nonexisting file.

Pass the '--patch .' options to Coccinelle via the SPATCH_FLAGS 'make'
variable, as it seems to make it generate proper context diff patches,
with the header starting with a 'diff ...' line and containing sane
filenames.  The resulting 'contrib/coccinelle/*.cocci.patch' files
then can be applied both with 'git apply' and 'patch' (even without
'-p0').

Signed-off-by: SZEDER Gábor 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 73e2d16926..72ea29df4e 100644
--- a/Makefile
+++ b/Makefile
@@ -564,7 +564,7 @@ SPATCH = spatch
 export TCL_PATH TCLTK_PATH
 
 SPARSE_FLAGS =
-SPATCH_FLAGS = --all-includes
+SPATCH_FLAGS = --all-includes --patch .
 
 
 
-- 
2.18.0.408.g42635c01bc



[PATCH 5/5] coccinelle: extract dedicated make target to clean Coccinelle's results

2018-07-23 Thread SZEDER Gábor
Sometimes I want to remove only Coccinelle's results, but keep all
other build artifacts left after my usual 'make all man' build.  This
new 'cocciclean' make target will allow just that.

Signed-off-by: SZEDER Gábor 
---
 Makefile | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 72ea29df4e..aee629cbaf 100644
--- a/Makefile
+++ b/Makefile
@@ -2903,7 +2903,10 @@ profile-clean:
$(RM) $(addsuffix *.gcda,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
$(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
 
-clean: profile-clean coverage-clean
+cocciclean:
+   $(RM) contrib/coccinelle/*.cocci.patch*
+
+clean: profile-clean coverage-clean cocciclean
$(RM) *.res
$(RM) $(OBJECTS)
$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
@@ -2915,7 +2918,6 @@ clean: profile-clean coverage-clean
$(RM) -r $(GIT_TARNAME) .doc-tmp-dir
$(RM) $(GIT_TARNAME).tar.gz git-core_$(GIT_VERSION)-*.tar.gz
$(RM) $(htmldocs).tar.gz $(manpages).tar.gz
-   $(RM) contrib/coccinelle/*.cocci.patch*
$(MAKE) -C Documentation/ clean
 ifndef NO_PERL
$(MAKE) -C gitweb clean
@@ -2931,7 +2933,7 @@ endif
$(RM) GIT-USER-AGENT GIT-PREFIX
$(RM) GIT-SCRIPT-DEFINES GIT-PERL-DEFINES GIT-PERL-HEADER 
GIT-PYTHON-VARS
 
-.PHONY: all install profile-clean clean strip
+.PHONY: all install profile-clean cocciclean clean strip
 .PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell
 .PHONY: FORCE cscope
 
-- 
2.18.0.408.g42635c01bc



[PATCH 3/5] coccinelle: exclude sha1dc source files from static analysis

2018-07-23 Thread SZEDER Gábor
sha1dc is an external library, that we carry in-tree for convenience
or grab as a submodule, so there is no use in applying our semantic
patches to its source files.

Therefore, exclude sha1dc's source files from Coccinelle's static
analysis.

This change also makes the static analysis somewhat faster: presumably
because of the heavy use of repetitive macro declarations, applying
the semantic patches 'array.cocci' and 'swap.cocci' to 'sha1dc/sha1.c'
takes over half a minute each on my machine, which amounts to about a
third of the runtime of applying these two semantic patches to the
whole git source tree.

Signed-off-by: SZEDER Gábor 
---
 Makefile | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 8f509576e9..73e2d16926 100644
--- a/Makefile
+++ b/Makefile
@@ -2666,10 +2666,16 @@ check: command-list.h
fi
 
 C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
-%.cocci.patch: %.cocci $(C_SOURCES)
+ifdef DC_SHA1_SUBMODULE
+COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(C_SOURCES))
+else
+COCCI_SOURCES = $(filter-out sha1dc/%,$(C_SOURCES))
+endif
+
+%.cocci.patch: %.cocci $(COCCI_SOURCES)
@echo '' SPATCH $<; \
ret=0; \
-   for f in $(C_SOURCES); do \
+   for f in $(COCCI_SOURCES); do \
$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
{ ret=$$?; break; }; \
done >$@+ 2>$@.log; \
-- 
2.18.0.408.g42635c01bc



[PATCH 2/5] coccinelle: use $(addsuffix) in 'coccicheck' make target

2018-07-23 Thread SZEDER Gábor
The dependencies of the 'coccicheck' make target are listed with the
help of the $(patsubst) make function, which in this case doesn't do
any pattern substitution, but only adds the '.patch' suffix.

Use the shorter and more idiomatic $(addsuffix) make function instead.

Signed-off-by: SZEDER Gábor 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 27bfc196dd..8f509576e9 100644
--- a/Makefile
+++ b/Makefile
@@ -2683,7 +2683,7 @@ C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
then \
echo '' SPATCH result: $@; \
fi
-coccicheck: $(patsubst %.cocci,%.cocci.patch,$(wildcard 
contrib/coccinelle/*.cocci))
+coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci))
 
 .PHONY: coccicheck
 
-- 
2.18.0.408.g42635c01bc



[PATCH 0/5] Misc Coccinelle-related improvements

2018-07-23 Thread SZEDER Gábor
Just a couple of minor Coccinelle-related improvements:

  - The first two patches are small cleanups.

  - The last three could make life perhaps just a tad bit easier for
devs running 'make coccicheck'.


SZEDER Gábor (5):
  coccinelle: mark the 'coccicheck' make target as .PHONY
  coccinelle: use $(addsuffix) in 'coccicheck' make target
  coccinelle: exclude sha1dc source files from static analysis
  coccinelle: put sane filenames into output patches
  coccinelle: extract dedicated make target to clean Coccinelle's
results

 Makefile | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

-- 
2.18.0.408.g42635c01bc



[PATCH 1/5] coccinelle: mark the 'coccicheck' make target as .PHONY

2018-07-23 Thread SZEDER Gábor
The 'coccicheck' target doesn't create a file with the same name, so
mark it as .PHONY.

Signed-off-by: SZEDER Gábor 
---
 Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index e4b503d259..27bfc196dd 100644
--- a/Makefile
+++ b/Makefile
@@ -2685,6 +2685,8 @@ C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
fi
 coccicheck: $(patsubst %.cocci,%.cocci.patch,$(wildcard 
contrib/coccinelle/*.cocci))
 
+.PHONY: coccicheck
+
 ### Installation rules
 
 ifneq ($(filter /%,$(firstword $(template_dir))),)
-- 
2.18.0.408.g42635c01bc



[PATCH 1/1] t7406: avoid failures solely due to timing issues

2018-07-23 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Regression tests are automated tests which try to ensure a specific
behavior. The idea is: if the test case fails, the behavior indicated in
the test case's title regressed.

If a regression test that fails, even occasionally, for any reason other
than to indicate the particular regression(s) it tries to catch, it is
less useful than when it really only fails when there is a bug in the
(non-test) code that needs to be fixed.

In the instance of the test case "submodule update --init --recursive
from subdirectory" of the script t7406-submodule-update.sh, the exact
output of a recursive clone is compared with a pre-generated one. And
this is a racy test because the structure of the submodules only
guarantees a *partial* order. The 'none' and the 'rebasing' submodules
*can* be cloned in any order, which means that a mismatch with the
hard-coded order does not necessarily indicate a bug in the tested code.

See for example:
https://git-for-windows.visualstudio.com/git/_build/results?buildId=14035&view=logs

To prevent such false positives from unnecessarily costing time when
investigating test failures, let's take the exact order of the lines out
of the equation by sorting them before comparing them.

This test script seems not to have any more test cases that try to
verify any specific order in which recursive clones process the
submodules, therefore this is the only test case that is changed in this
manner.

Signed-off-by: Johannes Schindelin 
---
 t/t7406-submodule-update.sh | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 9e0d31700..4937ebb67 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -115,17 +115,17 @@ Submodule path '../super/submodule': checked out 
'$submodulesha1'
 EOF
 
 cat ../../actual 
2>../../actual2
) &&
test_i18ncmp expect actual &&
-   test_i18ncmp expect2 actual2
+   sort actual2 >actual2.sorted &&
+   test_i18ncmp expect2 actual2.sorted
 '
 
 cat 

  1   2   >